Foundations of Computer Science
Computer Science Tripos Part IA
Lawrence C Paulson
Computer Laboratory
University of Cambridge
2 Recursive Functions 14
4 Lists 36
5 More on Lists 46
6 Sorting 56
This course has two aims. The first is to teach programming. The second
is to present some fundamental principles of computer science, especially
algorithm design. Most students will have some programming experience
already, but there are few people whose programming cannot be improved
through greater knowledge of basic principles. Please bear this point in mind
if you have extensive experience and find parts of the course rather slow.
The programming in this course is based on the language ML and mostly
concerns the functional programming style. Functional programs tend to
be shorter and easier to understand than their counterparts in conventional
languages such as C. In the space of a few weeks, we shall be able to cover
most of the forms of data structures seen in programming. The course also
covers basic methods for estimating efficiency.
Learning Guide. Suggestions for further reading, discussion topics, ex-
ercises and past exam questions appear at the end of each lecture. Extra
reading is mostly drawn from my book ML for the Working Programmer
(second edition), which also contains many exercises. You can find relevant
exam questions in the Part IA papers from 1998 onwards. (Earlier papers
pertain to a predecessor of this course.)
Thanks to David Allsopp, Stuart Becker, Gavin Bierman, Chloë Brown,
Silas Brown, Qi Chen, David Cottingham, Daniel Hulme, Frank King, Joseph
Lord, James Margetson, David Morgan, Frank Stajano and Assel Zhiyen-
bayeva for pointing out errors in these notes. Please inform me of further
errors and of passages that are particularly hard to understand. If I use your
suggestion, I’ll acknowledge it in the next printing.
Reading List
My own book is not based on these notes, but there is some overlap. The
Hansen/Rischel and Ullman books are good alternatives. The Little MLer
is a rather quirky tutorial on recursion and types. See Introduction to Algo-
rithms for O-notation.
Computers: a child can use them; NOBODY can fully understand them!
Example I: Dates
Computers have integers like 1066 and reals like 1.066 × 10^3.
Floating point numbers are what you get on any pocket calculator. In-
ternally, a float consists of two integers: the mantissa (fractional part) and
the exponent. Complex numbers, consisting of two reals, might be provided.
We have three levels of numbers already!
Most computers give us a choice of precisions, too. In 32-bit precision,
integers typically range from 2^31 − 1 (namely 2,147,483,647) to −2^31; reals
are accurate to about six decimal places and can get as large as 10^35 or so.
For reals, 64-bit precision is often preferred. How do we keep track of so
many kinds of numbers? If we apply floating-point arithmetic to an integer,
the result is undefined and might even vary from one version of a chip to
another.
Early languages like Fortran required variables to be declared as integer
or real and prevented programmers from mixing both kinds of number in
a computation. Nowadays, programs handle many different kinds of data,
including text and symbols. Modern languages use the concept of data type
to ensure that a datum undergoes only those operations that are meaningful
for it.
Inside the computer, all data are stored as bits. Determining which type
a particular bit pattern belongs to is impossible unless some bits have been
set aside for that very purpose (as in languages like Lisp and Prolog). In
most languages, the compiler uses types to generate correct machine code,
and types are not stored during program execution.
I Foundations of Computer Science 5
Slide 104

application
high-level language
operating system
device drivers, . . .
machine language
gates
silicon
These are just some of the levels that might be identified in a computer.
Most large-scale systems are themselves divided into levels. For example,
a management information system may consist of several database systems
bolted together more-or-less elegantly.
This is the programmer’s view. The user sees a different hierarchy, de-
termined by the separate application programs, the operating system, and
the visible hardware such as the screen and DVD-writer. Home comput-
ers are possible because the user’s view can be so much simpler than the
programmer’s.
Communications protocols used on the Internet encompass several layers.
Each layer has a different task, such as making unreliable links reliable (by
trying again if a transmission is not acknowledged) and making insecure
links secure (using cryptography). It sounds complicated, but the necessary
software can be found on many personal computers.
In this course, we focus almost entirely on programming in a high-level
language: ML.
What is Programming?
We shall look at all these points during the course, though programs will be
too simple to have much risk of getting the requirements wrong.
Abstraction in Programming
Floating-Point, Revisited
Slide 107
Von Neumann doubted whether its benefits outweighed its costs!
Lessons:
It is interactive.
val pi = 3.14159;
> val pi = 3.14159 : real
area 2.0;
> val it = 12.56636 : real
toSeconds (5,7);
> val it = 307 : int
fromSeconds it;
> val it = (5, 7) : int * int
Given that there are 60 seconds in a minute, how many seconds are
there in m minutes and s seconds? Function toSeconds performs the trivial
calculation. It takes a pair of arguments, enclosed in brackets.
We are now using integers. The integer sixty is written 60; the real
sixty would be written 60.0. The multiplication operator, *, is used for
type int as well as real: it is overloaded. The addition operator, +, is
also overloaded. As in most programming languages, multiplication (and
division) have precedence over addition (and subtraction): we may write
secs+60*mins instead of secs+(60*mins).
The inverse of toSeconds demonstrates the infix operators div and mod,
which express integer division and remainder. Function fromSeconds returns
a pair of results, again enclosed in brackets.
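The declarations behind this session do not appear in this extract. Definitions consistent with the session above (and with the types listed just below) would be:

```sml
(* Sketch only: the slide's actual declarations are not in this extract. *)
fun toSeconds (mins, secs) = secs + 60 * mins;   (* int * int -> int *)
fun fromSeconds s = (s div 60, s mod 60);        (* int -> int * int *)
```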
Carefully observe the types of the two functions:
toSeconds : int * int -> int
fromSeconds : int -> int * int
The underlined symbols val and fun are keywords: they may not be used
as identifiers. Here is a complete list of ML’s keywords.
abstype and andalso as case datatype do else end eqtype exception
fn fun functor handle if in include infix infixr let local
nonfix of op open orelse raise rec
sharing sig signature struct structure
then type val where while with withtype
The negation of x is written ~x rather than -x, please note. Most lan-
guages use the same symbol for minus and subtraction, but ML regards all
operators, whether infix or not, as functions. Subtraction takes a pair of
numbers, but minus takes a single number; they are distinct functions and
must have distinct names. Similarly, we may not write +x.
Computer numbers have a finite range, which if exceeded gives rise to an
Overflow error. Some ML systems can represent integers of arbitrary size.
If integers and reals must be combined in a calculation, ML provides
functions to convert between them:
real : int -> real convert an integer to the corresponding real
floor : real -> int convert a real to the greatest integer not exceeding it
For more details on ML’s syntax, please consult a textbook. Mine [13]
and Wikström’s [16] may be found in many College libraries. Ullman [15],
in the Computer Lab library, is also worth a look.
Exercise 1.1 One solution to the year 2000 bug involves storing years as
two digits, but interpreting them such that 50 means 1950 and 49 means
2049. Comment on the merits and demerits of this approach.
Exercise 1.2 Using the date representation of the previous exercise, code
ML functions to (a) compare two years (b) add/subtract some given number
of years from another year. (You may need to look ahead to the next lecture
for ML’s comparison operators.)
x^0 = 1
x^(n+1) = x × x^n.
The function npower raises its real argument x to the power n, a non-
negative integer. The function is recursive: it calls itself. This concept
should be familiar from mathematics, since exponentiation is defined by the
rules shown above. The ML programmer uses recursion heavily.
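The declaration of npower itself is on a slide not included here; a version matching the rules above, with the result type constrained to real as discussed in the aside that follows, might read:

```sml
(* Sketch: the slide's actual declaration is not in this extract. *)
fun npower (x, n) : real =
    if n = 0 then 1.0
    else x * npower (x, n - 1);
```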
For n ≥ 0, the equation x^(n+1) = x × x^n yields an obvious computation:
x^3 = x × x^2 = x × x × x^1 = x × x × x × x^0 = x × x × x.
The equation clearly holds even for negative n. However, the corresponding
computation runs forever:
Now for a tiresome but necessary aside. In most languages, the types of
arguments and results must always be specified. ML is unusual in providing
type inference: it normally works out the types for itself. However, sometimes
ML needs a hint; function npower has a type constraint to say its result is
real. Such constraints are required when overloading would otherwise make
a function’s type ambiguous. ML chooses type int by default or, in earlier
versions, prints an error message.
Despite the best efforts of language designers, all programming languages
have trouble points such as these. Typically, they are compromises caused
by trying to get the best of both worlds, here type inference and overloading.
II Foundations of Computer Science 15
An Aside: Overloading
are fine provided x and y have the same type and equality testing is possible
for that type.1
Note that x <> y is ML for x ≠ y.
1 All the types that we shall see for some time admit equality testing. Moscow ML
allows even equality testing of reals, which is forbidden in the latest version of the ML
library. Some compilers may insist that you write Real.==(x,y).
Slide 203

if b then x else y
not(b)      negation of b
p andalso q ≡ if p then q else false
p orelse q  ≡ if p then true else q
A Boolean-valued function!
Mathematical Justification:
x^1 = x
x^(2n) = (x^2)^n
x^(2n+1) = x × (x^2)^n.
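These equations justify a faster exponentiation function, the power referred to in Exercise 2.4 below. Its slide is not reproduced here; a version that follows the equations directly might be:

```sml
(* Sketch, assuming n >= 1 as in the equations above. *)
fun power (x, n) : real =
    if n = 1 then x
    else if n mod 2 = 0 then power (x * x, n div 2)
    else x * power (x * x, n div 2);
```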
Expression Evaluation
E0 ⇒ E1 ⇒ · · · ⇒ En ⇒ v
Slide 206

fun nsum n =
    if n=0 then 0
    else n + nsum (n-1);
> val nsum = fn: int -> int
nsum 3 ⇒3 + nsum 2
⇒3 + (2 + nsum 1)
⇒3 + (2 + (1 + nsum 0))
⇒3 + (2 + (1 + 0)) ⇒ . . . ⇒ 6
A classic book by Abelson and Sussman [1] used iterative to mean tail-
recursive. It describes the Lisp dialect known as Scheme. Iterative functions
produce computations resembling those that can be done using while-loops
in conventional languages.
Many algorithms can be expressed naturally using recursion, but only
awkwardly using iteration. There is a story that Dijkstra sneaked recursion
into Algol-60 by inserting the words “any other occurrence of the procedure
name denotes execution of the procedure”. By not using the word “recur-
sion”, he managed to slip this amendment past sceptical colleagues.
Obsession with tail recursion leads to a coding style in which functions
have many more arguments than necessary. Write straightforward code first,
avoiding only gross inefficiency. If the program turns out to be too slow,
tools are available for pinpointing the cause. Always remember KISS (Keep
It Simple, Stupid).
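For the record, a tail-recursive version of nsum carries the running total in an extra argument. The slide with this declaration is not in this extract, and the name summing is illustrative:

```sml
(* Iterative summation: the accumulator total holds the sum so far. *)
fun summing (n, total) =
    if n = 0 then total
    else summing (n - 1, n + total);
```

Calling summing(n, 0) computes 0 + 1 + · · · + n in constant space.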
I hope you have all noticed by now that the summation can be done even
more efficiently using the arithmetic progression formula
1 + · · · + n = n(n + 1)/2.
Slide 209

x_(i+1) = (a/x_i + x_i) / 2

fun nextApprox (a,x) = (a/x + x) / 2.0;
> val nextApprox = fn : real * real -> real
nextApprox (2.0, 1.5);
> val it = 1.41666666667 : real
nextApprox (2.0, it);
> val it = 1.41421568627 : real
nextApprox (2.0, it);
> val it = 1.41421356237 : real
Exercise 2.4 Functions npower and power both have type constraints, but
only one of them actually needs it. Try to work out which function does not
need its type constraint merely by looking at its declaration.
Exercise 2.5 Recalling that a let declaration declares values that are visible
in a limited scope, explain what the following function computes.
fun findRoot (a, x0) =
let fun next x = (a/x + x) / 2.0
in next (next (next x0))
end;
III Foundations of Computer Science 25
This table (excerpted from Aho et al. [2, page 3]) illustrates the effect
of various time complexities. The left-hand column indicates how many
milliseconds are required to process an input of size n. The other entries
show the maximum size of n that can be processed in the given time (one
second, minute or hour).
The table illustrates how large an input can be processed as a function
of time. As we increase the computer time per input from one second to one
minute and then to one hour, the size of the input increases accordingly.
The top two rows (complexities n and n lg n) increase rapidly: for n, by
a factor of 60. The bottom two start out close together, but n^3 (which grows
by a factor of 3.9) pulls well away from 2^n (whose growth is only additive).
If an algorithm’s complexity is exponential then it can never handle large
inputs, even if it is given huge resources. On the other hand, suppose the
complexity has the form n^c, where c is a constant. (We say the complexity
is polynomial.) Doubling the argument then increases the cost by a constant
factor. That is much better, though if c > 3 the algorithm may not be
considered practical.
Comparing Algorithms
for some constant c and all sufficiently large n. The conjunction of f (n) =
O(g(n)) and f (n) = Ω(g(n)) is written f (n) = Θ(g(n)).
People often use O(g(n)) as if it gave a tight bound, confusing it with
Θ(g(n)). Since O(g(n)) gives an upper bound, if f (n) = O(n) then also
f (n) = O(n2 ). Tricky examination questions exploit this fact.
To say that O(c^n) is contained in O(d^n) means that the former gives a tighter
bound than the latter. For example, if f(n) = O(2^n) then f(n) = O(3^n)
trivially, but the converse does not hold.
Slide 306

O(1)       constant
O(log n)   logarithmic
O(n)       linear
O(n^2)     quadratic
O(n^3)     cubic
T (n + 1) = T (n) + 1 linear
T (n + 1) = T (n) + n quadratic
Slide 309

T(0) = 1
T(n + 1) = 2T(n) + 1

Explicit solution: T(n) = 2^(n+1) − 1

T(n + 1) = 2T(n) + 1
         = 2(2^(n+1) − 1) + 1     induction hypothesis
         = 2^(n+2) − 1
Now we analyze the function nthApprox given at the start of the lecture.
The two recursive calls are reflected in the term 2T (n) of the recurrence. As
for the constant effort, although the recursive case does more work than the
base case, we can choose units such that both constants are one. (Remember,
we seek an upper bound rather than the exact cost.)
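The declaration of nthApprox is not part of this extract. A reconstruction consistent with the recurrence, with two recursive calls in the recursive case, might look like this (hypothetical; the slide's actual code may differ):

```sml
(* Naive n-th Newton approximation to sqrt(a): the repeated recursive
   call is what produces the recurrence T(n+1) = 2T(n) + 1. *)
fun nthApprox (a, x, n) =
    if n = 0 then x
    else (a / nthApprox (a, x, n - 1) + nthApprox (a, x, n - 1)) / 2.0;
```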
Given the recurrence equations for T(n), let us solve them. It helps if we
can guess the closed form, which in this case obviously is something like 2^n.
Evaluating T(n) for n = 0, 1, 2, 3, . . . , we get 1, 3, 7, 15, . . . . Obviously
T(n) = 2^(n+1) − 1, which we can easily prove by induction on n. We must
check the base case:
T(0) = 2^1 − 1 = 1
In the inductive step, for T(n + 1), we may assume our equation in order to
replace T(n) by 2^(n+1) − 1. The rest is easy.
We have proved T(n) = O(2^(n+1) − 1), but obviously 2^n is also an upper
bound: we may choose the constant factor to be two. Hence T(n) = O(2^n).
The proof above is rather informal. The orthodox way of proving f(n) =
O(g(n)) is to follow the definition of O notation. But an inductive proof of
T(n) ≤ c2^n, using the definition of T(n), runs into difficulties: this bound is
too loose. Tightening the bound to T(n) ≤ c2^n − 1 lets the proof go through.
This recurrence equation arises when a function divides its input into
two equal parts, does O(n) work and also calls itself recursively on each.
Such balancing is beneficial. Instead, dividing the input into unequal parts
of sizes 1 and n − 1 gives the recurrence T(n + 1) = T(n) + n, which has
quadratic complexity.
Shown on the slide is the result of substituting the closed form T (n) =
cn lg n into the original equations. This is another proof by induction. The
last step holds provided c ≥ 1.
Something is wrong, however. The base case fails: if n = 1 then cn lg n =
0, which is not an upper bound for T (1). We could look for a precise closed
form for T (n), but it is simpler to recall that O notation lets us ignore a
finite number of awkward cases. Choosing n = 2 and n = 3 as base cases
eliminates n = 1 entirely from consideration. The constraints T (2) ≤ 2c lg 2
and T (3) ≤ 3c lg 3 can be satisfied for c ≥ 2. So T (n) = O(n log n).
Incidentally, in these recurrences n/2 stands for integer division. To be
precise, we should indicate truncation to the next smaller integer by writing
⌊n/2⌋. One-half of an odd number is given by ⌊(2n+1)/2⌋ = n. For example,
⌊2.9⌋ = 2, and ⌊n⌋ = n if n is an integer.
Exercise 3.1 Add a column to the table shown in Slide 302 with the heading
60 hours.
Exercise 3.2 Find an upper bound for the recurrence given by T (1) = 1
and T (n) = 2T (n/2) + 1. You should be able to find a tighter bound than
O(n log n).
Exercise 3.3 Try the proof of T (n) ≤ c2n − 1 mentioned after slide 309.
What does it say about c?
Exercise 3.4 Show that if f (n) = O(a1 g1 (n) + · · · + ak gk (n)) then f (n) =
O(g1 (n) + · · · + gk (n)).
is O(n log n). The notation ⌈x⌉ means truncation to the next larger integer;
for example, ⌈3.1⌉ = 4. (Hint: Note that ⌈x⌉ ≤ x + 1 and that lg(x + 1) ≤
lg x + 1 for x ≥ 1.) (Difficult!)
IV Foundations of Computer Science 36
Lists
Slide 401

[3,5,9];
> [3, 5, 9] : int list

it @ [2,10];
> [3, 5, 9, 2, 10] : int list
List notation
The operator ::, called cons (for ‘construct’), puts a new element on to
the head of an existing list. While we should not be too preoccupied with
implementation details, it is essential to know that :: is an O(1) operation.
It uses constant time and space, regardless of the length of the resulting
list. Lists are represented internally with a linked structure; adding a new
element to a list merely hooks the new element to the front of the existing
structure. Moreover, that structure continues to denote the same list as it
did before; to see the new list, one must look at the new :: node (or cons
cell) just created.
Here we see the element 1 being consed to the front of the list [3,5,9]:
::  →  ::  →  ::  →  ::  →  nil
↓      ↓      ↓      ↓
1      3      5      9
Given a list, taking its first element (its head ) or its list of remaining elements
(its tail) also takes constant time. Each operation just follows a link. In the
diagram above, the first ↓ arrow leads to the head and the leftmost → arrow
leads to the tail. Once we have the tail, its head is the second element of the
original list, etc.
The tail is not the last element; it is the list of all elements other than
the head!
fun hd (x::l) = x;
> Warning: pattern matching is not exhaustive
> val hd = fn : ’a list -> ’a
tl [7,6,5];
> val it = [6, 5] : int list
There are three basic functions for inspecting lists. Note their polymor-
phic types!
null : ’a list -> bool is a list empty?
hd : ’a list -> ’a head of a non-empty list
tl : ’a list -> ’a list tail of a non-empty list
The empty list has neither head nor tail. Applying either operation to
nil is an error—strictly speaking, an exception. The function null can be
used to check for the empty list before applying hd or tl.
To look deep inside a list one can apply combinations of these functions,
but this style is hard to read. Fortunately, it is seldom necessary because of
pattern-matching.
The declaration of null above has two clauses: one for the empty list
(for which it returns true) and one for non-empty lists (for which it returns
false).
The declaration of hd above has only one clause, for non-empty lists.
They have the form x::l and the function returns x, which is the head. ML
prints a warning to tell us that calling the function could raise exception
Match, which indicates failure of pattern-matching.
The declaration of tl is omitted because it is similar to hd. Instead,
there is an example of applying tl.
Slide 404

fun nlength [] = 0
  | nlength (x::xs) = 1 + nlength xs;
> val nlength = fn: ’a list -> int
nlength[a, b, c] ⇒ 1 + nlength[b, c]
⇒ 1 + (1 + nlength[c])
⇒ 1 + (1 + (1 + nlength[]))
⇒ 1 + (1 + (1 + 0))
⇒ ... ⇒ 3
These are two declarations, not one. First we declare nlength to be a func-
tion that handles only empty lists. Then we redeclare it to be a function
that handles only non-empty lists; it can never deliver a result. We see that
a second fun declaration replaces any previous one rather than extending it
to cover new cases.
Now, let us return to the declaration shown on the slide. The length
function is polymorphic: it applies to all lists regardless of element type!
Most programming languages lack such flexibility.
Unfortunately, this length computation is naïve and wasteful. Like nsum
in Lect. 2, it is not tail-recursive. It uses O(n) space, where n is the length
of its input. As usual, the solution is to add an accumulating argument.
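The iterative version discussed next is on a slide not included here; the standard accumulator formulation is (names illustrative):

```sml
(* n holds the number of elements counted so far. *)
fun addlen (n, [])    = n
  | addlen (n, x::xs) = addlen (n + 1, xs);
```

The length of l is then addlen(0, l).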
The recursive calls do not nest: this version is iterative. It takes O(1) space.
Obviously its time requirement is O(n) because it takes at least n steps to
find the length of an n-element list.
Slide 407

fun nrev [] = []
  | nrev(x::xs) = (nrev xs) @ [x];
> val nrev = fn: ’a list -> ’a list
0 + 1 + 2 + · · · + n = n(n + 1)/2.
It is easy to see that this reverse function performs just n conses, given an
n-element list. For both reverse functions, we could count the number of
conses precisely—not just up to a constant factor. O notation is still useful
to describe the overall running time: the time taken by a cons varies from
one system to another.
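The iterative reverse function itself is on a slide omitted from this extract. The usual accumulator version is (the name and argument order are illustrative):

```sml
(* Each element is consed onto the accumulator ys, reversing the list. *)
fun irev ([], ys)    = ys
  | irev (x::xs, ys) = irev (xs, x :: ys);
```

Calling irev(l, []) reverses an n-element list l using exactly n conses.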
The accumulator y makes the function iterative. But the gain in com-
plexity arises from the removal of append. Replacing an expensive operation
(append) by a series of cheap operations (cons) is called reduction in strength,
and is a common technique in computer science. It originated when many
computers did not have a hardware multiply instruction; the series of prod-
ucts i × r for i = 0, . . . , n could more efficiently be computed by repeated
addition. Reduction in strength can be done in various ways; we shall see
many instances of removing append.
Consing to an accumulator produces the result in reverse. If that forces
the use of an extra list reversal then the iterative function may be much
slower than the recursive one.
Exercise 4.1 Code a recursive function to compute the sum of a list’s ele-
ments. Then code an iterative version and comment on the improvement in
efficiency.
Exercise 4.3 Code a function to return the list consisting of the even-
numbered elements of the list given as its argument. For example, given
[a, b, c, d] it should return [b, d].
V Foundations of Computer Science 46
Applications of take and drop will appear in future lectures. Typically, they
divide a collection of items into equal parts for recursive processing.
The special pattern variable _ appears in both functions. This wildcard
pattern matches anything. We could have written i in both positions, but
the wildcard reminds us that the relevant clause ignores this argument.
Function take is not iterative, but making it so would not improve its
efficiency. The task requires copying up to i list elements, which must take
O(i) space and time.
Function drop simply skips over i list elements. This requires O(i) time
but only constant space. It is iterative and much faster than take. Both
functions use O(i) time, but skipping elements is faster than copying them:
drop’s constant factor is smaller.
Both functions take a list and an integer, returning a list of the same
type. So their type is ’a list * int -> ’a list.
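The declarations of take and drop are not reproduced in this extract; definitions consistent with the description (note the wildcard in each first clause) are:

```sml
(* take returns the first i elements, copying them: O(i) time and space. *)
fun take ([], _)    = []
  | take (x::xs, i) = if i > 0 then x :: take (xs, i - 1) else [];

(* drop skips the first i elements without copying: constant space. *)
fun drop ([], _)    = []
  | drop (x::xs, i) = if i > 0 then drop (xs, i - 1) else x :: xs;
```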
Linear Search
All the list functions we have encountered up to now have been polymor-
phic, working for lists of any type. Function member uses linear search to
report whether or not x occurs in l. Its polymorphism is restricted to the
so-called equality types. These include integers, strings, booleans, and tuples
or lists of other equality types.
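The declaration of member is not shown in this extract; a linear-search definition matching the description is:

```sml
(* Polymorphic over equality types: ''a * ''a list -> bool. *)
fun member (x, [])   = false
  | member (x, y::l) = (x = y) orelse member (x, l);
```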
Equality testing is not available for every type, however. Functions are
values in ML, and there is no way of comparing two functions that is both
practical and meaningful. Abstract types can be declared in ML, hiding
their internal representation, including its equality test. Equality is not even
allowed for type real, though some ML systems ignore this. We shall discuss
function values and abstract types later.
If a function’s type contains equality type variables, such as ’’a, ’’b,
then it uses polymorphic equality testing.
Equality Polymorphism
Function inter computes the ‘intersection’ of two lists, returning the list
of elements common to both. It calls member. The equality type variables
propagate: the intersection function also has them even though its use of
equality is indirect. Trying to apply member or inter to a list of functions
causes ML to complain of a type error. It does so at compile time: it detects
the errors by types alone, without executing the offending code.
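The declaration of inter does not appear in this extract; a definition matching the description is sketched below, with member repeated so that the sketch is self-contained:

```sml
fun member (x, [])   = false
  | member (x, y::l) = (x = y) orelse member (x, l);

(* Keep each element of the first list that also occurs in the second. *)
fun inter ([], ys)    = []
  | inter (x::xs, ys) =
        if member (x, ys) then x :: inter (xs, ys)
        else inter (xs, ys);
```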
Equality polymorphism is a contentious feature. Some researchers com-
plain that it makes ML too complicated and leads programmers to use linear
search excessively. The functional programming language Haskell general-
izes the concept, allowing programmers to introduce new classes of types
supporting any desired collection of operations.
If the lists are of unequal length, zip discards surplus items at the end of
the longer list. Its first pattern only matches a pair of non-empty lists. The
second pattern is just a wildcard and could match anything. ML tries the
clauses in the order given, so the first pattern is tried first. The second only
gets arguments where at least one of the lists is empty.
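The declaration of zip itself is on an omitted slide; a two-clause version fitting the description exactly is:

```sml
(* First clause: both lists non-empty.  Second clause: wildcard,
   reached only when at least one of the lists is empty. *)
fun zip (x::xs, y::ys) = (x, y) :: zip (xs, ys)
  | zip _              = [];
```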
Given a list of pairs, unzip has to build two lists of results, which is
awkward using recursion. The version shown above uses the local declaration
let D in E end, where D consists of declarations and E is the expression
that can use them.
Note especially the declaration
val (xs,ys) = unzip pairs
which binds xs and ys to the results of the recursive call. In general, the
declaration val P = E matches the pattern P against the value of expres-
sion E. It binds all the variables in P to the corresponding values.
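The version of unzip being described is on an omitted slide; a definition built around that val declaration is:

```sml
fun unzip []             = ([], [])
  | unzip ((x,y)::pairs) =
        let val (xs, ys) = unzip pairs
        in (x :: xs, y :: ys) end;
```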
Here is a version of unzip that replaces the local declaration by a function
(conspair) for taking apart the pair of lists in the recursive call. It defines
the same computation as the previous version of unzip and is possibly clearer,
but not every local declaration can be eliminated as easily.
fun conspair ((x,y), (xs,ys)) = (x::xs, y::ys);
Making the function iterative yields revUnzip above, which is very sim-
ple. Iteration can construct many results at once in different argument posi-
tions. Both output lists are built in reverse order, which can be corrected by
reversing the input to revUnzip. The total costs will probably exceed those
of unzip despite the advantages of iteration.
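The slide containing revUnzip is likewise not part of this extract; a version matching the description, with one accumulator per output list, might be:

```sml
(* Both output lists are built in reverse order. *)
fun revUnzip ([], xs, ys)           = (xs, ys)
  | revUnzip ((x,y)::pairs, xs, ys) = revUnzip (pairs, x :: xs, y :: ys);
```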
The till has unlimited supplies of coins. The largest coins should be tried
first, to avoid giving change all in pennies. The list of legal coin values, called
till, is given in descending order, such as 50, 20, 10, 5, 2 and 1. (Recall
that the head of a list is the element most easily reached.) The code for
change is based on simple observations.
Let us generalize the problem to find all possible ways of making change,
returning them as a list of solutions. Look at the type: the result is now a
list of lists.
> change : int list * int -> int list list
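The code for this generalized change function is not in this extract. One way to realize the type above is sketched below; the helper addCoin is a hypothetical name, and the slide's actual code may be organized differently:

```sml
(* Cons coin c onto every solution in a list of solutions. *)
fun addCoin (c, [])      = []
  | addCoin (c, cs::css) = (c :: cs) :: addCoin (c, css);

(* All ways of making change: either use the largest coin or discard it. *)
fun change (till, 0)      = [[]]
  | change ([], amt)      = []
  | change (c::till, amt) =
        if amt < c then change (till, amt)
        else addCoin (c, change (c :: till, amt - c)) @ change (till, amt);
```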
Exercise 5.2 Code a function that takes a list of integers and returns two
lists, the first consisting of all nonnegative numbers found in the input and
the second consisting of all the negative numbers.
Exercise 5.3 How does this version of zip differ from the one above?
fun zip (x::xs,y::ys) = (x,y) :: zip(xs,ys)
| zip ([], []) = [];
Exercise 5.5 Show that the number of ways of making change for n (ignor-
ing order) is O(n) if there are two legal coin values. What if there are three,
four, . . . coin values?
VI Foundations of Computer Science 56
a few applications:
2^C(n) ≥ n!,
therefore C(n) ≥ log(n!) ≈ n log n − 1.44n.
Never mind how this works, but note that generating statistically good ran-
dom numbers is hard. Much effort has gone into those few lines of code.
Insertion Sort
fun insort [] = []
| insort (x::xs) = ins(x, insort xs);
Items from the input are copied one at a time to the output. Each new
item is inserted into the right place so that the output is always in order.
We could easily write iterative versions of these functions, but to no
purpose. Insertion sort is slow because it does O(n^2) comparisons (and a
lot of list copying), not because it is recursive. Its quadratic runtime makes
it nearly useless: it takes 174 seconds for our example while the next-worst
figure is 1.4 seconds.
Insertion sort is worth considering because it is easy to code and illus-
trates the concepts. Two efficient sorting algorithms, mergesort and heap-
sort, can be regarded as refinements of insertion sort.
The type constraint :real resolves the overloading of the <= operator;
recall Lect. 2. All our sorting functions will need a type constraint some-
where. The notion of sorting depends upon the form of comparison being
done, which in turn determines the type of the sorting function.
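The declaration of ins is on an omitted slide; a definition consistent with the description, including the :real constraint mentioned above, is:

```sml
(* Insert x into an already sorted list, keeping it sorted. *)
fun ins (x, [])           = [x]
  | ins (x : real, y::ys) =
        if x <= y then x :: y :: ys
        else y :: ins (x, ys);
```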
Slide 604
• Divide: partition the input into two sublists:
– those at most a in value
– those exceeding a
• Conquer using recursive calls to sort the sublists
• Combine the sorted lists by appending one to the other
Slide 605

fun quick [] = []
  | quick [x] = [x]
  | quick (a::bs) =
        let fun part (l,r,[]) : real list =
                  (quick l) @ (a :: quick r)
              | part (l, r, x::xs) =
                  if x<=a then part(x::l, r, xs)
                  else part(l, x::r, xs)
        in part([],[],bs) end;
Our ML quicksort copies the items. It is still pretty fast, and it is much
easier to understand. It takes roughly 0.74 seconds to sort rs, our list of
random numbers.
The function declaration consists of three clauses. The first handles the
empty list; the second handles singleton lists (those of the form [x]); the
third handles lists of two or more elements. Often, lists of length up to five
or so are treated as special cases to boost speed.
The locally declared function part partitions the input using a as the
pivot. The arguments l and r accumulate items for the left (≤ a) and right
(> a) parts of the input, respectively.
It is not hard to prove that quicksort does n log n comparisons, in the
average case [2, page 94]. With random data, the pivot usually has an
average value that divides the input in two approximately equal parts. We
have the recurrence T (1) = 1 and T (n) = 2T (n/2) + n, which is O(n log n).
In our example, it is about 235 times faster than insertion sort.
In the worst case, quicksort’s running time is quadratic! An example is
when its input is almost sorted or reverse sorted. Nearly all of the items
end up in one partition; work is not divided evenly. We have the recurrence
T(1) = 1 and T(n + 1) = T(n) + n, which is O(n^2). Randomizing the input
makes the worst case highly unlikely.
Append-Free Quicksort
The list sorted accumulates the result in the combine stage of the quick-
sort algorithm. We have again used the standard technique for eliminating
append. Calling quik(xs,sorted) reverses the elements of xs and prepends
them to the list sorted.
Looking closely at part, observe that quik(r,sorted) is performed first.
Then a is consed to this sorted list. Finally, quik is called again to sort the
elements of l.
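The declaration of quik is not reproduced in this extract; a reconstruction following the description above (sort r onto sorted, cons a onto that, then sort l onto the result) is:

```sml
fun quik ([], sorted)    = sorted
  | quik ([x], sorted)   = x :: sorted
  | quik (a::bs, sorted) =
        let fun part (l, r, []) : real list =
                  quik (l, a :: quik (r, sorted))
              | part (l, r, x::xs) =
                  if x <= a then part (x :: l, r, xs)
                  else part (l, x :: r, xs)
        in part ([], [], bs) end;
```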
The speedup is significant. An imperative quicksort coded in Pascal
(taken from Sedgewick [14]) is just slightly faster than function quik. The
near-agreement is surprising because the computational overheads of lists ex-
ceed those of arrays. In realistic applications, comparisons are the dominant
cost and the overheads matter even less.
VI Foundations of Computer Science 62
Merging means combining two sorted lists to form a larger sorted list. It
does at most m + n comparisons, where m and n are the lengths of the input
lists. If m and n are roughly equal then we have a fast way of constructing
sorted lists; if n = 1 then merging degenerates to insertion, doing much work
for little gain.
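The merging function itself is not shown above; a sketch consistent with the discussion (sorting real numbers, as elsewhere in this lecture) is:

```sml
(* Merge two sorted lists of reals; at most m+n comparisons. *)
fun merge([], ys) = ys : real list
  | merge(xs, []) = xs
  | merge(x::xs, y::ys) =
      if x <= y then x :: merge(xs, y::ys)
      else y :: merge(x::xs, ys);
```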
Merging is the basis of several sorting algorithms; we look at a divide-
and-conquer one. Mergesort is seldom found in conventional programming
because it is hard to code for arrays; it works nicely with lists. It divides the
input (if non-trivial) into two roughly equal parts, sorts them recursively,
then merges them.
Function merge is not iterative; the recursion is deep. An iterative version
is of little benefit for the same reasons that apply to append (Lect. 4).
VI Foundations of Computer Science 63
Slide 608
fun tmergesort [] = []
  | tmergesort [x] = [x]
  | tmergesort xs =
      let val k = length xs div 2
      in merge(tmergesort (take(xs, k)),
               tmergesort (drop(xs, k)))
      end;
Mergesort’s divide stage divides the input not by choosing a pivot (as
in quicksort) but by simply counting out half of the elements. The conquer
stage again involves recursive calls, and the combine stage involves merging.
Function tmergesort takes roughly 1.4 seconds to sort the list rs.
In the worst case, mergesort does O(n log n) comparisons, with the same
recurrence equation as in quicksort’s average case. Because take and drop
divide the input in two equal parts (they differ at most by one element), we
always have T (n) = 2T (n/2) + n.
Quicksort is nearly 3 times as fast in the example. But it risks a quadratic
worst case! Mergesort is safe but slow. So which algorithm is best?
We have seen a top-down mergesort. Bottom-up algorithms also exist.
They start with a list of one-element lists and repeatedly merge adjacent
lists until only one is left. A refinement, which exploits any initial order
among the input, is to start with a list of increasing or decreasing runs of
input items.
VI Foundations of Computer Science 64
Exercise 6.2 Implement selection sort (see previous exercise) using ML.
Exercise 6.4 Implement bubble sort (see previous exercise) using ML.
VII Foundations of Computer Science 66
An Enumeration Type
The list shown on the slide represents a bicycle, a Reliant Robin and a large
motorbike. It can almost be seen as a mixed-type list containing integers
and booleans. It is actually a list of vehicles; datatypes lessen the impact of
the restriction that all list elements must have the same type.
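The slide's declarations are not reproduced above. Here is a sketch of the datatype and the list, using the constructor names that appear later with the function wheels; the particular example values are assumptions.

```sml
datatype vehicle = Bike
                 | Motorbike of int   (* engine size *)
                 | Car of bool        (* true for a three-wheeler *)
                 | Lorry of int;      (* number of wheels *)

(* a bicycle, a Reliant Robin and a large motorbike *)
val vs = [Bike, Car true, Motorbike 450];
```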
VII Foundations of Computer Science 69
Exceptions in ML
NONE signifies error, while SOME x returns the solution x. This approach
looks clean, but the drawback is that many places in the code would have to
check for NONE.
VII Foundations of Computer Science 72
exception Change;

Slide 707
fun change (till, 0) = []
  | change ([], amt) = raise Change
  | change (c::till, amt) =
      if amt<0 then raise Change
      else (c :: change(c::till, amt-c))
           handle Change => change(till, amt);
> val change = fn : int list * int -> int list
Slide 708
change([5,2],6)
5::change([5,2],1) handle C=>change([2],6)
5::(5::change([5,2],~4) handle C=>change([2],1))
   handle C=>change([2],6)
5::change([2],1) handle C=>change([2],6)
5::(2::change([2],~1) handle C=>change([],1))
   handle C=>change([2],6)
5::(change([],1)) handle C=>change([2],6)
change([2],6)
Here is the full execution. Observe how the exception handlers nest and
how they drop away once the given expression has returned a value.
change([5,2],6)
5::change([5,2],1) handle C => change([2],6)
5::(5::change([5,2],~4) handle C => change([2],1))
handle C => change([2],6)
5::change([2],1) handle C => change([2],6)
5::(2::change([2],~1) handle C => change([],1))
handle C => change([2],6)
5::(change([],1)) handle C => change([2],6)
change([2],6)
2::change([2],4) handle C => change([],6)
2::(2::change([2],2) handle C => change([],4)) handle ...
2::(2::(2::change([2],0) handle C => change([],2)) handle C => ...)
2::(2::[2] handle C => change([],4)) handle C => change([],6)
2::[2,2] handle C => change([],6)
[2,2,2]
VII Foundations of Computer Science 74
datatype 'a tree = Lf
                 | Br of 'a * 'a tree * 'a tree

Slide 709
[tree diagram: root 1, with children 2 and 3; node 2 has children 4 and 5]

count(t) ≤ 2^depth(t) − 1
This function is redundant because of a basic fact about trees, which can be
proved by induction: for every tree t, we have leaves(t) = count(t)+1. The
inequality shown on the slide also has an elementary proof by induction.
A tree of depth 20 can store 2^20 − 1 or approximately one million elements.
The access paths to these elements are short, particularly when compared
with a million-element list!
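The functions count and depth mentioned in the inequality can be sketched as follows; their exact slide definitions are assumed.

```sml
fun count Lf = 0
  | count (Br(_,t1,t2)) = 1 + count t1 + count t2;

fun depth Lf = 0
  | depth (Br(_,t1,t2)) = 1 + Int.max(depth t1, depth t2);
```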
VII Foundations of Computer Science 76
Slide 711
fun preorder Lf = []
  | preorder(Br(v,t1,t2)) =
      [v] @ preorder t1 @ preorder t2;

fun inorder Lf = []
  | inorder(Br(v,t1,t2)) =
      inorder t1 @ [v] @ inorder t2;

fun postorder Lf = []
  | postorder(Br(v,t1,t2)) =
      postorder t1 @ postorder t2 @ [v];
A
/ \
B C
/ \ / \
D E F G
These three types of tree traversal are related in that all are depth-first.
They each traverse the left subtree in full before traversing the right sub-
tree. Breadth-first search (Lect. 9) is another possibility. That involves going
through the levels of a tree one at a time.
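The quadratic cost of the appends in these traversals can be avoided with an accumulator, in the style used for lists in Lect. 4. A sketch of the append-free preorder (the notes call these functions preord, inord and postord):

```sml
(* Accumulator version of preorder: preord(t, vs) prepends t's labels,
   in preorder, to vs. *)
fun preord (Lf, vs) = vs
  | preord (Br(v,t1,t2), vs) = v :: preord (t1, preord (t2, vs));
```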
VII Foundations of Computer Science 78
Exercise 7.2 Continuing the previous exercise, write a function that eval-
uates an expression. If the expression contains any variables, your function
should raise an exception indicating the variable name.
Exercise 7.3 Show that the functions preorder, inorder and postorder
all require O(n²) time in the worst case, where n is the size of the tree.
Exercise 7.4 Show that the functions preord, inord and postord all take
linear time in the size of the tree.
VIII Foundations of Computer Science 79
Dictionaries
exception Missing;

Slide 803
[binary search tree diagram: (James, 5) at the root, with (Gordon, 4) and
(Thomas, 1) as its left and right children]
Lookup in the binary search tree goes to the left subtree if the desired
key is smaller than the current one and to the right if it is greater. It raises
exception Missing if it encounters an empty tree.
Since an ordering is involved, we have to declare the functions for a
specific type, here string. Now exception Missing mentions that type: if
lookup fails, the exception returns the missing key. The exception could
be eliminated using type option of Lect. 7, using the constructor NONE for
failure.
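A sketch of lookup as described, assuming (as the text suggests) that exception Missing carries the missing key:

```sml
exception Missing of string;

fun lookup (Lf, b : string) = raise Missing b
  | lookup (Br((a,x),t1,t2), b) =
      if b < a then lookup (t1, b)
      else if a < b then lookup (t2, b)
      else x;
```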
VIII Foundations of Computer Science 83
Update
Also O(log n): it copies the path only, not whole subtrees!
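A sketch of update in the same style as lookup, storing (key, value) pairs at the nodes; it copies only the nodes along the search path, sharing all other subtrees. The exact slide code is assumed.

```sml
fun update (Lf, b : string, y) = Br ((b,y), Lf, Lf)
  | update (Br((a,x),t1,t2), b, y) =
      if b < a then Br ((a,x), update(t1,b,y), t2)
      else if a < b then Br ((a,x), t1, update(t2,b,y))
      else Br ((a,y), t1, t2);   (* key already present: replace *)
```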
Arrays
The elements of a list can only be reached by counting from the front.
Elements of a tree are reached by following a path from the root. An ar-
ray hides such structural matters; its elements are uniformly designated by
number. Immediate access to arbitrary parts of a data structure is called
random access.
Arrays are the dominant data structure in conventional programming
languages. The ingenious use of arrays is the key to many of the great
classical algorithms, such as Hoare’s original quicksort (the partition step)
and Warshall’s transitive-closure algorithm.
The drawback is that subscripting is a chief cause of programmer error.
That is why arrays play little role in this introductory course.
Functional arrays are described below in order to illustrate another way
of using trees to organize data. Here is a summary of our dictionary data
structures in order of decreasing generality and increasing efficiency:
• Linear search: Most general, needing only equality on keys, but ineffi-
cient: linear time.
• Binary search: Needs an ordering on keys. Logarithmic access time in
the average case, linear in the worst case.
• Array subscripting: Least general, requiring keys to be integers, but
even worst-case time is logarithmic.
VIII Foundations of Computer Science 85
The path to an element follows the binary code for its subscript.
Slide 807
[tree of subscripts: 1 at the root; 2 and 3 on the next level; then 4, 6, 5, 7;
and finally 8, 12, 10, 14, 9, 13, 11, 15]
exception Subscript;

fun update (Lf, k, w) =
      if k = 1 then Br (w, Lf, Lf)
      else raise Subscript       (* no such subscript *)
  | update (Br(v,t1,t2), k, w) =
      if k = 1 then Br (w, t1, t2)
      else if k mod 2 = 0
      then Br (v, update(t1, k div 2, w), t2)
      else Br (v, t1, update(t2, k div 2, w));
Exercise 8.1 Draw the binary search tree that arises from successively in-
serting the following pairs into the empty tree: (Alice, 6), (Tobias, 2), (Ger-
ald, 8), (Lucy, 9). Then repeat this task using the order (Gerald, 8), (Alice,
6), (Lucy, 9), (Tobias, 2). Why do the results differ?
Exercise 8.2 Code the function insert, which resembles update but if the
item is already present, raises the exception Collision (which has no value
attached).
Exercise 8.3 Continuing the previous exercise, it would be natural for
exception Collision to return the value previously stored in the dictionary.
Why is that goal difficult to achieve?
Exercise 8.5 Code the delete function outlined in the previous exercise.
Exercise 8.6 Write a function to remove the first element from a functional
array. All the other elements are to have their subscripts reduced by one.
The cost of this operation should be linear in the size of the array.
IX Foundations of Computer Science 89
Slide 902
fun nbreadth [] = []
  | nbreadth (Lf :: ts) = nbreadth ts
  | nbreadth (Br(v,t,u) :: ts) =
      v :: nbreadth(ts @ [t,u])
Queues require efficient access at both ends: at the front, for removal, and
at the back, for insertion. Ideally, access should take constant time, O(1).
It may appear that lists cannot provide such access. If enq(q,x) performs
q@[x], then this operation will be O(n). We could represent queues by
reversed lists, implementing enq(q,x) by x::q, but then the deq and qhd
operations would be O(n). Linear time is intolerable: a series of n queue
operations could then require O(n²) time.
The solution is to represent a queue by a pair of lists, where

([x1, x2, ..., xm], [y1, y2, ..., yn])

represents the queue whose elements, oldest first, are x1, ..., xm, yn, ..., y1:
the front part is stored in order, while the rear part is stored in reverse so
that enqueueing is a simple cons.
The function norm puts a queue into normal form, ensuring that the front
part is never empty unless the entire queue is empty. Functions deq and enq
call norm to normalize their result.
Because queues are in normal form, their head is certain to be in their
front part, so qhd (also omitted from the slide) looks there.
fun qhd(Q(x::_,_)) = x
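The queue representation described above can be sketched like this; the constructor Q and the function names follow the text, but the details are assumptions.

```sml
datatype 'a queue = Q of 'a list * 'a list;

(* Normal form: the front part is empty only if the whole queue is. *)
fun norm (Q([], tls)) = Q(rev tls, [])
  | norm q = q;

fun qnull (Q([],[])) = true
  | qnull _ = false;

fun enq (Q(hds, tls), x) = norm (Q(hds, x::tls));

(* deq of an empty queue is an error (no matching clause). *)
fun deq (Q(x::hds, tls)) = norm (Q(hds, tls));
```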
Let us analyse the cost of an execution comprising (in any possible order) n
enq operations and n deq operations, starting with an empty queue. Each
enq operation will perform one cons, adding an element to the rear part.
Since the final queue must be empty, each element of the rear part gets
transferred to the front part. The corresponding reversals perform one cons
per element. Thus, the total cost of the series of queue operations is 2n cons
operations, an average of 2 per operation. The amortized time is O(1).
There is a catch. The conses need not be distributed evenly; reversing
a long list could take up to n − 1 of them. Unpredictable delays make the
approach unsuitable for real-time programming, where deadlines must be
met.
IX Foundations of Computer Science 94
Slide 906
fun wheels v =
    case v of
        Bike => 2
      | Motorbike _ => 2
      | Car robin => if robin then 3 else 4
      | Lorry w => w;
It tries the patterns one after the other. When one matches, it evaluates the
corresponding expression. It behaves precisely like the body of a function
declaration. We could have defined function wheels (from Lect. 7) as shown
above.
A program phrase of the form P1 => E1 | ··· | Pn => En is called a
Match. A match may also appear after an exception handler (Lect. 7) and
with fn-notation to express functions directly (Lect. 10).
IX Foundations of Computer Science 95
Slide 907
fun breadth q =
    if qnull q then []
    else
        case qhd q of
            Lf => breadth (deq q)
          | Br(v,t,u) =>
              v :: breadth(enq(enq(deq q, t), u))
Breadth-first search is not practical for big problems: it uses too much
space. Consider the slightly more general problem of searching trees whose
branching factor is b (for binary trees, b = 2). Then breadth-first search
to depth d examines (b^(d+1) − 1)/(b − 1) nodes, which is O(b^d), ignoring the
constant factor of b/(b−1). Since all nodes that are examined are also stored,
the space and time requirements are both O(b^d).
Depth-first iterative deepening combines the space efficiency of depth-first
with the ‘nearest-first’ property of breadth-first search. It performs repeated
depth-first searches with increasing depth bounds, each time discarding the
result of the previous search. Thus it searches to depth 1, then to depth 2,
and so on until it finds a solution. We can afford to discard previous results
because the number of nodes is growing exponentially. There are b^(d+1) nodes
at level d + 1; if b ≥ 2, this number actually exceeds the total number of
nodes of all previous levels put together, namely (b^(d+1) − 1)/(b − 1).
Korf [11] shows that the time needed for iterative deepening to reach
depth d is only b/(b − 1) times that for breadth-first search, if b > 1. This
is a constant factor; both algorithms have the same time complexity, O(b^d).
In typical applications where b ≥ 2 the extra factor of b/(b − 1) is quite
tolerable. The reduction in the space requirement is exponential, from O(b^d)
for breadth-first to O(d) for iterative deepening.
IX Foundations of Computer Science 97
A stack is a sequence such that items can be added or removed from the
head only. A stack obeys a Last-In-First-Out (LIFO) discipline: the item
next to be removed is the one that has been in the stack for the shortest
time. Lists can easily implement stacks because both cons and hd affect
the head. But unlike lists, stacks are often regarded as an imperative data
structure: the effect of push or pop is to change an existing stack, not return
a new one.
In conventional programming languages, a stack is often implemented by
storing the elements in an array, using a variable (the stack pointer ) to count
them. Most language processors keep track of recursive function calls using
an internal stack.
IX Foundations of Computer Science 98
Exercise 9.3 Write a version of the function shown on slide 907 using a
nested let construction rather than case.
Functions as Values
Curried Functions
The fn-notation lets us package n*2 as the function (fn n => n*2), but
what if there are several variables, as in (n*2+k)? If the variable k is defined
in the current context, then

fn n => n*2+k

is a perfectly good function; the value of k is simply taken from that context.
The sorting functions of Lect. 6 are coded to sort real numbers. They
can be generalized to an arbitrary ordered type by passing the ordering
predicate (≤) as an argument.
Functions ins and sort are declared locally, referring to lessequal.
Though it may not be obvious, insort is a curried function. Given its first
argument, a predicate for comparing some particular type of items, it returns
the function sort for sorting lists of that type of items.
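A sketch of the curried insort described above; the actual slide code may differ in detail.

```sml
fun insort lessequal =
    let fun ins (x, []) = [x]
          | ins (x, y::ys) =
              if lessequal(x,y) then x :: y :: ys
              else y :: ins (x, ys)
        fun sort [] = []
          | sort (x::xs) = ins (x, sort xs)
    in sort end;
```

Given only the comparison, insort op<= returns the specialised sorting function, which is the currying the text describes.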
X Foundations of Computer Science 106
Note: op<= stands for the <= operator regarded as a value. Although
infixes are functions, normally they can only appear in expressions such as
n<=9. The op syntax lets us write op<=(n,9), but on the slide we use op<=
to pass the comparison operator to function insort.
To exploit sorting to its full extent, we need the greatest flexibility in
expressing orderings. There are many types of basic data, such as integers,
reals and strings. On the overhead, we sort integers and strings. The op-
erator <= is overloaded, working for types int, real and string. The list
supplied as insort’s second argument resolves the overloading ambiguity.
Passing the relation ≥ for lessequal gives a decreasing sort. This is no
coding trick; it is justified in mathematics. If ≤ is a partial ordering then so
is ≥.
There are many ways of combining orderings. Most important is the
lexicographic ordering, in which two keys are used for comparisons. It is
specified by (x′ , y ′ ) < (x, y) ⇐⇒ x′ < x ∨ (x′ = x ∧ y ′ < y). Often part
of the data plays no role in the ordering; consider the text of the entries in
an encyclopedia. Mathematically, we have an ordering on pairs such that
(x′ , y ′ ) < (x, y) ⇐⇒ x′ < x.
These ways of combining orderings can be expressed in ML as functions
that take orderings as arguments and return other orderings as results.
X Foundations of Computer Science 107
A Summation Functional
sum (sum f) m = Σ(i=1..m) Σ(j=1..i) f(j)
Historical Remarks
Exercise 10.1 What does the following function do, and what are its uses?
fun sw f x y = f y x;
Exercise 10.4 Explain the second example of sum on the overhead. What
is (sum f)?
XI Foundations of Computer Science 111
Slide 1101
fun map f [] = []
  | map f (x::xs) = (f x) :: map f xs
> val map = fn: ('a -> 'b) -> 'a list -> 'b list
Slide 1102
[matrix diagram: the transpose of the 2×3 matrix (a b c / d e f) is the 3×2
matrix (a d / b e / c f)]
fun hd (x::_) = x;
fun tl (_::xs) = xs;
A matrix can be viewed as a list of rows, each row a list of matrix elements.
This representation is not especially efficient compared with the conventional
one (using arrays). Lists of lists turn up often, though, and we can see how
to deal with them by taking familiar matrix operations as examples. ML for
the Working Programmer goes as far as Gaussian elimination, which presents
surprisingly few difficulties.
The transpose of the matrix (a b c / d e f) is (a d / b e / c f), which in ML
corresponds to the following transformation on lists of lists:

[[a,b,c], [d,e,f]] ↦ [[a,d], [b,e], [c,f]]
A recursive call transposes the latter matrix, which is then given the column
[a,d] as its first row.
The two functions expressed using map would otherwise have to be de-
clared separately.
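The transpose function itself can be sketched as follows (the notes call it transp); note the two uses of map mentioned above:

```sml
(* Transpose a matrix represented as a list of rows.  Terminates when
   the rows become empty; the empty matrix is not handled. *)
fun transp ([]::_) = []
  | transp rows = map hd rows :: transp (map tl rows);
```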
XI Foundations of Computer Science 113
Slide 1103
A row times a column: (A1 ··· Ak) · (B1, ..., Bk) = A1 B1 + ··· + Ak Bk

(a1, ..., ak) · (b1, ..., bk) = a1 b1 + ··· + ak bk.

(2, 0) · (1, 4) = 2 × 1 + 0 × 4 = 2.
Matrix Multiplication in ML
fun matprod(Arows,Brows) =
let val cols = transp Brows
in map (fn row => map (dotprod row) cols)
Arows
end
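The dot product used by matprod can be sketched as follows; its exact definition on the slide is assumed.

```sml
(* Dot product of two real vectors of equal length;
   vectors of different lengths are not handled. *)
fun dotprod([], [])       = 0.0
  | dotprod(x::xs, y::ys) = x*y + dotprod(xs, ys);
```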
These functionals start with an initial value e. They combine it with the
list elements one at a time, using the function ⊕. While foldl takes the list
elements from left to right, foldr takes them from right to left. Here are
their types:
> val foldl = fn: (’a * ’b -> ’a) -> ’a * ’b list -> ’a
> val foldr = fn: (’a * ’b -> ’b) -> ’a list * ’b -> ’b
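Definitions matching these types can be sketched as follows; these are the course's own foldl and foldr, not the (differently typed) Basis Library versions.

```sml
fun foldl f (e, [])    = e
  | foldl f (e, x::xs) = foldl f (f(e,x), xs);

fun foldr f ([], e)    = e
  | foldr f (x::xs, e) = f(x, foldr f (xs, e));
```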
:: → :: → :: → :: → nil
↓ ↓ ↓ ↓
1 2 3 4
Compare with the expression computed by foldr(⊕, e). The final nil is
replaced by e; the conses are replaced by ⊕.
⊕→⊕→⊕→⊕→e
↓ ↓ ↓ ↓
1 2 3 4
XI Foundations of Computer Science 116
The sum of a list’s elements is formed by starting with zero and adding
each list element in turn. Using foldr would be less efficient, requiring linear
instead of constant space. Note that op+ turns the infix addition operator
into a function that can be passed to other functions such as foldl. Append
is expressed similarly, using op:: to stand for the cons function.
The sum-of-sums computation is space-efficient: it does not form an
intermediate list of sums. Moreover, foldl is iterative. Carefully observe
how the inner foldl expresses a function to add the numbers of one list to a running total; the
outer foldl applies this function to each list in turn, accumulating a sum
starting from zero.
The nesting in the sum-of-sums calculation is typical of well-designed
fold functionals. Similar functionals can be declared for other data struc-
tures, such as trees. Nesting these functions provides a convenient means of
operating on nested data structures, such as trees of lists.
The length computation might be regarded as frivolous. A trivial function
is supplied using fn-notation; it ignores the list elements except to count
them. However, this length function takes constant space, which is better
than naïve versions such as nlength (Lect. 4). Using foldl guarantees an
iterative solution with an accumulator.
XI Foundations of Computer Science 117
fun member(y,xs) =
exists (fn x => x=y) xs;
fun disjoint(xs,ys) =
all (fn x => all (fn y => x<>y) ys) xs;
> val disjoint = fn: ''a list * ''a list -> bool
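The functionals exists and all used above can be sketched as:

```sml
fun exists p []      = false
  | exists p (x::xs) = p x orelse exists p xs;

fun all p []      = true
  | all p (x::xs) = p x andalso all p xs;
```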
Tree Functionals
Slide 1109
fun maptree f Lf = Lf
  | maptree f (Br(v,t1,t2)) =
      Br(f v, maptree f t1, maptree f t2);
> val maptree = fn
>   : ('a -> 'b) -> 'a tree -> 'b tree

fun fold f e Lf = e
  | fold f e (Br(v,t1,t2)) =
      f (v, fold f e t1, fold f e t2);
> val fold = fn
>   : ('a * 'b * 'b -> 'b) -> 'b -> 'a tree -> 'b
The ideas presented in this lecture generalize in the obvious way to trees
and other datatypes, not necessarily recursive ones.
The functional maptree applies a function to every label of a tree, return-
ing another tree of the same shape. Analogues of exists and all are trivial
to declare. On the other hand, filter is hard because removing the filtered
labels changes the tree’s shape; if a label fails to satisfy the predicate, there
is no obvious way to include the result of filtering both subtrees.
The easiest way of declaring a fold functional is as shown above. The
arguments f and e replace the constructors Br and Lf, respectively. This
functional can be used to add a tree’s labels, but it requires a three-argument
addition function. To avoid this inconvenience, fold functionals for trees can
implicitly treat the tree as a list. For example, here is a fold function related
to foldr, which processes the labels in inorder:
fun infold f (Lf, e) = e
| infold f (Br(v,t1,t2), e) = infold f (t1, f (v, infold f (t2, e)));
Its code is derived from that of the function inord of Lect. 7 by generalizing
cons to the function f.
Our primitives themselves can be seen as a programming language. This
truth is particularly obvious in the case of functionals, but it holds of pro-
gramming in general. Part of the task of programming is to extend our
programming language with notation for solving the problem at hand. The
levels of notation that we define should correspond to natural levels of ab-
straction in the problem domain.
XI Foundations of Computer Science 120
Exercise 11.1 Without using map, write a function map2 such that map2 f
is equivalent to map (map f). The obvious solution requires declaring two
recursive functions. Try to get away with only one by exploiting nested
pattern-matching.
Computer Algebra
Univariate polynomials a_n x^n + ··· + a_0 x^0
Example of data representation and algorithms in practice

Slide 1202
[diagram: representations are not unique; the set {3, 4} can be represented
by either the list [3, 4] or the list [4, 3]]
Decreasing exponents
Polynomial addition
Slide 1207
fun polyprod [] us = []
  | polyprod [(m,a)] us = map (termprod(m,a)) us
  | polyprod ts us =
      let val k = length ts div 2
      in polysum (polyprod (take(ts,k)) us)
                 (polyprod (drop(ts,k)) us)
      end;
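The functions polysum and termprod assumed by polyprod can be sketched as follows, for the (exponent, coefficient) representation with decreasing exponents; the details are assumptions.

```sml
(* Multiply a term by (m,a): add exponents, multiply coefficients. *)
fun termprod (m,a) (n,b) : int*real = (m+n, a*b);

(* Add two polynomials, merging terms of equal exponent. *)
fun polysum [] us = us : (int*real) list
  | polysum ts [] = ts
  | polysum ((m,a)::ts) ((n,b)::us) =
      if m > n then (m,a) :: polysum ts ((n,b)::us)
      else if n > m then (n,b) :: polysum ((m,a)::ts) us
      else if Real.== (a+b, 0.0) then polysum ts us   (* terms cancel *)
      else (m, a+b) :: polysum ts us;
```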
Polynomial division
This pair tells us that the quotient is x − 1 and the remainder is 2. We can
easily verify that (x + 1)(x − 1) + 2 = x² − 1 + 2 = x² + 1.
XII Foundations of Computer Science 129
fun polygcd [] us = us
  | polygcd ts us = polygcd (polyrem us ts) ts;

Slide 1209
needed to simplify rational functions such as

    (x² − 1) / (x² − 2x + 1) = (x + 1) / (x − 1)

strange answers
TOO SLOW
Exercise 12.1 Code the set operations of membership test, subset test,
union and intersection using the ordered-list representation.
A Pipeline
Slide 1301
Produce sequence of items
Lazy lists have practical uses. Some algorithms, like making change, can
yield many solutions when only a few are required. Sometimes the original
problem concerns infinite series: with lazy lists, we can pretend they really
exist!
We are now dealing with infinite, or at least unbounded, computations.
A potentially infinite source of data is processed one element at a time, upon
demand. Such programs are harder to understand than terminating ones
and have more ways of going wrong.
Some purely functional languages, such as Haskell, use lazy evaluation
everywhere. Even the if-then-else construct can be a function, and all lists
are lazy. In ML, we can declare a type of lists such that evaluation of the
tail does not occur until demanded. Delayed evaluation is weaker than lazy
evaluation, but it is good enough for our purposes.
The traditional word stream is reserved in ML parlance for input/output
channels. Let us call lazy lists sequences.
XIII Foundations of Computer Science 133
Lazy Lists in ML

Slide 1303
datatype 'a seq = Nil
                | Cons of 'a * (unit -> 'a seq);
The primitive ML type unit has one element, which is written (). This
element may be regarded as a 0-tuple, and unit as the nullary Cartesian
product. (Think of the connection between multiplication and the number 1.)
The empty tuple serves as a placeholder in situations where no informa-
tion is required. It has several uses:
Consuming a Sequence

Slide 1305
fun get(0,xq) = []
  | get(n,Nil) = []
  | get(n,Cons(x,xf)) = x :: get(n-1,xf());
> val get = fn : int * 'a seq -> 'a list
The function get converts a sequence to a list. It takes the first n ele-
ments; if n < 0 then it attempts to take all of them, which can terminate
only if the sequence is finite.
In the third line of get, the expression xf() calls the tail function, de-
manding evaluation of the next element. This operation is called forcing the
list.
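The function from used in the sample evaluation generates the increasing integers as a sequence; a sketch:

```sml
fun from k = Cons(k, fn () => from (k+1));
```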
XIII Foundations of Computer Science 136
Sample Evaluation

Slide 1306
get(2, from 6)
⇒ get(2, Cons(6, fn()=>from(6+1)))
⇒ 6 :: get(1, from(6+1))
⇒ 6 :: get(1, Cons(7, fn()=>from(7+1)))
⇒ 6 :: 7 :: get(0, Cons(8, fn()=>from(8+1)))
⇒ 6 :: 7 :: []
⇒ [6,7]
Here we ask for two elements of the infinite sequence. In fact, three ele-
ments are computed: 6, 7 and 8. Our implementation is slightly too eager. A
more complicated datatype declaration could avoid this problem. Another
problem is that if one repeatedly examines some particular list element using
forcing, that element is repeatedly evaluated. In a lazy programming lan-
guage, the result of the first evaluation would be stored for later reference.
To get the same effect in ML requires references [13, page 327].
We should be grateful that the potentially infinite computation is kept
finite. The tail of the original sequence even contains the unevaluated ex-
pression 6+1.
XIII Foundations of Computer Science 137
A fair alternative...
filtering
fun iterates f x =
Cons(x, fn()=> iterates f (f x));
Close enough?
Square Roots!
Exercise 13.2 Consider the list function concat, which concatenates a list
of lists to form a single list. Can it be generalized to concatenate a sequence
of sequences? What can go wrong?
fun concat [] = []
| concat (l::ls) = l @ concat ls;
Exercise 13.3 Code a function to make change using lazy lists, delivering
the sequence of all possible ways of making change. Using sequences allows
us to compute solutions one at a time when there exists an astronomical
number. Represent lists of coins using ordinary lists. (Hint: to benefit from
laziness you may need to pass around the sequence of alternative solutions
as a function of type unit -> (int list) seq.)
Procedural Programming
The slide presents the ML primitives, but most languages have analogues
of them, often heavily disguised. We need a means of creating references (or
allocating storage), getting at the current contents of a reference cell, and
updating that cell.
The function ref creates references (also called pointers or locations).
Calling ref allocates a new location in the machine store. Initially, this loca-
tion holds the value given by expression E. Although ref is an ML function,
it is not a function in the mathematical sense. For example, ref(0)=ref(0)
evaluates to false.
The function !, when applied to a reference, returns its contents. This
operation is called dereferencing. Clearly ! is not a mathematical function;
its result depends upon the store.
The assignment P :=E evaluates expression P , which must return a ref-
erence p, and E. It stores at address p the value of E. Syntactically, := is a
function and P :=E is an expression, even though it updates the store. Like
many functions that change the state, it returns the value () of type unit.
If τ is some ML type, then τ ref is the type of references to cells that
can hold values of τ . Please do not confuse the type ref with the function
ref. This table of the primitive functions and their types might be useful:
ref     'a -> 'a ref
!       'a ref -> 'a
op :=   'a ref * 'a -> unit
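A small example using these primitives:

```sml
val p = ref 5;        (* allocate a cell initially holding 5 *)
p := !p + 1;          (* update: the cell now holds 6 *)
val six = !p;         (* dereference: six is 6 *)
```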
XIV Foundations of Computer Science 143
while B do C

Slide 1405
fun length xs =
    let val lp = ref xs    (* list of uncounted elements *)
        val np = ref 0     (* accumulated count *)
    in
        while not (null (!lp)) do
            (lp := tl (!lp); np := 1 + !np);
        !np                (* the count is returned! *)
    end;
As you may have noticed, ML’s programming style looks clumsy com-
pared with that of languages like C. ML omits the defaults and abbrevia-
tions they provide to shorten programs. However, ML’s explicitness makes
it ideal for teaching the fine points of references and arrays. ML’s references
are more flexible than those found in other languages.
The function makeAccount models a bank. Calling the function with a
specified initial balance creates a new reference (balance) to maintain the
account balance and returns a function (withdraw) having sole access to that
reference. Calling withdraw reduces the balance by the specified amount and
returns the new balance. You can pay money in by withdrawing a negative
amount. The if-construct prevents the account from going overdrawn (it
could raise an exception).
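A sketch of makeAccount as described; the slide's code is not reproduced here, and the exception name is an assumption.

```sml
exception Overdrawn;

fun makeAccount (initBalance : int) =
    let val balance = ref initBalance
        fun withdraw amt =
            if amt > !balance then raise Overdrawn
            else (balance := !balance - amt;   (* update the account *)
                  !balance)                    (* return the new balance *)
    in withdraw end;
```

Each call such as makeAccount 100 creates a fresh balance reference; the returned withdraw function is the only way to reach it.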
Look at the (E1 ; E2 ) construct in the else part above. The first expres-
sion updates the account balance and returns the trivial value (). The second
expression, !balance, returns the current balance but does not return the
reference itself: that would allow unauthorized updates.
This example is based on one by Dr A C Norman.
XIV Foundations of Computer Science 147
Slide 1409
Array.tabulate(n,f)    create an n-element array
ML arrays are like references that hold several elements instead of one.
The elements of an n-element array are designated by the integers from 0
to n − 1. The ith array element is usually written A[i]. If τ is a type then τ
Array.array is the type of arrays (of any size) with elements from τ .
Calling Array.tabulate(n,f ) creates an array of the size specified by
expression n. Initially, element A[i] holds the value of f (i) for i = 0, . . . , n−1.
This function is analogous to ref in that it allocates mutable storage cells
to hold the specified values.
Calling Array.sub(A,i) returns the contents of A[i].
Calling Array.update(A,i,E) modifies the array A by storing the value
of E as the new contents of A[i]; it returns () as its value.
Why is there no function returning a reference to A[i]? Such a function
could replace both Array.sub and Array.update, but allowing references to
individual array elements would complicate storage management.
XIV Foundations of Computer Science 150
Array Examples

Slide 1410
Array.sub(ar,2);
> val it = 4 : int

Array.sub(ar,20);
> uncaught exception Subscript

Array.update(ar,2,~33);
ar;
> val it = [|0,1,~33,9,16,25,...|] : int array
advantages
Exercise 14.2 Write a version of function power (Lect. 2) using while in-
stead of recursion.
References to References
Slide 1501
[diagram: the mutable list [3,5,9] drawn both as nested boxes and as a chain
of pointer cells ending in Nil]
mlp;
> ref(Cons("a", ref(Cons("b", ref Nil))))
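These examples assume a datatype of mutable lists, whose link fields are references; a sketch:

```sml
datatype 'a mlist = Nil
                  | Cons of 'a * 'a mlist ref;
```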
Destructive Concatenation
Side-Effects
join(ml1,ml2);
ml1; IT ’ S CHANGED !?
> Cons("a",
> ref(Cons("b", ref(Cons("c", ref Nil)))))
In this example, we bind the mutable lists ["a"] and ["b","c"] to the
variables ml1 and ml2. ML’s method of displaying reference values lets us
easily read off the list elements in the data structures.
Next, we concatenate the lists using join. (There is no room to display
the returned value, but it is identical to the one at the bottom of the slide,
which is the mutable list ["a","b","c"].)
Finally, we inspect the value of ml1. It looks different; has it changed?
No; it is the same reference as ever. The contents of a cell reachable from
it have changed. Our interpretation of its value as a list has changed from
["a"] to ["a","b","c"].
This behaviour cannot occur with ML’s built-in lists because their inter-
nal link fields are not mutable. The ability to update the list held in ml1
might be wanted, but it might also come as an unpleasant surprise, especially
if we confuse join with append. A further surprise is that
join(ml2,ml3)
also affects the list in ml1: it updates the last pointer of ml2 and that is now
the last pointer of ml1 too.
XV Foundations of Computer Science 159
A Cyclic List

join(ml,ml);
> Cons(0,
>   ref(Cons(1,
>     ref(Cons(0,
>       ref(Cons(1,...)))))))

Slide 1508: Destructive Reversal

[diagram: the argument, a mutable list of a, b, c ending in Nil, and the
result, the same three cells with their links re-oriented so that c points
to b, b points to a, and a points to Nil]
List reversal can be tricky to work out from first principles, but the code
should be easy to understand.
Reverse for ordinary lists copies the list cells while reversing the order
of the elements. Destructive reverse re-uses the existing list cells while re-
orienting the links. It works by walking down the mutable list, noting the
last two mutable lists encountered, and redirecting the second cell’s link field
to point to the first. Initially, the first mutable list is Nil, since the last
link of the reversed list must point to Nil.
Note that we must look at the reversed list from the opposite end! The
reversal function takes as its argument a pointer to the first element of the
list. It must return a pointer to the first element of the reversed list, which
is the last element of the original list.
Slide 1509

fun reversing (prev, ml) =
    case ml of
        Nil => prev
      | Cons(_,mlp2) =>
          let val ml2 = !mlp2     (* next cell *)
          in  mlp2 := prev;       (* re-orient *)
              reversing (ml, ml2)
          end;
> reversing: 'a mlist * 'a mlist -> 'a mlist

fun drev ml = reversing (Nil, ml);

drev ml;
> Cons(9, ref(Cons(5, ref(Cons(3, ref Nil)))))
ml;    IT'S CHANGED!?
> val it = Cons(3, ref Nil) : int mlist
In the example above, the mutable list [3,5,9] is reversed to yield [9,5,3].
The effect of drev upon its argument ml may come as a surprise! Because
ml is now the last cell in the list, it appears as the one-element list [3].
Slide 1511

fun mfilter p ml =
    case ml of
        Nil => Nil
      | Cons(x,xr) =>
          let val ml2 = mfilter p (!xr)
          in  if p x then (xr := ml2; ml) else ml2
          end;
In the 2006 exams, Question 6 of Paper 1 asked for a filter functional for
mutable lists. The code above presents a solution. The tail of a non-empty
list is filtered recursively, and this is the result unless the head satisfies
the predicate; in that case, the head cell's tail link is updated to refer to
the filtered tail, and that cell itself is returned.
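For example, keeping only the even elements of a mutable list might look
as follows (a sketch, assuming the conversion function mlistOf used in
Exercise 15.2 below, which builds a mutable list from an ordinary one):

```sml
(* mlistOf is assumed here: it converts an ordinary list to an
   'a mlist, as in Exercise 15.2 below *)
mfilter (fn i => i mod 2 = 0) (mlistOf [1,2,3,4]);
```

The result is the mutable list [2,4]. Note that mfilter, like join and
drev, updates link fields of the argument's cells in place.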
The ideas presented in this lecture can be generalized in the obvious
way to trees. Another generalization is to provide additional link fields. In
a doubly-linked list, each node points to its predecessor as well as to its
successor. In such a list one can move forwards or backwards from a given
spot. Inserting or deleting elements requires redirecting the pointer fields in
two adjacent nodes. If the doubly-linked list is also cyclic then it is sometimes
called a ring buffer [13, page 331].
Tree nodes normally carry links to their children. Occasionally, they
instead have a link to their parent, or sometimes links in both directions.
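A doubly-linked list could be declared along the same lines as mlist. The
following sketch is not taken from the notes, and the constructor name is
invented for illustration:

```sml
(* a sketch: each cell carries mutable links to both its
   predecessor and its successor *)
datatype 'a dlist =
    DNil
  | DCons of 'a dlist ref * 'a * 'a dlist ref;
```

Insertion and deletion then update the predecessor link of one neighbour
and the successor link of the other, as described above.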
Exercise 15.1 Write a function to copy a mutable list. When might you
use it?
Exercise 15.2 What is the value of ml1 (regarded as a list) after the fol-
lowing declarations and commands are entered at top level? Explain this
outcome.
val ml1 = mlistOf[1,2,3]
and ml2 = mlistOf[4,5,6,7];
join(ml1, ml2);
drev ml2;
References
[1] Harold Abelson and Gerald J. Sussman. Structure and Interpretation
of Computer Programs. MIT Press, 1985.
[2] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design
and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[3] C. Gordon Bell and Allen Newell. Computer Structures: Readings and
Examples. McGraw-Hill, 1971.
[4] Arthur W. Burks, Herman H. Goldstine, and John von Neumann.
Preliminary discussion of the logical design of an electronic computing
instrument. Reprinted as Chapter 4 of Bell and Newell [3], first
published in 1946.
[5] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest.
Introduction to Algorithms. MIT Press, 1990.
[6] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete
Mathematics: A Foundation for Computer Science. Addison-Wesley,
2nd edition, 1994.
[7] Matthew Halfant and Gerald Jay Sussman. Abstraction in numerical
methods. In LISP and Functional Programming, pages 1–7. ACM
Press, 1988.
[8] John Hughes. Why functional programming matters. Computer
Journal, 32(2):98–107, 1989.
[9] Donald E. Knuth. The Art of Computer Programming, volume 3:
Sorting and Searching. Addison-Wesley, 1973.
[10] Donald E. Knuth. The Art of Computer Programming, volume 1:
Fundamental Algorithms. Addison-Wesley, 2nd edition, 1973.
[11] R. E. Korf. Depth-first iterative-deepening: an optimal admissible tree
search. Artificial Intelligence, 27:97–109, 1985.
[12] Stephen K. Park and Keith W. Miller. Random number generators:
Good ones are hard to find. Communications of the ACM,
31(10):1192–1201, October 1988. Follow-up discussion in Comm. ACM
36 (7), July 1993, pp. 105-110.
[13] Lawrence C. Paulson. ML for the Working Programmer. Cambridge
University Press, 2nd edition, 1996.
[14] Robert Sedgewick. Algorithms. Addison-Wesley, 2nd edition, 1988.
[15] Jeffrey D. Ullman. Elements of ML Programming. Prentice-Hall, 1993.
[16] Å. Wikström. Functional Programming using ML. Prentice-Hall, 1987.