Functional Programming
Neil Leslie
Centre for Logic, Language and Computation
School of Mathematical and Computing Sciences
Victoria University of Wellington
2008
Neil Leslie asserts his moral right to be identified as the author of this work.
1 Introduction 1
1.1 Course outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Texts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Compilers and interpreters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Re-cap of what you (are supposed to have) learned in COMP 304 . . . . . . . . . . 2
1.4 Next chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4.1 How to read a scientific paper . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3 Limitations of reduce 15
3.1 Trying to use reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Parameterising on the sorting order . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Finding the minimum of a list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.1 First solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.2 Second solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.3 Third solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5 List comprehensions 25
5.1 Pythagorean triples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.2 Quicksort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.3 n Queens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7 Types 33
7.1 Hindley-Milner, and parametric polymorphism . . . . . . . . . . . . . . . . . . . . . 33
7.2 Extensions to Hindley-Milner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.3 Basic types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.4 Type synonyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.5 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.5.1 Declaring a class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
9 Parser combinators 59
9.1 The type of parsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
9.2 From grammar to parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
9.2.1 Simple parsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
9.2.2 Combining parsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2.3 Parsing balanced brackets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2.4 More uses for <@ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.3 Extending our parsing toolset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.3.1 sp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.3.2 just . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
9.3.3 some . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
9.3.4 <:&> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.3.5 Kleene ∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.3.6 Kleene + . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.3.7 first . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.3.8 bang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9.3.9 optionally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4 Parsing sequences of items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4.1 Example: S-expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.4.2 Lists with meaningful separators . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.5 <&=> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
9.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10 The λ-calculus 77
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
10.2 Syntax of λ-terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
10.2.1 α convertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10.2.2 De Bruijn terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.3 Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.3.1 Dynamic scope in LISP, Jensen’s device . . . . . . . . . . . . . . . . . . . . . . 84
10.4 Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.4.1 β reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5 The Church-Rosser Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6 Normal form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.7 Reduction strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
10.7.1 Leftmost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
10.7.2 Other reduction strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
12 Continuations 99
12.1 Introducing tail-recursion and continuations . . . . . . . . . . . . . . . . . . . . . . . 99
12.2 Some simple functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
12.2.1 A CPS version of Fibonacci’s function . . . . . . . . . . . . . . . . . . . . . . 100
12.2.2 Historical note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
12.3 Uses of continuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
12.3.1 Continuations and I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12.4 Further examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12.4.1 Functions on types not defined inductively . . . . . . . . . . . . . . . . . . . . 104
12.4.2 Functions on inductively defined types . . . . . . . . . . . . . . . . . . . . . . 105
18 Monads 153
18.1 A categorical perspective on monads . . . . . . . . . . . . . . . . . . . . . . . . . . 153
18.2 A computer scientist’s view of monads . . . . . . . . . . . . . . . . . . . . . . . . . . 153
18.2.1 Input/Output in functional programming . . . . . . . . . . . . . . . . . . . . 154
18.2.2 IO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
18.2.3 Generalising from IO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Programs
10.1 Factorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.2 Factorial, again . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.3 Factorial, another time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Figures
Proof figures
10.1 ∀ elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.2 Invalid ∀ elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.3 Valid ∀ elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.5 Alternative clause for . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.6 Confluence of . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
1 Introduction
These are the notes that I have prepared for COMP 432: Functional Programming since 2000. They
are notes, not a text book. In particular, there is no guarantee that they will be error-free: caveat lector.
This is a course about functional programming as programming, not about the implementation of
functional programming languages, nor about the λ-calculus as a topic of study in its own right.1
To understand programming we must learn both principles and practice. Each informs the other:
unprincipled practice is almost always poor practice; unpracticed principles are almost always poorly
understood. Hence this course will involve writing programs, and thinking about programs.
Extended examples will be used to illustrate the concepts we are trying to understand.
1.1.1 Texts
The textbooks on which I have relied most are:
• Functional Programming and Parallel Graph Rewriting by Rinus Plasmeijer and Marko van Eeke-
len [65];
• The Haskell School of Expression: Learning Functional Programming Through Multimedia by Paul
Hudak [40];
There is a HUGS interpreter available on the MCS system. Type hugs at the prompt. Emacs users may
find the Haskell mode useful. Haskell is also available for the Mac and for PCs.
• functions defined on inductively defined data types (e.g. Int, lists, trees) typically use recursion;
2. Read the abstract. Ask yourself the question ‘What is this paper trying to tell us?’ Now you have
a general idea of what the paper is about.
3. Flick to the back of the paper, reading only the headings. This gives you an idea of the structure
of the paper, and an impression of how the paper tells you about what it tells you about.
4. Read the conclusion. Ask yourself the question ‘What was actually achieved in the paper?’
5. Look at the references. These tell you what other work was used to construct the current work.
6. Go back to the beginning and read the paper carefully. Don’t be afraid to skip sections at the
first reading.
A well-written scientific paper encourages the reader to move through these steps.
2 Nor usually as interesting.
1.5 Summary
From this chapter you should have learned:
• some administrative details;
• what you should have learned in COMP 304;
• how to read a scientific paper.
Questions
1. Why are you doing this course?
2. What do you expect from this course?
3. Use the WWW to find sites related to functional programming languages. Bookmark the best
ones.
4. What advantages do you think functional programming languages have compared to:
a) object-oriented languages, such as C++, SmallTalk and Java;
b) logic programming languages, such as Prolog and Gödel;
c) imperative programming languages, such as C.
5. What disadvantages do you think functional programming languages have compared to:
a) object-oriented languages, such as C++, SmallTalk and Java;
b) logic programming languages, such as Prolog and Gödel;
c) imperative programming languages, such as C.
6. Look at the Standard Prelude provided with the HUGS system. What is in it and why?
2 Why functional programming matters
2.1 Summary
In [41] John Hughes argues that two features of functional languages, higher-order functions and
lazy evaluation, contribute greatly to support for program modularity. Since modularity is of great
importance in good programming, functional languages are of importance in the ‘real world’.
As usual we write:
• [] for Nil,
• [1] for Cons 1 [] (or Cons 1 Nil),
• [1, 2] for Cons 1 [2] (or Cons 1 (Cons 2 []), or Cons 1 (Cons 2 Nil)).
When we define a function like sum:
sum [] = 0
sum (h:t) = h + (sum t)
we observe that the only parts of this definition which are unique to summation are the 0 in the first
clause and the + in the second. If we have not seen many examples of list processing functions we can
look at a few more:
prod [] = 1
prod (h:t) = h * (prod t)
and_all [] = True
and_all (h:t) = h && (and_all t)
or_all [] = False
or_all (h:t) = h || (or_all t)
evens [] = []
evens (h:t) = if (even h) then (h : (evens t))
else (evens t)
reduce f x [] = x
reduce f x (h:t) = f h (reduce f x t)
We can now define the functions from Programs 2.2 and 2.3 using reduce:
evens = reduce
(\x y -> if (even x) then (x : y) else y)
[]
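In the same way, the other list-processing functions above are plausibly just instances of reduce (these one-liners are a sketch, not the notes' own Programs):

sum     = reduce (+)  0
prod    = reduce (*)  1
and_all = reduce (&&) True
or_all  = reduce (||) False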
2.4 Lazy evaluation
Other examples of higher-order functions which we can use to glue other functions together are:
• function composition:
(f . g) x = f (g x)
• map:
map _ [] = []
map f (h:t) = (f h) : (map f t)
We can also define higher-order functions analogous to reduce for other inductively defined types.
a_{n+1} = (a_n + N/a_n) / 2
Since the a_i converge quickly we only want to compute as many of them as we need to reach some
tolerance, usually denoted ε. Hughes asserts that the conventional programs for computing
square roots are very un-modular. We can use laziness to improve modularity. We do this by lazily
generating the list of approximations, and testing for the situation when two successive approximations
are within ε of each other.
We obtain the next approximation from the current one by:
next n x = (x + n/x)/2.0
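One way to generate the (infinite) list of approximations, sketched here with the standard prelude's iterate (the notes' rept from Program 2.9 plays the same role), is:

approxs n a0 = iterate (next n) a0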
The list approxs n a0 is, of course, infinite. But, so long as we only need a finite initial segment
of it we can use laziness to operate on it.
Now we can define a function to test whether two successive values in a list are within ε of each other,
and hence define a square root function:
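A sketch of such a test and of the square root function, consistent with the call sqroot start eps square used below and with the approxs sketched above:

within eps (a:b:rest)
  | abs (a - b) <= eps = b
  | otherwise          = within eps (b : rest)

sqroot a0 eps n = within eps (approxs n a0)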
Evaluation of sqroot start eps square will terminate, even though the list of approximations
is infinite. We can use laziness to support modularity here. Suppose we decide that it is not the
difference between successive approximations that we care about but the ratio of successive
approximations. Then we can define:
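A sketch of the ratio-based test (the names relative and relativesqroot are assumptions):

relative eps (a:b:rest)
  | abs (a / b - 1) <= eps = b
  | otherwise              = relative eps (b : rest)

relativesqroot a0 eps n = relative eps (approxs n a0)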
We have not had to alter the code which generates the approximations, only the code which com-
pares the values.
Similar examples from numerical differentiation and integration are presented where lazy evaluation
helps support modularisation.
2.5 Example: α-β pruning
Suppose we are writing a game-playing program for a two-player game, in which the computer is one of the players. We can
represent the current state of play, and we have a function which allows us to generate the state of play
after each possible move by either player. The game can then be represented as a tree, with states of
play as values at nodes. The sub-trees of each node represent the consequences of the possible valid
moves from that node. Our task is to let the computer pick the best move for it to make. We use a
minimax strategy to pick the best move, and we use α-β pruning to cut out branches of the tree where
the best move cannot reside.
Each game tree is a node with a value and some (possibly none, of course) sub-trees.2 Recall that
we can define two mutually recursive higher-order functions on such trees:
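A sketch along the lines of Hughes' paper (the names Tree, redtree, redtree' and maptree are assumptions, not necessarily those used in the notes):

data Tree a = Node a [Tree a]

redtree  f g d (Node v subs) = f v (redtree' f g d subs)
redtree' f g d []            = d
redtree' f g d (t:ts)        = g (redtree f g d t) (redtree' f g d ts)

-- mapping a function over every value in such a tree
maptree f = redtree (Node . f) (:) []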
Each state of play will be represented by an object of type Position, the details of which we do not
care about. The moves available from any given position can be represented by moves, a function of
type Position -> [Position]. A game tree is built by applying moves to the tree that we start
with. A complete game tree will have by the initial state of play as the value at the node. The subtrees
will be the list of trees with values of possible initial moves at their nodes, and developments from them
as sub-trees. This structure is very like that list of approximations that we looked at in §2.4. We can
define reptree, an analogue of rept:
Now we can define a function gametree which takes a position p and constructs the game tree
which can develop from that position:
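A sketch of both, assuming the Tree type above and moves :: Position -> [Position] as described:

reptree f a = Node a (map (reptree f) (f a))

gametree p = reptree moves p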
2 trees like this are sometimes called ‘rose trees’.
We have a function, which Hughes calls static, which evaluates positions, i.e. static has type
Position -> Float. If we map static over the game tree we can find the values of all the states
of play. Selecting the best move can be done after inspecting this tree of values of states of play. We
use a minimax strategy to choose the next move: to do this we need to inspect the subtrees of any node
to decide how good it really is. Although a node may have a good rating using static, subsequent
moves may end up being poor.
2.5.2 Minimax
Anyhow, let’s implement minimax. We are trying to produce a function which will give us the ‘actual’
value of each position, a value which we obtain by considering all the reachable positions. We define
maximise and minimise:
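A sketch along the lines of Hughes' paper, using the Tree type sketched earlier:

maximise (Node n [])   = n
maximise (Node n subs) = maximum (map minimise subs)

minimise (Node n [])   = n
minimise (Node n subs) = minimum (map maximise subs)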
Hughes points out that there are two, related, problems here. If the game tree is infinite then this
definition will never terminate, and even for finite but large trees this definition is practically unworkable.
Now the problem of selecting a move in a game is starting to look very much like the one we had
before:
So we need to implement a way to cut off the upper branches of the tree, that is to prune it:
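A sketch of prune, again following Hughes; the first argument is the depth to which the tree is kept:

prune 0 (Node a _)    = Node a []
prune n (Node a subs) = Node a (map (prune (n - 1)) subs)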
Here prune 5 . gametree uses lazy evaluation to implement a 5-move lookahead. Hughes makes the
point here that laziness supports greater modularity. If we did not have the ability to exploit
laziness we would have had to fold the pruning into the generation of the game tree. Notice also that
the two really problematic functions are:
Laziness allows us to generate the input for maximise only as it is needed, and to reclaim storage
as it becomes free, thus giving a useful optimisation. Again laziness supports modularity: to do this
without laziness would require us to lump all these functions together.
2.5.3 α-β
What we have so far implements minimax searching. We can optimise this using α-β pruning. The
crucial point to observe is that we are interested in the maximum minimum (or the minimum maximum)
in the tree. Thus we can often cut parts of the tree out without ever visiting them. We begin by re-
implementing minimise and maximise:
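Following Hughes' paper, one way to set this up is to have maximise' return the list of its subtrees' minima, whose maximum is the answer (and dually for minimise'); a sketch:

maximise = maximum . maximise'
minimise = minimum . minimise'

maximise' (Node n [])   = [n]
maximise' (Node n subs) = mapmin (map minimise' subs)

minimise' (Node n [])   = [n]
minimise' (Node n subs) = mapmax (map maximise' subs)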
The point of doing this is to re-define mapmax and mapmin to allow them to ignore the ignorable.
We follow Hughes and only show the re-definition of mapmin, asserting that mapmax can be given a
similar treatment. What we shall do is omit minima which cannot be the largest minimum. We define:
mapmin (nums:numss) =
(minimum nums) : (omit (minimum nums) numss)
omit pot [] = []
omit pot (ns:nss)
| minleq ns pot = omit pot nss
| otherwise = (minimum ns) : (omit (minimum ns) nss)
Omit takes a potential maximum minimum and ignores minima less than this. Minleq is where the
clever bit comes in: it is given a potential maximum minimum and a list of numbers. Minleq is true
if the minimum of the list is less than or equal to the potential maximum minimum. If any number in
the list is less than or equal to the potential maximum minimum, then the minimum of the list surely
is too. (The property of being the minimum in a list is that the minimum is less than or equal to any
number in the list, and ≤ is transitive!) Hence minleq does not have to look at all the values in the
list. We define minleq as:
minleq [] _ = False
minleq (n:ns) pot = (n <= pot) || minleq ns pot
Hughes points out that we have made a number of efficiency gains through laziness, and hence we
can look 8 moves ahead now!
Following this development has been, I think, quite hard. The main point is that laziness has
supported modularity. We have been able to improve evaluate as defined in Program 2.19 to implement
α-β pruning by making purely local changes to maximise. The great modularity of evaluate also
allows us to implement other optimisations simply.
2.6 Conclusion
The conclusions of the paper are that:
• modularity is known to be a good thing;
• functional programming languages offer higher-order functions and laziness;
• higher-order functions and laziness support modularity;
• hence functional programming languages have a lot to offer.
Questions
1. Define append and map using reduce.
2. filter :: (a -> Bool) -> [a] -> [a] takes a test, of type a -> Bool, and filters out of
the input list the items which fail the test. Define filter using reduce.
3. Define the function evens from Program 2.3 using filter.
4. Binary trees can be defined in Haskell as:
Define the function redbintree, the analogue of reduce for binary trees.
5. Define 3 functions inorder, preorder and postorder to flatten a tree (i.e. to convert a
BinTree a into a [a]) using redbintree. The three functions should list the values using in-,
pre-, and post-order traversals.
6. Define treemap :: (a -> b) -> (BinTree a) -> BinTree b, which maps a func-
tion over a tree, using redbintree.
7. The two functions rept in Program 2.9 and reptree in Program 2.14 have some obvious
similarities. Is it possible to define, in Haskell, a function which generalises them? What type
would this function have?
8. Draw the game tree for a game of noughts and crosses (tic-tac-toe).
9. Define a type Oxo which represents the state of a game of noughts and crosses.
10. Generate a game tree for noughts and crosses.
11. Implement mapmax.
3 Limitations of reduce
The function reduce is very useful, but it has limitations. In this Chapter we shall look at two of these.
sort = reduce f []
where f x l = ??
Now we have the task of defining a suitable f, which takes a value and an ordered list, and returns
an ordered list.
We begin by observing:
• if we are given a value v and the empty list then we can construct the singleton list [v] which is
ordered;
• if we are given a value v and a non-empty, ordered list then we can construct an ordered list by
comparing v with the head of the list. If v is less than the head of the list then we cons v onto
the list given to us, otherwise we cons the head of the list onto what we got from constructing
an ordered list from v and the tail of the list given to us.
We have a problem here: we do not have access to the tail of the original list! This sorting algorithm
does not fit into exactly the pattern which reduce generalises. We can scratch our heads for a bit
at this point: fortunately the solution to this problem is well-known. Instead of reduce we use an
operator called listrec. We discuss listrec in § 11.4 on page 96, and [55, 58, 76] explain in much
more detail. This operator is defined as follows:
listrec d _ [] = d
listrec d e (h:t) = e h t (listrec d e t)
f v = listrec [v]
(\h t r -> if v < h
then (v:h:t)
else h:r)
Although anything we can do with reduce can be done with listrec, in many situations reduce
is more convenient and neater.
One key point to notice here is that we have defined insertion sort, not a particularly good algorithm.
There are better sorting algorithms which do not follow the pattern of reduce and listrec.
Now we can pass sort < or > depending on whether we want to sort numbers in ascending or
descending order, a feature which generalises to other orders, and to other types.

3.3 Finding the minimum of a list
minl [] = -1
minl [s] = s
minl (h:t) = min h (minl t)
This program does not fit the pattern of reduce, but this is not its only flaw.
We might expect to have:
minl (l ++ m) = min (minl l) (minl m)
But consider the list [1], which we choose to write as [1] ++ []. Now we can show that 1 = −1.
The problem is that minl is simply not defined for the empty list. Pretending that it is will only lead us
into trouble. Furthermore, reduce only lets us write total functions.
There is more than one way to solve this problem. We look at two, and will see a third later on.
minl [] = Error
minl (h:t) = Ok (minl’ h t)
minl’ x [] = x
minl’ x (h:t) = min x (minl’ h t)
Gofer?
minl []
Gofer?
minl’’ [4,5,3,6,2]
2 :: Int
Letting the system handle errors for us is more convenient.
3.4 Summary
In this Chapter we highlighted some limitations of reduce. We presented another operator, called
listrec, which allows us to overcome one of these limitations. We also briefly discussed error
handling.
Questions
1. Define treerec, the analogue of listrec for the binary trees defined in Program 2.25.
2. Does the following hold?
4 More lazy programming
Hughes’ [41] argues strongly that laziness has a lot of advantages. In this Chapter we will look more
closely at laziness, following the presentation given in Chapter 17 of [77].1 [61] has a rather nice
chapter on laziness, despite (or perhaps because of) being about ML, a strict functional language.
When we use lazy evaluation:
As a simple illustration of lazy evaluation we show how to take the first few items from an infinite list.
Haskell gives us a convenient notation for lists of integers:2
Gofer?
[1..5]
[1, 2, 3, 4, 5] :: [Int]
(44 reductions, 84 cells)
Gofer?
[1, 3..10]
[1, 3, 5, 7, 9] :: [Int]
(46 reductions, 94 cells)
Gofer?
[1, 3 ..]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29,
31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57,
59, 61, 63, 65, 67, 69, 71, 73, 75{Interrupted!}
The function take :: Int -> [a] -> [a] from the standard prelude returns the first n items
of a list. Lazy evaluation allows us to take the first 5 items from an infinite list:
Gofer?
take 5 [1, 3..]
[1, 3, 5, 7, 9] :: [Int]
(28 reductions, 75 cells)
Consider an expression such as f 1 (g 2 3), where f and g are functions. There are two possible
expressions we could choose to work on first, which we underline:
f 1 (g 2 3)
Two questions spring immediately to mind:
1. Does it matter which we choose?
2. If it does matter, what difference does it make?
The λ-calculus provides the best setting in which to provide answers to these questions: when we
look at the λ-calculus in Chapter 10 we will give more detailed explanations, but for the moment we
will work with Haskell.
The answer to question 1 is yes: we would not have bothered to spend so much time building up to
it if it was not! Question 2 clearly has a much more interesting answer.
The evaluation strategy that Haskell adopts is that it picks the leftmost, outermost expression to
evaluate first. In the example above we would first evaluate:
f 1 (g 2 3)
This strategy is what is required for lazy evaluation. Lazy evaluation is, however, more than just
leftmost, outermost evaluation as we must also take care never to evaluate further than we must, nor
to evaluate anything twice. The leftmost, outermost evaluation strategy has one remarkable property: if any evaluation
strategy finds a value of an expression then leftmost, outermost will. Leftmost, outermost is also called
normal order evaluation.
Applicative order evaluation evaluates the arguments to a function before evaluating the function. In
the example above we would first evaluate:
g 2 3
If we think in terms of parameter-passing mechanisms then applicative order corresponds to
call-by-value, and normal order to call-by-name.
Notice that normal-order evaluation may find a value for an expression when applicative order
cannot. Consider the (contrived) example:
bomb :: Int
bomb = bomb + 1
k :: a -> b -> a
k x y = x
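A small example (not necessarily the one the notes use) makes the contrast concrete: under normal-order evaluation the following expression returns 1, because the argument bomb is never demanded, whereas an applicative-order evaluator would loop forever trying to evaluate bomb first.

k 1 bomb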
4.2 Summary
Laziness has three parts. Normal-order evaluation deals with one part of laziness: not evaluating
expressions that we can ignore. The λ-calculus is the best environment in which to discuss many of the
notions introduced in this Chapter, so we will re-visit them later.
Questions
1. Define take and drop. The function drop has type Int -> [a] -> [a] and drops the first
n items from a list.
2. Lazy evaluation has a correspondence with ‘call-by-name’ parameter passing. Find out what
parameter-passing mechanism is used by:
• Algol-60
• Algol-68
• Pascal
• C
• C++
• Java
• LISP
• LISP, originally
• Scheme
• ML
• Miranda
• SmallTalk
• your favourite language
Why did the language designers make these choices?
3. Notice that normal-order evaluation may find a value for an expression when applicative
order cannot.
Does this mean that there are functions we can write in Haskell that we can’t write in a strict
language like ML? You may want to read Chapter 5 of ML for the Working Programmer [61].
4. Find out what strictness analysis is.
5 List comprehensions
We pause to discuss list comprehensions. List comprehensions were introduced in Miranda as ZF-
expressions. ZF is a set theory due to Zermelo and Fraenkel. In ZF we can define set comprehensions,
e.g.
{x|x ∈ N ∧ x mod 2 = 0}
Gofer?
[x | x <- [0..], x ‘mod‘ 2 == 0]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28,
30, 32, 34, 36, 38, 40{Interrupted!}
List comprehensions give us a very powerful way to describe lists. There is one crucial difference be-
tween set comprehensions in ZF and list comprehensions in Haskell. The items in a list comprehension
are generated in a particular order, and sometimes we need to think carefully about how they are gen-
erated. This is rather similar to the distinction between the declarative and the procedural semantics of
Prolog.
The syntax for list comprehensions provided by Haskell is quite simple. After the list bracket we have
an expression, then a vertical bar, and then a sequence of generators and tests. A generator is a
pattern followed by <- followed by a list. The symbol <- is supposed to remind us of ∈. A test is an
expression of type Bool.
We can have more than one generator, and more than one test. If we are using more than one
generator we need to be careful about the order in which the results are generated. If we define this
function:
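One definition consistent with the session below (Program 5.1 itself may differ) is:

pairs n = [ (i, c) | i <- [1 .. n], c <- take i ['a' ..] ]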
Gofer?
pairs 5
[(1,’a’),
(2,’a’), (2,’b’),
(3,’a’), (3,’b’), (3,’c’),
(4,’a’), (4,’b’), (4,’c’), (4,’d’),
(5,’a’), (5,’b’), (5,’c’), (5,’d’), (5,’e’)]
The definition looks obscure: but it comes from a careful consideration of the problem. Powerful
programming techniques are also helped by careful thought. Observe that:
(n² − m²)² = n⁴ + m⁴ − 2n²m²
(2nm)² = 4n²m²
(n² − m²)² + (2nm)² = n⁴ + m⁴ + 2n²m²
n⁴ + m⁴ + 2n²m² = (n² + m²)²
Hence the sum of the squares of the first two is the square of the third, so they certainly are
Pythagorean triples. The first generator selects n from the list of natural numbers from 2 upwards.
The second generator selects m from the list of natural numbers from 1 to n − 1. The (n, m) pairs
are generated in just the same order as the pairs in Program 5.1. The stipulation that n and m are
mutually prime ensures that we ignore triples such as (pa, po, ph), as the triple (a, o, h) is already in the
list. The stipulation that m + n is odd ensures that we ignore triples of the form (2o, 2a, 2h) as the triple
(a, o, h) is already in the list.
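A comprehension matching this description (a sketch; the name triples is an assumption):

triples = [ (n*n - m*m, 2*n*m, n*n + m*m)
          | n <- [2 ..], m <- [1 .. n - 1], gcd n m == 1, odd (m + n) ]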
And, in case you don’t believe the argument above, the first 10 items in this list are:
5.2 Quicksort
As an example of a function defined using list comprehensions we give a version of quicksort.
Quicksort is defined as follows:
• To sort the empty list, return the empty list.
• To sort a list with a head and a tail we sort the values in the tail less than the head and append
that to the head consed onto what we got from sorting the values in the tail greater than the
head.
quick [] = []
quick (h:t) = quick [x | x <- t, x < h] ++
h :
quick [x | x <- t, x >= h]
quick [] = []
quick (h:t) = quick smalls ++
h :
quick bigs
where (smalls, bigs) = partition h t
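Here partition is assumed to take the pivot first (unlike the partition in the List library, which takes a predicate); a sketch consistent with its use above:

partition p xs = ([x | x <- xs, x < p], [x | x <- xs, x >= p])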
Which should we prefer? What are the criteria on which to choose? Paulson [61] writes:
Correctness must come first. Clarity must usually come second and efficiency third. When-
ever you sacrifice clarity for efficiency, be sure to measure the improvement in performance
and decide whether the sacrifice is worth it. A judicious mixture of realism and principle,
with plenty of patience, makes for efficient programs.
Programs 5.3 and 5.4 are both correct.1 Program 5.3 is, surely, clearer. Check their relative
efficiencies out for yourselves.
5.3 n Queens
And now we program n Queens. The original problem is to place 8 Queens on a chess board, such
that no two Queens threaten each other. We call an arrangement where no two Queens threaten
each other safe. The number 8 plays no significant role, so we generalise to the problem of placing
n Queens safely on an n × n chess board. We could represent the positions of the Queens using an
n × n array. This is not a very good representation as we know that no two Queens can possibly be in
the same row. For each row all we need to record is which column is occupied, so we can use a list
of integers. We can also notice that no two Queens can be in the same column either, so solutions to
the problem must be permutations of the list [1..n]. One algorithm to solve the problem is simply to
generate all permutations, and then test each one to find whether it is a safe arrangement. Even using
lazy evaluation this is not a very good solution. Instead we define a recursive solution where we place
n Queens by adding one Queen to the nth row of a board on which we have already safely placed
n − 1 Queens in the first n − 1 rows.
So we begin by defining:
queens n = place n n
place _ 0 = [[]]
place size n = ??
Now we must fill out the ??. Of course we will use a list comprehension. As we are writing a recursive
function we expect to use place size (n - 1) :: [Solution] in one of the generators. The
solutions to place size (n - 1) are the ways to place n − 1 Queens safely in the first n − 1 rows.
If we pick one of these solutions, and pick a value for the column to put the nth Queen in, and show
that this is a safe arrangement then we have solved the problem. So now we are working along these
lines:
We still need to explain what safe is. If p has length l then we are placing a Queen on column q
in row l + 1. So we must check that the co-ordinates (l + 1, q) do not clash with the co-ordinates of
any of the existing Queens. Remember p is a list of the columns, and the ith item in p is in row i.
There is a function in the standard prelude zip :: [a] -> [b] -> [(a, b)] which we can use
to generate the list of co-ordinates implicit in p. The list we are interested in is simply:
zip [1..] p
If we have two pairs of co-ordinates (r1 , c1 ) and (r2 , c2 ) then the check for safety is that:
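Spelled out (reconstructing the displayed condition from the discussion that follows), the requirement is that:

r1 ≠ r2, c1 ≠ c2, r1 + c1 ≠ r2 + c2, and r1 − c1 ≠ r2 − c2.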
We know that the rows differ, so we only need to check the other three conditions. In Haskell we can
express this as:
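A definition of check consistent with its use in the complete program below (a sketch):

check (r1, c1) (r2, c2) =
    c1 /= c2 && r1 + c1 /= r2 + c2 && r1 - c1 /= r2 - c2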
We need to perform the check for all the co-ordinates we have generated, so safe is defined as:
safe p n =
and [ check (i,j) (m,n) | (i,j) <- zip [1..] p ]
where m = 1 + length p
The function and :: [Bool] -> Bool is from the standard prelude, and is defined as reduce
(&&) True.
The whole program is:
queens n = place n n
place _ 0 = [[]]
place size n = [p ++ [q] | p <- place size (n - 1),
q <- [1 .. size],
safe p q]
safe p n =
and [ check (i,j) (m,n) | (i,j) <- zip [1..] p ]
where m = 1 + length p
Working through the development of this algorithm may suggest a change in our representation
of the solutions. We used a very compact, non-redundant representation of a solution, and we generated
the co-ordinates when we needed them. We might instead decide to trade some space off against time and
use a quadruple (r, c, r + c, r − c) to represent the position of each Queen. Some careful analysis is
needed here. [7] discusses the n Queens problem in Prolog at some length.
5.4 Summary
In this Chapter we have introduced list comprehensions. They provide us with a powerful and succinct
way to describe lists. They operate using a ‘generate-and-test’ mechanism.
We will see more examples of list comprehensions as we proceed.
Questions
1. Define filter using list comprehensions.
2. Define map using list comprehensions.
3. Program n Queens without using list comprehensions.
6 Case study: searching a graph
Graphs turn up all over the place. One simple and appealing problem where graphs come to our aid
is the following, from Programming in Prolog by Chris Mellish and William F. Clocksin [12]:
It is a dark and stormy night. As you drive slowly down a lonely country road, your car
breaks down, and you stop in front of a splendid palace. You go to the door, find it open,
and begin looking for a telephone. How do you search the palace without getting lost, and
know that you have searched every room? Also, what is the shortest path to the telephone?
It is just for such emergencies that graph searching methods have been devised.
Before we can get to the stage of writing a solution to this problem we have to look at how we will
represent a graph. We expect to represent a graph as a relation, where a relation will be represented
as a set of pairs. So we will have to look at how we can represent a set. Sets, of course, are best
represented using an abstract data type, so we will have to look at how Haskell handles ADT’s. In
order properly to discuss ADT’s we need to say something about algebraic types too. So our starting
point is a long way away from the graph search problem.
6.1 Summary
In order to complete the task of this Chapter we have to take a major detour through the type system
that Haskell provides us with. We will return to our initial problem in § 8.5 on page 56.
7 Types
Typing is a Good Thing. Our main concern is that the type system offers benefits for the programmer.
For example, an attempt to take the head of an integer can be spotted as a type error at compile time.
Early detection of errors makes them easier to fix. However we should also be aware that the type
system also provides the compiler with information. For example, in the standard reference on the C
programming language [44] types are first mentioned on p. 9, and, at the first mention of types, we
are told how many bytes are to be allocated for each of the basic types. Later on (p. 42) we are told
that:
A char is just a small integer, so chars may be freely used in arithmetic expressions.
There is clearly a tension here. Knowing that a letter and a number are both stored in the same amount
of space may allow the programmer to perform some cute tricks, but code full of cute tricks is hard
to understand and rarely clear, or easily modifiable. We follow Paulson’s advice quoted in § 5.2 on
page 27, and leave the information for the compiler implicit. It is the compiler writer’s job to exploit
this information, not the programmer’s.
id x = x
α→α
where α is a variable which ranges over types. This is the principal type of the identity function. In the
Hindley-Milner system:
1 The analogy between propositions and types is enough to make a career out of.
7.3 Basic types
• Char characters e.g. ’a’, ’A’, ’0’. The most recent versions of Haskell (are supposed to) use
Unicode.
• Bool the type with the two values True and False in it.
Lists and tuples are also basic to Haskell. Lists behave as if they were defined by Program 2.1 on
page 5. The concrete syntax that Haskell uses for a list and for the type of lists is the same. Out of
context it is not clear whether [a] is the type of lists of objects of type a, or whether it is the singleton
list containing the variable a. So an alternative way to express Program 2.1 would be:
data [a] = []
         | a : [a]
The concrete syntax for tuples has the same property, and tuples behave as if they were defined by:
There are lots of functions defined on the basic types in the standard prelude.
7.4 Type synonyms

Notice that we start a type synonym off with type, rather than data.
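For example (these particular synonyms are illustrative, not from the notes):

type Name  = String
type Point = (Float, Float)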
7.5 Classes
We have already mentioned type classes, which, to borrow a phrase from Phil Wadler and Stephen
Blott [84], let us make ad hoc polymorphism less ad hoc. A class is a collection of types. A type which
is in a class is called an instance of that class. Classes can be derived from other classes. The derived
class inherits operations. There are some basic classes, and we can also define type classes ourselves.
The basic classes include:
• Eq
• Ord
• Enum
• Bounded
• Show
• Read
• Num
• Fractional
class Eq a where
(==), (/=) :: a -> a -> Bool
x /= y = not (x == y)
x == y = not (x/=y)
Program 7.5: How the equality class might have been defined.
The definitions of == and /= are the default definitions. Any instance of the class will pick up the
default definitions, unless these are over-ridden. For example Bool is an instance of the Eq class:
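The instance declaration itself would look something like this (a sketch):

instance Eq Bool where
  True  == True  = True
  False == False = True
  _     == _     = False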
The instance definition of == over-rides the default definition, and the default definition defines /=
for Bool.
There is nothing to stop us from defining, for example:
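For instance, an instance that declares every value equal to every other is perfectly legal (Direction here is a made-up type for illustration):

data Direction = North | South | East | West

instance Eq Direction where
  _ == _ = True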
Haskell does not check that our definition of == is sensible: that is up to us.
The class Ord is a derived class. It is defined as:
max x y | x >= y = x
| y >= x = y
min x y | x <= y = x
| y <= x = y
7.6 Algebraic types
Of course, we can use the wildcard pattern _ to cut out some clauses.
The Either type is rather general. As a more concrete example, suppose we were trying to represent
shapes. We might define a type like:3
2 From a session with HUGS on the SMCS Unix system.
3 As indeed Thompson does on p246 of [77].
area (Circle r) = pi * r * r
area (Rectangle l b) = l * b
We can think of the two constructors Circle and Rectangle as functions of types Float ->
Shape and Float -> Float -> Shape, respectively.
Let's return to Either. We can think of Inl and Inr as constructor functions too. Inl has type a ->
Either a b and Inr has type b -> Either a b.
We can write a higher-order function to make use of a value of type Either a b. This function will
take two auxiliary functions, of types a -> c and b -> c, and will apply the appropriate one.
when d _ (Inl x) = d x
when _ e (Inr y) = e y
You may want to compare Inl, Inr, and when with the natural deduction rules for ∨ introduction
and elimination:
  A
───── ∨ I left
A ∨ B

Rule 7.1: ∨ Introduction left

  B
───── ∨ I right
A ∨ B

Rule 7.2: ∨ Introduction right

          [A]   [B]
           ⋮     ⋮
A ∨ B      C     C
────────────────── ∨ E
         C

Rule 7.3: ∨ Elimination
One tagged union type which has a lot of uses is Maybe a, defined as follows:
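Judging from its uses below, the notes' version has constructors Error and Ok (the standard prelude's Maybe uses Nothing and Just); a sketch:

data Maybe a = Error
             | Ok a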
A total function from α to β is defined for all values in α. A partial function from α to β is undefined
for some values in α. We can use Maybe to turn partial functions into total ones. Given a partial
function, f , from α to β, we define a total function, g, from α to Maybe β, which co-incides with f
wherever f is defined, and has the value Error elsewhere. For example:
totalhead [] = Error
totalhead l = Ok (head l)
totalfac n
| n < 0 = Error
| otherwise = Ok (fac n)
fac 0 = 1
fac n = n * fac (n-1)
Gofer?
fac (-3)
Error: the Macintosh stack collided with the heap
Gofer?
totalfac (-3)
Error :: Maybe Int
(9 reductions, 16 cells)
Gofer?
fac 3
6 :: Int
(11 reductions, 17 cells)
Gofer?
totalfac 3
Ok 6 :: Maybe Int
(18 reductions, 28 cells)
One important application of the totalising nature of Maybe is in error handling. We have already
seen this in § 3.3.1 on page 17, and Question 9 on page 43 relates to this. We will see a related
technique using monads in § 18.3.4 on page 158.
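The natural numbers can be given as an algebraic type with two constructors; a definition matching the description below is:

data Nat = Zero
         | Succ Nat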
• Zero is nullary and lets us make a natural number from nothing at all;
• Succ is unary and lets us construct a new Nat from an existing one.
In general, to write a function on the natural numbers we will write two clauses, one saying what to
do with Zero, and one saying what to do with Succ n, where n :: Nat. Such functions will look like:
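Schematically (xxx and yyy stand for the right-hand sides still to be filled in):

natfun Zero     = xxx
natfun (Succ n) = yyy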
An immediate question is: what can we use to fill in xxx and yyy? In both xxx and yyy natfun
itself is in scope, so we can write recursive functions like:
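For example, addition (a sketch, not necessarily the notes' example):

plus Zero     m = m
plus (Succ n) m = Succ (plus n m)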
The function for natural numbers which is analogous to listrec (Program 3.3 on page 16) is
natrec:
natrec d _ Zero = d
natrec d e (Succ n) = e n (natrec d e n)
The function nat2str follows the pattern described by natrec. Suppose we are using natrec to
define a function f. When defining f (Succ n) we are only allowed to use the value of f at n. This
pattern of recursion is called primitive or structural recursion.4 Haskell has a more liberal notion than
structural recursion. When we are defining f (Succ n) we can use the value of f at any number
whatsoever. So we can define functions like:
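One version of natbomb with the behaviour discussed below (the recursive call is on a larger number, so it never terminates for any Succ argument) might be:

natbomb Zero     = Zero
natbomb (Succ n) = natbomb (Succ (Succ n))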
Functions defined using structural recursion have pleasant termination properties: functions like
natbomb don’t. Why then do we allow this unlimited notion of recursion? As always you get nothing
for nothing. There are functions on the natural numbers that are clearly computable which cannot
be computed using (just) structural recursion on the natural numbers. The most famous of these are
the Ackermann functions. The original Ackermann function was constructed to show that the primitive
recursive functions are a proper subset of the computable functions. This function is clearly computable
but is not definable using primitive recursion.5 We have also seen other useful functions which are not
structurally recursive, such as quick from § 5.2 on page 26 and maximise and minimise from
Program 2.16 on page 10. The pleasant termination properties of structurally recursive functions come
from the fact that recursive calls are on values lower in the natural order generated by the inductive
definition of the type. In order to show termination (ignoring laziness, for the moment) for the more
general recursion that Haskell makes available we have to find some well-founded ordering such that
recursive calls are below the initial one in this order. In the case of quicksort this ordering is the length
of the list on which the recursive calls are made.
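For reference, the common two-argument variant of Ackermann's function (not the original three-argument one) is directly definable in Haskell precisely because Haskell's recursion is not restricted to structural recursion:

ack :: Integer -> Integer -> Integer
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))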
Important point
The crucial point is that there is a very close relationship between types defined by induction and
functions defined by recursion.
Functions written on Odd and Even will, typically, be mutually recursive. For example:
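A sketch of such a pair of types and a pair of mutually recursive functions on them (the constructor and function names are assumptions, not necessarily those of Program 7.25):

data Even = ZeroE | SuccO Odd
data Odd  = SuccE Even

evenToInt :: Even -> Int
evenToInt ZeroE     = 0
evenToInt (SuccO o) = 1 + oddToInt o

oddToInt :: Odd -> Int
oddToInt (SuccE e) = 1 + evenToInt e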
7.7 Summary
In this Chapter we have looked at:
• parametric polymorphism;
• ad hoc polymorphism, and type classes;
5 [63] goes into much more detail about this sort of stuff.
Questions
1. Draw an inheritance diagram for the basic classes.
2. What are the merits and demerits of the use of the notation [a] for both the list of types, and
particular types?
3. Investigate the relationship between Haskell’s type classes and object-oriented programming.
4. What are the rules for defining classes, properly?
5. Define types to represent:
a) arithmetic expressions, e.g. 5 + 6, 4 × −6, 2 × (3 + 8);
b) Prolog terms, e.g. X, x, f(x, X, g(h));
c) logical expressions, e.g. A ∧ B, A ⊃ (B ⊃ ¬(C ∨ A)).
6. Define oddrec and evenrec, the structural recursion operators for Odd and Even in Program
7.25. What observation can you make about oddrec and evenrec?
7. Define a type for Prolog terms using mutual recursion.
or as:
8 Abstract data types
There are various more-or-less equivalent definitions of ADT’s in the literature:
• ‘An abstract type consists of the type name, the signature of the type, and the implementation
equations for the names in the signature’ [77];
• ‘An abstract data type consists of a type together with an explicit set of operations, which are the
only means by which to compute with values of that type.’ [61].
Implementing ADT’s requires us to hide some information, specifically exactly how the ADT is realized.
In this respect ADT’s are like type classes, and one of the formal approaches to ADT’s involves
describing them using existentially quantified types. [8] discusses this further, although not in a
specifically functional setting. In Haskell we achieve the information hiding by using the module system.
8.1 Modules
Haskell has a module system. Using modules allows us to keep local information local, and only export
what should be exported. A module has an interface which tells us what is exported from the module,
that is which parts of it another module can import. The module system of Haskell gives us control over
what is imported and exported.
We call the module that we are creating Queue, and we will define, and export:
• a type Queue;
• emptyQ :: Queue a;
• isEmptyQ :: Queue a -> Bool;
• addQ :: a -> Queue a -> Queue a;
• remQ :: Queue a -> (a, Queue a).
module Queue
( Queue,
emptyQ,
isEmptyQ,
addQ,
remQ
) where
????
This tells us that the module Queue defines (and exports) a type Queue and functions emptyQ, addQ,
remQ, isEmptyQ. There is no information here about the types of these functions, nor about their
properties. We adopt the convention used in [77] and add the types as comments:
module Queue
( Queue,
emptyQ, -- Queue a
isEmptyQ, -- Queue a -> Bool
addQ, -- a -> Queue a -> Queue a
remQ -- Queue a -> (a, Queue a)
) where
????
Now we must fill in the ???? with the details of the implementation. Our first choice is which type
we will use to represent queues. Lists look like a good bet so we declare:
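A declaration consistent with the code that follows (the constructor is clearly called Qu) is:

newtype Queue a = Qu [a]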
The keyword newtype is a new one. Program 8.3 has exactly the same meaning as:
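That is, the corresponding data declaration:

data Queue a = Qu [a]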
emptyQ = Qu []
To define addQ and remQ we have to make a decision. Do we add elements to the head or the tail
of the list? Suppose we decide to add them at the tail. Then we would define:
module Queue
( Queue,
emptyQ, -- Queue a
isEmptyQ, -- Queue a -> Bool
addQ, -- a -> Queue a -> Queue a
remQ -- Queue a -> (a, Queue a)
) where
emptyQ = Qu []
There is one minor problem with the interaction between type classes and modules. We did not
declare show to be exported, but it is now available everywhere. Although show is not likely to cause
us problems it is conceivable that the class system could allow details of the concrete representation to
leak out, with unwanted consequences.
One option that we might have taken would be to use a data declaration and define:
The definitions of addQ and remQ given in Program 8.6 were based on a decision about where to add
and remove from the queue. Had we made the other choice we would have defined:
So, what difference does this make? As far as the equations which hold about adding and removing
items from queues go there is no difference at all. However, in Program 8.6 it is cheap to remove
items and expensive to add them; whereas in Program 8.10 it is cheap to add items and expensive
to remove them. Part of the point of having ADT’s is so that we can find better representations of our
datatypes and use them, without having to change our code globally. Can we find a representation of
queues in which it is cheap both to add and remove items? If we use a pair of lists, one, representing
the back of the queue, for adding to, and one, representing the front of the queue, for removing from,
then adding and removing will both be cheap, most of the time. Sometimes we will try to remove an
item from a queue which has a back, but no front, and this operation will be expensive. However this
operation will be no more expensive than the expensive operation in the naïve representation, and will
be performed a lot less often, and so overall we get a benefit. Our new implementation will now look
like:
emptyQ = Qu [] []

[59] gives a very thorough treatment of issues relating to purely functional data structures, particularly
in terms of their efficiency, and is well worth further study.

8.3 A set ADT

The operations we want our set ADT to provide include:
• set intersection;
• set union;
• set difference;
• membership test;
• equality test;
• showing a set.
So, the signature for the sets that we are defining will look like:
module Set
(Set,
empty, -- Set a
addSet, -- Ord a => a -> Set a -> Set a
memSet, -- Ord a => Set a -> a -> Bool
union, inter, diff, -- Ord a => Set a -> Set a -> Set a
eqSet, -- Eq a => Set a -> Set a -> Bool
showSet, -- (a -> String) -> Set a -> String
card -- Set a -> Int
)
where
????
There are a lot of other set operations that we might define in the ADT itself. For example we can
make a singleton set by inserting an item into the empty set; or we might decide that this is such a
commonly required operation that we should define it in the ADT.
Now we fill in the ???? in Program 8.12. We follow Thompson and choose to represent sets as
ordered lists, without repetitions. We will import the definitions from the List module, but since union
is already defined for lists we hide this definition. And we explain how Set a is an instance of Eq and
Ord. So we begin with:
Next we start to define the operations. We start with empty and addSet:
empty :: Set a
empty = SetI []
adds i [] = [i]
adds i l@(h:t)
| i < h = i:l
| i == h = l
| otherwise = h : (adds i t)
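addSet itself then just wraps the helper (a sketch; SetI is the representation constructor used later in flatten):

addSet i (SetI xs) = SetI (adds i xs)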
This looks fine, but it might strike us that we could define a function to turn a list into a set, and define
both empty and addSet in terms of this. With the definitions we have, we have ensured that sets are
represented by ordered lists with no repetitions. Now we can get on with defining more operations:
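Among them, for instance, union, which merges the two ordered representations (a sketch; memSet and the rest follow the same pattern):

union (SetI xs) (SetI ys) = SetI (uni xs ys)
  where
    uni [] ys = ys
    uni xs [] = xs
    uni (x:xs) (y:ys)
      | x < y     = x : uni xs (y:ys)
      | x == y    = x : uni xs ys
      | otherwise = y : uni (x:xs) ys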
and intersection:
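A sketch in the same merging style:

inter (SetI xs) (SetI ys) = SetI (int xs ys)
  where
    int [] _  = []
    int _  [] = []
    int (x:xs) (y:ys)
      | x < y     = int xs (y:ys)
      | x > y     = int (x:xs) ys
      | otherwise = x : int xs ys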
module Set
(Set,
empty, -- Set a
sing, -- a -> Set a
memSet, -- Ord a => Set a -> a -> Bool
union, inter, diff, -- Ord a => Set a -> Set a -> Set a
eqSet, -- Eq a => Set a -> Set a -> Bool
subSet, -- Ord a => Set a -> Set a -> Bool
makeSet, -- Ord a => [a] -> Set a
mapSet, -- Ord b => (a -> b) -> Set a -> Set b
filterSet,-- (a -> Bool) -> Set a -> Set a
foldSet, -- (a -> a -> a) -> a -> Set a -> a
showSet, -- (a -> String) -> Set a -> String
card -- Set a -> Int
)
where
????
8.4.1 Relations
A binary relation on a set is a set of pairs. In Haskell we can express this as:
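A type synonym consistent with the later signatures (e.g. that of depthList) is:

type Relation a = Set (a, a)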
Image
The image of a value v under a relation R is the set of all values x such that R(v, x). How do we
construct this set? We must find the second item in all the pairs in the relation whose first item is v.
Recall that a relation is a set, so:
filterSet ((==v).fst) r
will find the set of pairs in r which have v as the left item. Now we just need to form the set of the
right item from these pairs. So, all we need is:
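So a definition of image consistent with its use in findDescs later is (a sketch):

image r v = mapSet snd (filterSet ((== v) . fst) r)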
Set image
We can extend this notion to form images of sets. The image of a set is the set of the images of the
items in the set. We can compute this by forming the set of images (of type Set (Set a)), and then
forming the union of these.
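A sketch (setImage is an assumed name; foldSet union empty forms the union of a set of sets):

setImage r s = foldSet union empty (mapSet (image r) s)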
Composition
Given two relations we can form their composition, which will itself be a relation. In set comprehension
notation the composition is given by:
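In a notation close to the ZF comprehensions of Chapter 5, and matching the code below:

{ (x, y) | (x, w) ∈ r1, (z, y) ∈ r2, w = z }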
How do we compute this? Thompson does it by computing the set of all pairs of the pairs in the two
relations (i.e. pairs of pairs), then filtering out those which we are not interested in, and then taking
only the ‘outer’ items from the pairs of pairs:
compose r1 r2 =
mapSet outers (filterSet innerseq (setProduct r1 r2))
where
outers ((x, _), (_, y)) = (x, y)
innerseq ((_, w), (z, _)) = w == z
setProduct :: (Ord a, Ord b) => Set a -> Set b -> Set (a, b) takes two sets and computes the
set of pairs, where the first item in the pair is from the first set and the second is from the second set.
Completing this definition is left as an exercise (Question 6).
limit f x
| x == (f x) = x
| otherwise = limit f (f x)
The transitive closure of a relation is then the limit of composing the relation with itself:
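A sketch (tClosure is an assumed name; limit is as defined above):

tClosure r = limit addGen r
  where addGen r' = union r' (compose r' r)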
8.4.2 Graphs
We can treat a relation as a description of a (directed) graph: the relation records which nodes have
arcs between them. Of course, as soon as we see a graph we are determined to search it. And as
soon as we think of searching we are not sure whether to choose depth- or breadth-first search. So we
follow Thompson and implement both. The type of both searches is the same:
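Judging from the description below, that type is (a sketch):

breadthFirst, depthFirst :: Ord a => Relation a -> a -> [a]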
Both these functions take a graph and a node and return all the nodes reachable from the given
node. The difference lies in the ordering of the list returned.
Implementing the solution to this problem brings us up against the ‘abstraction barrier’: we are
going to make use of a function to listify a set. Thompson defines this (as an extension to the set ADT)
as:
flatten (SetI l) = l
This really is a case of the implementation details leaking out. However we will live with this.
For both search strategies we will make use of a function to find the unvisited (immediate)
descendants of a given node:
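A definition matching the explanation that follows (a sketch, using the image function sketched earlier):

findDescs r old here = flatten (diff (image r here) (makeSet old))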
Image r here computes the set of descendants of here. makeSet old is the set of nodes
already visited. Taking the difference of these two sets gives us the new nodes.
Breadth-first
Performing a breadth-first search is very like computing the transitive closure: we must simply build up
a list of visited nodes (in order, of course) until we reach a limit. Given a list of nodes reached after n
steps the list of nodes at n + 1 steps will be this list plus those new nodes reachable from them. If r is the
relation and xs the list of nodes already visited then map (findDescs r xs) xs :: [[a]] is
the list of lists of new nodes. In the standard prelude concat :: [[a]] -> [a] is defined which
will concatenate a list of lists into one list. So, concat (map (findDescs r xs) xs) :: [a]
is the list of new nodes reachable from xs. As r is a graph, not a tree, it is possible that there will
be duplicates in this list. Again the standard prelude comes to our rescue: nub :: Eq a => [a] ->
[a] removes second and subsequent occurrences of an item from a list. Hence breadth-first search
can be implemented as:
breadthFirst r v = limit step [v]
    where
    step xs = xs ++ nub (concat (map (findDescs r xs) xs))
Depth-first
Depth-first search is slightly less straightforward. We must carry a list of visited nodes around, and so
we define:
depthFirst r x = depthSearch r x []
depthSearch r x old =
    x : depthList r (findDescs r (x:old) x) (x:old)
Where depthList will find all the descendants of a list of nodes. It is defined as:
depthList :: Ord a => Relation a -> [a] -> [a] -> [a]
depthList _ [] _ = []
depthList r (h:t) old =
next ++ (depthList r t (old ++ next))
where
next = if elem h old
then []
else depthSearch r h old
routes :: Ord a => Relation a -> a -> a -> [a] -> [[a]]
8.6 Summary
In this Chapter we have looked at:
• the module mechanism of Haskell;
• how we can define ADT’s using this;
• examples of ADT’s;
• examples of the use of an ADT.
Questions
1. Define mkSet :: Ord a => [a] -> Set a, and define both empty and addSet in terms
of this.
2. Define diff, from Program 8.12. diff a b is the set of items in a but not in b.
3. Define leqSet, from Program 8.13.
4. Define the remaining operations in Program 8.19. One of the types given is not as general as it
might be. Which one is it?
5. [From [77]] A binary search tree is an object of BinTree a (as defined in Program 2.25 on
page 13 ) whose elements are ordered. The empty tree is ordered. A tree Node v l r is
ordered if: all the values in l are smaller than v; all the values in r are greater than v; l and r
are themselves ordered. Implement a search tree ADT.
6. Complete the definition of compose from Program 8.23.
7. Define the reflexive and the symmetric closures of a relation.
8. Implement best-first search.
9. Compare the implementation of depth-first and breadth-first search given in §8.4.2 with those
given in [61], [7], and your favourite imperative programming textbook.
10. Compare the algorithm in § 8.5 with that given in [12].
9 Parser combinators
Parser combinators1 are discussed in a number of places, with tutorial introductions in [25, 42, 20].
[77] uses parser combinators as an example where laziness is exploited. We follow the development
given in [20]. The parser combinators that we define in this Chapter will allow us to construct recursive-
descent parsers. [15] discusses applications of recursive-descent techniques to compiling in some
depth.
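The listing that introduced the first parser type is missing from this copy; it was presumably a monolithic type along these lines (a hedged guess):
-- Hypothetical reconstruction: a parser consumes the whole input and
-- delivers a single result, such as a parse tree.
type Parser symbol result = [symbol] -> result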
This type looks like a good start, but a moment’s reflection will convince us that it is not really
good enough. This type is really the type of a rather monolithic parser. It is better not to try to build
monolithic parsers, but to use the support we have for modularity to build parser components. What
sort of behaviour would we expect of a parsing component? It should make sense of what it can parse,
and leave the rest of the string for the next component to work on. The type of a function which does
this is:
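The improved type is also missing; from the description that follows it must have been:
-- consume some prefix of the input, return the rest together with the result
type Parser symbol result = [symbol] -> ([symbol], result)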
A parser now takes a list of symbols and returns a pair consisting of a list of symbols and a parse
tree. Can we do better than this? Suppose there are a number of different ways in which an initial part
of a list of symbols can be understood by our parser. For example, we may have a rule which says
that a variable name is a sequence of lower-case letters. Then the string which begins "xyz" could
be the variable x followed by something, the variable xy followed by something, or the variable xyz
followed by something. One way to deal with situations like this is to use some sort of backtracking
search mechanism, where we come back to explore alternative solutions when the one we first picked
fails. Another way is, rather than relying on failure, is to work with a list of successes. This technique
is described in [81]. We have already used it in § 8.5 on page 56 when we implemented depth-first
search using list comprehensions. Instead of producing just one possible pair we produce them all.
Then we have this type:
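This is the list-of-successes type used for the rest of the chapter:
type Parser symbol result = [symbol] -> [([symbol], result)]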
Now a parser is a function from lists of symbols to lists of list of symbol, parse tree pairs. In many
practical cases we are trying to parse strings, i.e. lists of Char, and so the type of parsers will usually
be String -> [(String, result)].
1 Remember that ‘combinator’ is just another word for ‘function’.
Expr −→ Lbra Expr Rbra Expr | ε
Lbra −→ '('
Rbra −→ ')'
Examples of such expressions are: (), ()()(), ((()())), ()()()(((((()())))())). We shall develop parser
combinators as we try to write a parser for this grammar.
The parser for Lbra will accept a '(', that for Rbra a ')'. The parser for Expr will consist of a choice
between the parser for the empty string and a sequence of parsers.
The parser single_symbol is clearly also generalisable to parse tokens larger than one symbol.
The parser token will recognise a token:
token k xs
    | k == take n xs = [(drop n xs, k)]
    | otherwise      = []
    where n = length k
More importantly we can now define parsers like digit, a parser to recognise digits:
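The listing is missing; a hedged sketch (isDigit comes from Data.Char in present-day Haskell):
digit :: Parser Char Char
digit (x:xs) | isDigit x = [(xs, x)]
digit _                  = []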
infixr 4 <|>
(<|>) :: (Parser s a) -> (Parser s a) -> Parser s a
(p1 <|> p2) xs = p1 xs ++ p2 xs
For convenience we define <|> as an infix operator. Thompson calls this operator alt. The parser
p1 <|> p2 finds all the parses that p1 finds and all the parses that p2 finds.
The second parser combining function that we define is <&>, which Thompson calls >*>.
infixr 6 <&>
(<&>) :: (Parser s a) -> (Parser s b) -> Parser s (a, b)
(p1 <&> p2) xs = [(xs2, (v1, v2)) |
(xs1, v1) <- p1 xs,
(xs2, v2) <- p2 xs1]
Again, for convenience, we define <&> as an infix operator. The parser p1 <&> p2 finds all the
parses which can be obtained by using p2 after using p1.
We can build a parser which always fails:
fail :: Parser s r
fail _ = []
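The definition of succeed is missing above; the standard definition (e.g. in [20]) consumes no input and returns the given value:
succeed :: r -> Parser s r
succeed v xs = [(xs, v)]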
And we can use succeed to implement epsilon, a parser for the empty string:
epsilon = succeed []
Gofer?
Reading script file "moreparsers.hs":
Parsing...Done
Dependency analysis...Done
Type checking...
ERROR "moreparsers.hs" (line 5): Type error in application
*** expression : (lbr <&> bras <&> rbr) <&> bras <|> epsilon
*** term : (lbr <&> bras <&> rbr) <&> bras
*** type : [Char] -> [([Char],((Char,(a,Char)),a))]
*** does not match : [b] -> [([b],[c])]
We have simply not thought about the types involved. We need to think carefully about the type of
parse trees that we are trying to construct. In this case we might define a type like:
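The type itself is missing; the one implied by the results printed below is:
-- Node takes a pair because <&> delivers pairs
data Tree = Nil | Node (Tree, Tree)
    deriving Show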
We could then define a parser which constructed a tree like this for us. A moment’s reflection will show
us that we can generalise here. Suppose we were not merely interested in whether the brackets balanced,
but also in the depth of their nesting. We might implement a function to compute the
depth of nesting which parsed a string of brackets, and constructed a tree, and then traversed the tree
to compute the nesting. Whenever we construct a data structure and then take it to pieces we should
hear alarm bells ringing. Why put something together only to take it apart? Why not compute what is
required as the tree is constructed? What we need is a way to apply an arbitrary function to the output
of the parser, not just a tree constructing function. We can define a function to do just this for us, using
a list comprehension:
infixl 5 <@
(<@) :: (Parser s a) -> (a -> b) -> Parser s b
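The body of <@ is missing from the listing above; following [20] it can be written with a list comprehension, and (a hedged reconstruction, assuming lbr and rbr are parsers for '(' and ')', e.g. symbol '(' and symbol ')') the corrected bracket parser then reads:
(p <@ f) xs = [(ys, f v) | (ys, v) <- p xs]

-- hedged reconstruction of the corrected Program 9.19
bras :: Parser Char Tree
bras = (lbr <&> bras <&> rbr) <&> bras <@ (\((_, (a, _)), b) -> Node (a, b))
   <|> succeed Nil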
We have replaced epsilon <@ k Nil with succeed Nil, for brevity. We get this behaviour:
Gofer?
bras "()"
[([],Node (Nil,Nil)), ("()",Nil)] :: [([Char],Tree)]
(64 reductions, 172 cells)
Gofer?
bras "(())"
[([],Node (Node (Nil,Nil),Nil)), ("(())",Nil)]
:: [([Char],Tree)]
(109 reductions, 277 cells)
Gofer?
bras "()()"
[([],Node (Nil,Node (Nil,Nil))),
("()",Node (Nil,Nil)),
("()()",Nil)] :: [([Char],Tree)]
(118 reductions, 326 cells)
This is fine, as far as behaviour goes. If we look at the semantic function (\((_, (a, _)), b)
-> Node (a, b)) we see that it involves a rather complicated pattern to pick out the things we are
interested in and ignore the rest. The structure of this pattern reflects the structure of the parser itself.
In larger examples this pattern may become very much more complicated, and in any case we have
written the structure of the parser out in the parser already, and it is surely foolish to write it out again
just to do some pattern matching. Instead we can choose to use <@ to allow us to ignore the ignorable
when it is made, and rewrite program 9.19 as:
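The rewritten program is missing from this copy; a hedged sketch of what it might have looked like, throwing the brackets away with <@ as soon as the pairs are built:
bras = (lbr <&> bras <&> rbr <@ (fst . snd)) <&> bras <@ (\(a, b) -> Node (a, b))
   <|> succeed Nil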
The final semantic function is certainly neater: in fact it could just be Node, but the use of fst and
snd makes the parser more complicated than it might be. Hence we define two new operators:
infixr 6 <&
(<&) :: (Parser s a) -> (Parser s b) -> Parser s a
p <& q = p <&> q <@ fst
infixr 6 &>
(&>) :: (Parser s a) -> (Parser s b) -> Parser s b
p &> q = p <&> q <@ snd
bras = (lbr &> bras <& rbr) <&> bras <@ Node
<|> succeed Nil
We claimed above that one of the advantages of <@ was that it would make it easy to implement
a function which parsed a string of balanced brackets and computed the depth of the nesting:
nst = (lbr &> nst <& rbr) <&> nst <@ (\(x, y) ->
max (x+1) y)
<|> succeed 0
Gofer?
nst "()()"
[([],1), ("()",1), ("()()",0)] :: [([Char],Int)]
(142 reductions, 317 cells)
Gofer?
nst "(())()"
[([],2), ("()",2), ("(())()",0)] :: [([Char],Int)]
(191 reductions, 408 cells)
We see immediately that nst and bras follow the same pattern, and whenever we see the same
pattern in two places we write a higher-order function to capture it:
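The higher-order function is missing here; a hedged sketch, of which bras and nst are both instances:
foldparens :: ((a, a) -> a) -> a -> Parser Char a
foldparens f e = p
    where
    p = (lbr &> p <& rbr) <&> p <@ f
        <|> succeed e

-- for example (hedged): bras = foldparens Node Nil
--                       nst  = foldparens (\(x, y) -> max (x + 1) y) 0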
In a more complicated example, where we might be parsing a program, we can see that <@ can
be used to apply whatever manipulations we may require to be performed on the expression of the
language.
9.3.1 sp
We would prefer our parsers not to get upset by white space. We define a function, sp, which will
apply a parser after dropping any white space, using one of the functions from the Standard Prelude:
sp p = p . dropWhile isSpace
Program 9.27: sp
We can now go back and re-define foldparens, and nst to ignore white space:
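The listing is missing; a hedged sketch (the name foldparens2 is invented here) that wraps the bracket and empty parsers in sp so white space is skipped:
foldparens2 f e = p
    where
    p = (sp lbr &> p <& sp rbr) <&> p <@ f
        <|> sp (succeed e)

nst2 :: Parser Char Int
nst2 = foldparens2 (\(x, y) -> max (x + 1) y) 0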
Gofer?
nst2 "( (() ) )()( ) "
[([],3),
("( ) ",3),
("()( ) ",3),
("( (() ) )()( ) ",0)] :: [([Char],Int)]
(796 reductions, 1445 cells)
9.3.2 just
Another handy function is just, which ensures that there is nothing left over from the parse:
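The definition is missing; the standard one from [20] keeps only the parses whose rest-string is empty:
just :: Parser s a -> Parser s a
just p = filter (null . fst) . p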
Just works by filtering out the successful parses which have some work left to do. For example:
Gofer?
(just nst2) "( (() ) )()( ) "
[([],3)] :: [([Char],Int)]
(756 reductions, 1296 cells)
9.3.3 some
Just is not quite the correct function: we really only want to see the 3 and not all the brackets. We
can remedy this by defining a new function some. Some is used where we know that there is exactly
one parse.
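The definition is missing; a sketch consistent with the type given below, taking the first complete parse and throwing its (empty) rest-string away:
some :: Parser s a -> [s] -> a
some p = snd . head . just p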
Gofer?
(some nst2) "( (() ) )()( ) "
3 :: Int
(592 reductions, 1071 cells)
Whereas just has type (Parser a b) -> Parser a b, some has type (Parser a b) ->
[a] -> b. Since the effect of some is to construct a deterministic parser we define:
9.3.4 <:&>
There are a number of times when we can expect to parse a sequence of items, and return a list of
parse trees. For situations like this we define the operator <:&>, which makes a list from whatever <&>
gives back:
infixr 6 <:&>
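The body is missing; a sketch, running the two parsers in sequence and consing the first result onto the list produced by the second:
(<:&>) :: Parser s a -> Parser s [a] -> Parser s [a]
p <:&> q = p <&> q <@ (\(x, xs) -> x : xs)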
9.3.5 Kleene ∗
The first use we find for <:&> is to define ∗. A∗ is a list of zero or more A’s. We choose to implement
a unary function called star:
-- Kleene star
star :: (Parser s a) -> Parser s [a]
star p = p <:&> star p
     <|> succeed []
9.3.6 Kleene +
And of course we next define +. A+ is a list of one or more A’s. We choose to implement a unary
function called plus:
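The listing is missing; the natural definition is one item followed by zero or more:
plus :: Parser s a -> Parser s [a]
plus p = p <:&> star p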
Suppose in our favourite language an identifier is any sequence of one or more alphabetic charac-
ters:
-- identifier
identifier :: Parser Char String
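The body is missing; a hedged sketch (isAlpha comes from Data.Char in present-day Haskell):
identifier = plus alpha
    where
    alpha (x:xs) | isAlpha x = [(xs, x)]
    alpha _                  = []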
Gofer?
identifier "haskell"
[([],"haskell"),
("l","haskel"),
("ll","haske"),
("ell","hask"),
("kell","has"),
("skell","ha"),
("askell","h")] :: [([Char],[Char])]
(321 reductions, 864 cells)
9.3.7 first
This is probably not how we want a parser for an identifier to behave. We almost certainly don’t want
the backtracking in case of a later failure, and so we define first:
first p = (take 1) . p
-- identifier
identifier :: Parser Char String
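The redefined body is missing; presumably it simply wraps the previous parser in first, so only the longest parse is kept:
identifier = first (plus alpha)
    where
    alpha (x:xs) | isAlpha x = [(xs, x)]
    alpha _                  = []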
Gofer?
identifier "haskell"
[([],"haskell")] :: [([Char],[Char])]
(154 reductions, 351 cells)
9.3.8 bang
Because first affects backtracking behaviour it reminds us of Prolog’s !. So we define:
And of course we see that these two definitions follow the same pattern, so we define:
bang pc = first . pc
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
Foldl is a tail-recursive function which uses an accumulating argument, and can be used like this:
Gofer?
foldl (flip (:)) [] [1..5]
[5, 4, 3, 2, 1] :: [Int]
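The definition of natural is missing; a hedged sketch, folding the digits read by plus digit into a number (ord comes from Data.Char in present-day Haskell):
natural :: Parser Char Int
natural = plus digit <@ foldl (\n d -> 10 * n + ord d - ord '0') 0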
Gofer?
(some natural) "123"
123 :: Int
(94 reductions, 195 cells)
9.3.9 optionally
We know that there are many grammars where we have optional elements, so we define a combinator
for these:
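The listing is missing; a hedged sketch (the name optionally follows the section title), yielding one parse containing the item and one containing nothing:
optionally :: Parser s a -> Parser s [a]
optionally p = p <@ (\x -> [x])
           <|> succeed []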
-- integer
minussign = symbol '-'
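The definition of integer is missing; a hedged sketch, consistent with the session below: an optional minus sign followed by a natural.
integer :: Parser Char Int
integer = optionally minussign <&> natural
          <@ (\(m, n) -> if null m then n else negate n)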
Gofer?
(some integer) "123"
123 :: Int
(121 reductions, 254 cells)
Gofer?
(some integer) "-123"
-123 :: Int
(120 reductions, 253 cells)
9.4 Parsing sequences of items
The sorts of things that we will typically pack between two symbols will be lists of items separated by
(meaningless) separators, so let’s define a parser combinator for these:
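The listing is missing; a hedged sketch of listOf: one item, then any number of separator–item pairs (keeping only the items), or the empty list.
listOf :: Parser s a -> Parser s b -> Parser s [a]
listOf p sep = p <:&> star (sep &> p)
           <|> succeed []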
One particular type of list that we will require will be lists separated by commas:
In Program 9.39 on page 68 we gave a suitable parser for identifiers, and in Program 9.46 on
page 70 we gave a suitable parser for integers. All we need do now is describe a list of items separated
by spaces:
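The data type and parser for s-expressions are missing here; a hedged reconstruction consistent with the session below (the type name Sexpr and the parser's exact shape are guesses; the constructor names come from the output):
data Sexpr = Ident String | Number Int | Compound [Sexpr]
    deriving Show

sexpr :: Parser Char Sexpr
sexpr = sp identifier <@ Ident
    <|> sp integer    <@ Number
    <|> (sp (symbol '(') &> star sexpr <& sp (symbol ')')) <@ Compound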
Gofer?
sexpr "(set x 3)"
[([],Compound [Ident "set", Ident "x", Number 3])]
Gofer?
sexpr "(lambda (x y) x)"
[([],Compound [Ident "lambda",
Compound [Ident "x", Ident "y"], Ident "x"])]
Gofer?
sexpr "(defun k (lambda (x y) x))"
[([],Compound [Ident "defun",
Ident "k",
Compound [Ident "lambda",
Compound [Ident "x", Ident "y"],
Ident "x"]])]
5 + 4 − −3 ∗ 12
We can think of this expression as a list of integers separated by meaningful symbols. We have
already seen examples of expressions like this, when we use foldr (a.k.a. reduce) and foldl. We
can think of foldr (+) 0 [1..5] as a short-hand for:
(1 + (2 + (3 + (4 + (5 + 0)))))
and foldl (+) 0 [1..5] as a short-hand for:
(((((0 + 1) + 2) + 3) + 4) + 5)
Since + is associative these two expressions have the same value. This is not the case for a non-
associative operation, such as division on floating point numbers:
Gofer?
foldr (/) 1.0 [1.0 .. 5.0]
1.875 :: Float
(44 reductions, 82 cells)
Gofer?
foldl (/) 1.0 [1.0 .. 5.0]
0.00833333 :: Float
(44 reductions, 92 cells)
Notice that foldr makes the operator associate to the right, foldl to the left.
We shall explain how to parse a string like "1+2+3+4+5", and return the value we would expect
from computing 1 + 2 + 3 + 4 + 5. We are clearly going to adapt the listOf parser, but whereas
listOf threw the separators away we must keep and use them. Now we define a parser which will
parse a list of values separated by symbols representing functions, and apply the functions on the fly.
The type of this parser is:
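The type is missing from this copy; it must be the generalisation of the instantiation given below:
chainl :: Parser s a -> Parser s (a -> a -> a) -> Parser s a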
We have jumped ahead of ourselves by calling the parser combinator chainl as we are presuming
that there will be a chainr. When we use chainl we will typically instantiate the type as:
(Parser Char Int) -> (Parser Char (Int-> Int -> Int)) -> Parser Char Int
chainl p s =
    p <&> star (s <&> p) <@
    (\(e0, l) -> foldl (\x (op, y) -> op x y) e0 l)
chainr p s =
    star (p <&> s) <&> p <@
    (\(l, e0) -> foldr (\(x, op) y -> op x y) e0 l)
9.5 <&=>
We return to the problem of dealing with tuples that we came across before when we defined &> and
<&. We can define a new parser combinator to compose parsers in a way similar to <&>. We call the
new combinator <&=>:
infixr 6 <&=>
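The definition itself is missing; a hedged sketch, consistent with the name bind mentioned below: each result of the first parser is fed to a function that builds the parser to run on the rest of the input.
(<&=>) :: Parser s a -> (a -> Parser s b) -> Parser s b
(p <&=> f) xs = [ res | (ys, v) <- p xs, res <- f v ys ]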
Some authors call <&=> bind. We have made <&=> deal with taking the tuples to pieces.
We can use <&=> in a similar way to the way we used <&>. For example, a parser to read and add
two integers:
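The example is missing from this copy; a hedged sketch (the name addTwo is invented here):
addTwo :: Parser Char Int
addTwo = integer <&=> \m ->
         sp integer <&=> \n ->
         succeed (m + n)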
9.6 Summary
In this Chapter we have covered a lot of material.
Questions
1. Write some parsers for e.g. arithmetic expressions, λ-terms, Prolog terms, your favourite pro-
gramming language.
2. Write the Chapter summary.
3. Find out why <&=> is also called bind.
10 The λ-calculus
In this Chapter we will look at the un-typed λ-calculus. In Chapter 11 we will look at the typed theory.
This should be the most formal part of the course, but we will avoid full formality wherever possible.
10.1 Introduction
The λ-calculus was invented in the 1930’s by the American logician Alonzo Church [9]. Church was
interested in the foundations of mathematics, and he hoped to construct a language in which all and
only the consistent part of mathematics could be expressed. We now know, thanks to Gödel [30], that
this is impossible, but the λ-calculus has proved to be useful in other ways, particularly in computer
science. In this respect Church was like Columbus: he set off in search of one thing, but found another,
rather more interesting, thing.
The λ-calculus was one of the first formal models of computation. Alan Turing’s work on Turing
machines [79] was contemporary,1 and the phrase ‘Turing machine’ was first used in print by Church,
in a review of [79] in the Journal of Symbolic Logic. At about the same time Emil Post proposed another
formal model of computation [67]. There have been various other formal models of computation
proposed, such as:
• Minsky systems;
• Markov systems;
Although they look very different all of these models of computation have been formally shown to
describe the same class of functions. Church’s Thesis is the assertion that this class of functions coincides
with our intuitive notion of what a computable function is, and can properly be called the computable
functions. Church’s Thesis is not amenable to formal proof as it relates an informal notion with a
formal one. To give a formal proof of Church’s Thesis we should have to formalise correctly our
informal notion of what a computable function is: that is we should have to have a proof of Church’s
Thesis to hand already.
The main feature of the λ-calculus is that it is a notation to discuss functions as rules. This contrasts
with the notion of functions as graphs. Since the latter part of the 19th century it has been common
for mathematicians to attempt to found mathematics on set theory, and to treat functions as a derived
notion. Using this model we treat a function F as a relation, that is as a set of pairs, with the extra
condition that if (x, y) ∈ F and (x, z) ∈ F then y = z. This is an extensional notion of what a function
is, and is quite reasonable for (some) mathematical applications, but it is a poor model for computer
scientists. The graph of the addition function is infinite: where do we store this infinite graph in a finite
machine? We have also seen many examples of pairs of functions which have the same graph, but
which behave very differently. We do not want to identify all sorting algorithms as being the same
function. As computer scientists we are concerned very much with algorithmic aspects of functions, and
we naturally want to think of a function as a rule, and treat the rules themselves as objects of study. The
λ-calculus provides us with an excellent setting in which to investigate functions-as-rules. Just as the
WAM provides us with a model for the implementation of Prolog, and the JVM provides us with a model
1 Church was Alan Turing’s Ph.D. supervisor at Princeton.
for the implementation of Java, so the λ-calculus provides us with a model for the implementation of
our favourite functional programming language. Of course, since all of these models of computation
are formally equivalent, they all provide us with an implementation model for each other, but not
necessarily a very user-friendly model.
The λ-calculus itself is very simple, but the theory which can be built around it is very rich. The
two best-known texts on the λ-calculus are probably [4] and [39]. [4] covers (almost) everything that
you could possibly want to know about the untyped theory; [39] is more introductory, and covers the
relationship with combinatory logic in some depth. [5] describes the contribution of the λ-calculus to
computer science.
10.2 Syntax of λ-terms
A λ-term is either:
• a variable; or
• an application M N of one term to another; or
• an abstraction λx.M of a term over a variable.
Such terms are called ‘pure’ because they do not involve constants. Applications and abstractions
are sometimes called compound terms. Variables (and, in the impure theory, constants) are sometimes
called atomic.
We will use x, y, z . . . , possibly with subscripts for (arbitrary) variables, and L, M , N . . . , for
(arbitrary) terms. We can describe such terms using this (almost formal) grammar:
Term −→ Var
| λVar . Term
| Term Term
| (Term)
Var −→ x, y, z, . . .
The terms described by our grammar have far too many brackets in them, so we adopt the usual
conventions:
• abstraction associates to the right, so we can write λx.λy.λz.M rather than λx.(λy.(λz.M ))
• we allow one λ to stand for many, and write λxyz.M rather than λx.λy.λz.M
• application binds tighter than abstraction, so we write λx.M N rather than λx.(M N )
2 At least as far as I am concerned.
3 That is: fiddly.
x y λx.x λx.y
xy λy(λx.y) λx(λx.x) (λx.x x)(λz.z z)
Figure 10.2: Some λ-terms
The λ-calculus is a theory about when two terms are equal. We have the following basic rules for
equality:
M = M (reflexivity)
if L = M then M = L (symmetry)
if L = M and M = N then L = N (transitivity)
if L = M then NL = NM
if L = M then LN = M N
if L = M then λx.L = λx.M
Figure 10.3: Equality rules
We need to define various notions relating to the syntax of λ-terms. All these definitions should be
relatively familiar to computer scientists: the λ-calculus is the prototypical programming language,
after all.
Length of a term
The length of a term is the number of occurrences of variables in it. It is defined inductively:
|x| = 1
|M N | = |M | + |N |
|λx.M | = 1 + |M |
Subterms
subterms(x) = {x}
subterms(M N ) = {M N } ∪ subterms(M ) ∪ subterms(N )
subterms(λx.M ) = {λx.M } ∪ subterms(M )
Notice that:
Occurrence
The notion of an occurrence is a slightly different notion from that of a sub-term. A term N occurs in
a term M at a location, and it may occur at more than one location. A location can be labelled by
the path through the λ-term (considered as a tree) required to get there. We write P to indicate an
occurrence of P .
An occurrence of λx is called an abstractor and the occurrence of x in an abstractor is called a
binding occurrence. So z occurs in λz.M , even if it is not a subterm of M . Occurrences of a variable
in a term which are not binding occurrences are called non-binding occurrences.
We can define the occurrences in a term as follows:
occurrences(M ) = occs(M, ↓)
occs(x, path) = [(path, x)]
occs(M N, path) = [(path, M N )] ++ occs(M, path ↙) ++ occs(N, path ↘)
occs(λx.M, path) = [(path, λx.M ), (path ↙, x)] ++ occs(M, path ↘)
We are using a cute notation of sequences of arrows to represent the paths, but strings of 0’s and
1’s will do fine too.
Components
All the occurrences of terms in M , other than binding occurrences of variables, are called components
of M .
Scope
If λx.P is a component of some term, then we say that:
• P is the body of λx.P
• P is the scope of the abstractor λx
The covering abstractors of P are the abstractors whose scopes contain P .
Notice that by this definition scopes don’t have holes.
x(λx.x)
Combinators
A term in which no variable occurs free is called a closed term or a combinator. The following table gives
some well-known combinators, and their commonly used names:
λx y z.x(y z) aka B
λx y z.x z y aka C
λx y.x aka K
λx.x aka I
λx y z.x z(y z) aka S
λx.(λy.x(y y))(λv.x(v v)) aka Y
λx y.y(x x y) aka A
AA aka Θ
λx y.y aka 0
λx y.x y aka 1
λx y.xn y aka n
10.2.1 α convertibility
Two terms which differ only in the names of their bound variables are said to be α-convertible. We
consider two α-convertible terms to be the same term. Thus, for example, λx.x and λy.y are indistin-
guishable, as are λxvu.ux(λy.xuλw.w) and λzux.xz(λy.zxλq.q).
If two terms L and M are syntactically identical we write:
L≡M
A term M in which an abstractor λx occurs, and in which a variable x occurs outside the scope of the
abstractor (and not in the abstractor itself!) is said to have a bound variable clash.
Consider the term:
λx.λx.x
λy.λx.y
λx.λy.y
Any term M with a bound variable clash can be replaced by a term N without a bound variable
clash, such that M ≡ N , by renaming the bound variable, using a new name which we pick from our
endless supply of variable names, and which does not occur anywhere in the term (or terms) we are
considering. Such a variable is called a fresh variable. In future we will attempt to avoid terms with
bound variable clashes. Changing the notion of scope to allow scopes with holes is probably easiest,
but it is not done (explicitly) in [38].
λx.λy.x(λz.y z w)
Section Summary
De Bruijn terms deal with many of the problems associated with the naming of (bound) variables, by not
giving bound variables names. Each De Bruijn term can be thought of as representing an equivalence
class of ordinary λ-terms: if L ≡ M then they both correspond to the same De Bruijn term.
10.3 Substitution
We leave De Bruijn terms and continue working with ordinary λ-terms. We can define a notion of
substitution for terms. In fact we can define a variety of (sensible4 ) notions of substitution. We choose
to define a notion which is relatively close to an implementable function. One important feature of the
substitutions that we define is that substitution is always possible, although this may require re-naming
of bound variables.
We write:
[N/x]M
which we read as ‘substitute N for x in M ’, or ‘replace x with N in M ’. Beware that different authors
use different notations: instead of [N/x]M [4] writes M [x := N ] and [61] writes M [N/x].
We must take care not to capture variables. If L ≡ M then [N/x]L ≡ [N/x]M . Consider:
[x/y]λz.zy
This is just:
λz.zx
If we naïvely perform the substitution:
[x/y]λx.xy
4 We can define any number of nonsensical notions of substitution.
we get:
λx.xx
which is a different term. When we substitute a term N into an abstraction we must take care that no
free variable in N comes into the scope of the abstractor. We can ensure this by re-naming the bound
variable, using a fresh variable. So:
[x/y]λx.x y ≡ λv.v x
[N/x]y = N , if x ≡ y, and [N/x]y = y otherwise
[N/x](P Q) = ([N/x]P )([N/x]Q)
[N/x](λy.P ) = λz.[N/x]([z/y]P ), where z is a fresh variable if y occurs free in N , and z is just y otherwise
In the third clause we could rename the bound variable anyway, even if it was not needed, but this is
just pointless work. We could, perhaps, save a little work if we observe that [N/x]P = P if x ∉ FV(P ),
but for the moment we are more concerned with clarity.
(∀x)(∀y)(∃z)(x = y + z ∨ y = x + z)
This is the valid proposition that for every two natural numbers there is a natural number which is
their difference. The difference may be zero, of course.
The rule for ∀ elimination is:
(∀v)P
[t/v]P
Rule 10.1: ∀ elimination
(∀x)(∀y)(∃z)(x = y + z ∨ y = x + z)
(∀y)(∃z)(z = y + z ∨ y = z + z)
Rule 10.2: Invalid ∀ elimination
(∀x)(∀y)(∃z)(x = y + z ∨ y = x + z)
(∀y)(∃w)(z = y + w ∨ y = z + w)
Rule 10.3: Valid ∀ elimination
In the conclusion to Rule 10.2 z is bound, in the conclusion to Rule 10.3 z is free. The conclusion
to Rule 10.2 is that every natural number is either zero or is even, the conclusion to Rule 10.3 is that,
for every natural number, there is a difference (possibly zero, of course) between it and an arbitrary
natural number.
10.4 Conversions
What we have done so far is to describe the syntax of λ-terms, and the (syntactic) operation of substi-
tution on them. We now connect substitution to an operation on the meaning of terms.
10.4.1 β reduction
If we have an abstraction applied to a term we can simplify the term, using a β reduction:
(λx.M )N B1β [N/x]M
if L =β M then L = M
Figure 10.10: Extending the equality rules
Redex
A redex is a reducible expression, that is, an occurrence of an application of an abstraction.
5 Actually, this is confluence, but we will ignore the distinction. If a system (such as the λ-calculus with β-reduction) is Church-
Rosser, then it is also confluent. Conversely every confluent system is also Church-Rosser. For full details consult [3].
10.5 The Church-Rosser Theorem
Theorem 1 (Church-Rosser) If L Bβ M and L Bβ N , then there is a term P such that M Bβ P and N Bβ P .
The Church-Rosser Theorem is often called the diamond property, because its statement can be drawn
as a diamond, with L at the top, M and N at the sides, and P at the bottom.
Figure 10.11: The diamond property
Proof
See [4], for the full details of proofs of the Church-Rosser theorem. We will outline one, due to Tait and
to Martin-Löf.
First we need a lemma: If a relation has the diamond property then so does its transitive closure.
Proof: draw a diagram!
Now we define a relation ≫ on λ-terms such that:
• ≫ is confluent;
• Bβ is the transitive closure of ≫.
The relation ≫ is defined by the following rules:
M ≫ M
if M ≫ M ′ then λx.M ≫ λx.M ′
if M ≫ M ′ and N ≫ N ′ then M N ≫ M ′ N ′
if M ≫ M ′ and N ≫ N ′ then (λx.M )N ≫ [N ′ /x]M ′
Rule 10.4: ≫
if M ≫ M ′ , N ≫ N ′ and (λx.M ′ )N ′ B1β Q then (λx.M )N ≫ Q
Rule 10.5: Alternative clause for ≫
1. ≫ is confluent;
if M ≫ L then (∀N )(M ≫ N ⊃ (∃Q)(L ≫ Q & N ≫ Q))
Rule 10.6: Confluence of ≫
The proof of this proceeds by ∀ introduction, ⊃ introduction, and then by induction on the definition
of ≫.
We show 2 as follows. We think of our relations as sets of pairs. The reflexive closure of B1β is
a subset of ≫, which is itself a subset of Bβ . However, since Bβ is the transitive closure of the reflexive
closure of B1β , it is also the transitive closure of ≫.
Proof
Application of Church-Rosser Theorem.
Corollary 1.2 Normal forms are unique: any term which can be reduced to normal form can only be
reduced to one normal form.
Proof
Application of Church-Rosser Theorem.
Because normal forms are unique we can take the normal form of a term to be the value of the term.
Proof
By ‘consistent’ we mean that we cannot show P =β Q, for all P , Q. Let L and M be in normal form,
and be distinct (i.e. we do not have L ≡ M ). Suppose L =β M . Then L would have two normal forms,
itself and M . But this contradicts Corollary 1.2, hence ¬(L =β M ). Hence it is not the case that we
can show P =β Q, for all P , Q.
10.7 Reduction strategies
10.7.1 Leftmost
The leftmost redex of a term P is the redex whose leftmost parenthesis is to the left of all the other
parentheses of redexes in P . The leftmost reduction of a term P is the sequence of terms generated
from P by reducing the leftmost redex of each term generated.
Theorem 2 A term has a normal form iff its leftmost reduction is finite. The last term in the reduction
is the normal form.
Proof
See [4].
This theorem tells us that if any reduction strategy will find a normal form then the leftmost strategy
will find it. It does not tell us that every term has a normal form, nor even give us a bound on how long
we will have to look for a normal form.
10.8 Representing data and functions
• first we show that we can represent the booleans, then we show that we can represent natural
numbers;
• then we get a bit serious and outline what is required to show that the λ-definable functions
coincide with the recursively definable functions.
We don’t get too serious: we do not properly define the class of the recursive function, nor do we
carry the proof through properly.
10.8.1 Booleans
The booleans are the values TRUE and FALSE. They have the property that TRUE and FALSE are
different. Furthermore we need to define IF where IF behaves like this:
IF TRUE L M = L
IF FALSE L M = M
How do we construct λ-terms which behave like this? To begin with IF had better take three ar-
guments, so it will look like λx y z.Q. The term (IF Q) will return either its first or second argument
depending on whether Q is TRUE or FALSE.6 Since we can’t test cases we will simply let TRUE and
6 We don’t care what happens if Q is neither.
FALSE decide whether to return the first or second argument. IF, TRUE and FALSE can then be defined
as:
IF =def λx y z.x y z
TRUE =def λx y.x
FALSE =def λx y.y
So:
IF TRUE L M
is just:
(λx y z.x y z)(λw v.w)L M
λx y.y aka 0
λx y.xn y aka n
With these numerals we can define addition, multiplication and exponentiation. (Exercise).
10.8.3 λ-definability
Now we outline the proof that the recursively definable functions are λ-definable. Recall that the class
of recursive functions is the least class of functions which:
• contains the basic functions:
– zero Z(n) = 0
– successor Succ(n) = n + 1
– projections Pin (x1 , . . . , xn ) = xi
• is closed under:
– composition,
– primitive recursion,
– minimisation;
We will not show all the details here: the interesting part is how we can represent recursion, so we
look at this. The representation of the minimisation follows similar lines, but we will not cover this.
Proof
Let W =def λx.F (xx) and X =def W W . Now, W W B1β F (W W ), which is just F (X).
Y0 =def Y
Yn+1 =def Yn (SI)
Y1 is just Θ.
This is all good stuff, but what does it tell us about recursion? We will show that we can use Y to
represent the recursively defined functions. As an example consider the factorial function:
fac 0 = 1
fac n = n * fac (n -1)
fac n = if (n == 0)
then 1
else (n * fac (n -1))
Or as:
fac = \n -> if (n == 0)
then 1
else (n * fac (n -1))
Now, we will suppose that we are allowed to make recursive definitions in the λ-calculus, and can
write definitions like:
fac =rec λn.if (n = 0) then 1 else n ∗ fac(n − 1)
We have used =rec to emphasise that this is a recursive definition.
We will show that we can give a definition of fac which does not require that we use recursion. The
other parts of the definition are all things that we know how to encode in the λ-calculus already.
The first step is to do a β expansion, and introduce a redex: we abstract the recursive call out as a parameter, writing
H =def λf.λn.if (n = 0) then 1 else n ∗ f (n − 1)
so that fac satisfies the equation:
q = Hq
and Y gives us a solution to this equation, so we can define fac =def Y H.
This definition does not use recursion explicitly: we are using Y instead. So, we have shown that
we can represent the factorial function in the λ-calculus, without needing to add a new mechanism for
making recursive definitions. We can generalise from here to show that any recursively defined function
can be encoded in the λ-calculus.
λx1 . . . xn .xM1 . . . Mm
where n, m ≥ 0
10.10 Graphs and laziness
λx1 . . . xm .L1 . . . Ln
where n, m ≥ 0, m > n
Any term which is in NF is also in HNF, although not necessarily vice versa, and a term which is in
HNF is also in WHNF, although not necessarily vice versa.
These notions of normal form are of interest to us, because the represent intermediate points on
the road to (fully) normal forms. A term in WHNF tells us enough about itself for us to say what its
outermost form is, without telling us exactly what its value is. But this is just what we need for laziness.
10.10.1 η reduction
So far we have only discussed the λ-calculus with β-reduction. This calculus is sometimes called the
λβ -calculus. We can add another notion of reduction, called η-reduction:
λx.M x Bη M , provided that x does not occur free in M
Now we should go through all the previous definitions of reduction, redex and so on and call
them β-redex, β-normal form, and so on. Then we can define η-redex, η-normal form and so on.
The calculus with both β and η reductions is sometimes called the λβη -calculus. The Church-Rosser
Theorem holds for the λβη -calculus.
10.11 Summary
In this chapter we have looked at the un-typed λ-calculus.
In fact we have looked at one untyped calculus: there are a lot of variations on the principal theme,
and we have not touched on these. The interested reader is referred to [4] for more details.
11 The typed λ-calculus
There are two intertwined aspects to the relation between λ-calculus and functional programming:
If we focus on 2 it strikes us immediately that the λ-calculus that we have discussed so far is un-typed,
and types play a hugely important rôle in Haskell. We need to consider a typed λ-calculus. There is
one crucial consequence of adding types: any typable term has a normal form. Hence the slogan:
typing implies termination.
This is a remarkable result. Typing is a purely static, syntactic notion, and yet it tells us about the
dynamic behaviour of terms. Unfortunately we get nothing for nothing. We are unable to type any fix-
point finder. This is a bit of a blow as we used fix-point finders to show that the recursive functions were
λ-definable. If we have no fix-point finders then we are forced to add recursion operators explicitly.
We must take care here. If we add recursion operators that are too powerful we will lose the property
‘typing implies termination’. On the other hand if we add weak recursion operators we find that there
are functions which we cannot express. The choice which Haskell makes is, of course, to give up ‘typing
implies termination’.
TyVar −→ a, b, c, . . .
M :τ
A type context or environment is a set of type assignments. Upper-case Greek letters Γ, ∆ . . . are
usually used for type contexts. We will typically abuse notation and omit braces and commas from
sets.1
The rules for type assignment given in [38] are (almost):
Γ 7→ x : α (x : α ∈ Γ)
Γ 7→ P : σ → τ ∆ 7→ Q : σ
→ Elim
Γ∆ 7→ P Q : τ
Γ, x : σ 7→ P : τ
→ Intro
Γ 7→ λx.P : σ → τ
Rule 11.1: Type assignment
These rules are expressed in a hybrid system which mixes Gentzen’s natural deduction (N style)
systems, and his sequent calculus (L style) systems to get the worst of both. Such systems might be
called NL systems and offer neither the freedom of natural deduction proper, nor the precision of the
sequent calculus proper. If one were a logician, and wrote ⊃ instead of →, and upper-case Roman
letters instead of lower-case Greek ones, and ignored the λ-terms one would get:
Γ 7→ A (A ∈ Γ)
Γ 7→ A ⊃ B ∆ 7→ A
⊃ Elim
Γ∆ 7→ B
Γ, A 7→ B
⊃ Intro
Γ 7→ A ⊃ B
Rule 11.2: The implicational fragment of minimal propositional logic
x : a 7→ x : a
→ Intro
7→ λx.x : a → a
Rule 11.3: I : a → a
x : a, y : b 7→ x : a
→ Intro
x : a 7→ λy.x : b → a
→ Intro
7→ λxy.x : a → b → a
Rule 11.4: K : a → b → a
x : a → a 7→ x : a → a x : a 7→ x : a
→ Intro → Intro
7→ λx.x : (a → a) → a → a 7→ λx.x : a → a
Defn. Defn.
7→ I : (a → a) → a → a 7→ I : a → a
→ Elim
7→ II : a → a
Rule 11.5: II : a → a
We will make clearer exactly what ‘same’ means in the next section.
1 The notation we are using is originally from Gentzen’s sequent calculus [27] where Γ denotes a list (sequence) of formulas.
The use of sets frees us from having to state a number of so-called ‘structural’ rules.
We see that there is an obvious relationship between proof search and type inference. In the next
section we will outline a type inference algorithm.
11.2 Type inference, and the principal type algorithm
a→a
(a → b) → a → b
((a → b) → a → b) → (a → b) → a → b
The language that we are using here should sound reminiscent of the language that we use when
talking about unification of simple terms (like Prolog terms, for example). Recall that if two Prolog
terms unify then they have a most general unifier. The similarity is no accident, as the type inference
algorithm depends on unification. One way to think of unification is in terms of equation solving, or
constraint satisfaction and we can think of the type inference algorithm as setting up constraints on
types (alternatively: equations between type variables).
The principal type algorithm takes a term of the λ-calculus and returns either:
• a principal type for the term, that is, a type of which every other type for the term is a substitution instance; or
• a failure, if the term has no type.
The basis of the algorithm is that we use the rules from Figure 11.1 to tell us how to attempt to
construct a type for the term we have been given. We use unification to match the types. Since we
construct a most-general unifier, we construct a principal type.
We defer the details of the type inference algorithm until Chapter 15.
Gofer?
\x -> x x
Gofer?
\x -> (\y -> x(y y))(\v -> x(v v))
ERROR: Type error in application
*** expression : y y
*** term : y
*** type : a -> b
*** does not match : a
*** because : unification would give infinite type
The second of these terms presents us with a problem: it is the Y combinator and in Chapter 10
we used Y to represent the recursive functions. The untypability of Y would seem to be a blow to
the expressive power of the un-typed λ-calculus. A moment’s reflection will convince us that this is
inevitable. If any typable term has a normal form, and terms are programs, and normal forms values
then there must be terms which we cannot type, or else we would be able to solve the halting problem.
All is not totally lost. We can define recursion operators, and give types for them, and we can use them
to express programs. Our problem now is that there are, inevitably, things we can’t express.
Notice that Haskell’s type system does not have the property that well-typed programs will terminate:
typing and termination are separated. Well-typed programs have fewer ways to go wrong than un-
typed programs.
7→ [] : List(τ )
Γ 7→ P : τ ∆ 7→ Q : List(τ )
Γ, ∆ 7→ Cons(P, Q) : List(τ )
Rule 11.6: Lists
The rules should remind you strongly of the type declaration for lists in Haskell given in Program 7.2
on page 35. Along with this definition we define a structural recursion operator for lists, listrec . We
must explain how to compute with listrec, i.e. we define new redexes and give their reduction rules:
l B [] d B d0
listrec(d, e, l) B d0
If we use the structural recursion operator listrec we retain the property ‘typing is termination’. We
lose this property if we use a more powerful recursion.
The extended type system that we are considering here, with dependent types, and the ability to
add new types freely is very expressive. In fact it is expressive enough to allow us to write program
specifications as types. Already in Haskell we can think of the type of a function as describing its
behaviour, that is as a partial specification. The specification of a program is a proposition, and
2 At least if we wish to retain consistency, and ‘typing as termination’.
3 We are allowing dependent types, which is a large step away from the system we have had so far, and much stronger than
the system which Haskell has.
the identification of propositions as types is often called the Curry-Howard isomorphism, although I
personally prefer ‘propositions-as-types analogy’. The step from Figures 11.1 to 11.2 is, essentially,
the recognition of propositions as types. [19] gives a general introduction to the ideas behind Curry-
Howard. The work of Per Martin-Löf [55] and others [58, 69, 76] on type theory is an extended exercise
on the utility of this idea in programming.
11.5 Summary
In this Chapter we have:
• discussed typing of the λ-calculus;
• mentioned the principal type algorithm;
12 Continuations
Informally, a continuation is a function which tells us ‘what to do next’ [2]. We can think of the contin-
uation as embodying the ‘future’ of the computation. It is natural to think of state in dynamic terms,
so this intuition helps us to see how continuations give us one way to handle stateful computation in a
functional setting.
Sections 12.1 to 12.3 are based on Chapter 3 of [50].
In many situations continuation-based approaches can be replaced by approaches based on the use
of monads, which we discuss in Chapter 18.
x1 −→ x1 ′    ψ(x1 ′ , . . .) −→ α
φ(x1 , . . .) −→ α
This can be read roughly as: ‘To evaluate the function φ, we must evaluate the function ψ.’ Alas the
crudity of this reading leads to two different definitions of tail recursion:
• the first is that a function is tail-recursive if recursive calls are tail calls (for example, §10.2 of
[52]);
• the second is that a function is tail-recursive if all calls are tail calls (for example, §6.8 of [76]).
As a simple example we look at 3 versions of a function to compute the length of a list.
We will call a function which is not tail-recursive a direct function. The function len in Program 12.1
is a direct function to compute the length of a list.
len [] = 0
len (h:t) = 1 + len t
The function len1 in Program 12.2 is tail-recursive by the first definition, but not the second. A call
to len1 l 0 will compute the length of the list l.
len1 [] n = n
len1 (h:t) n = len1 t (n + 1)
The function len2 in Program 12.3 is a variant of this function which is tail-recursive by the second
definition too. A call to len2 l f will compute the value of the function f applied to the length of the
list l.
len2 [] k = k 0
len2 (h:t) k = len2 t (\n -> k(n + 1))
The auxiliary argument k to len2 in Program 12.3 is called a tail function or continuation. A
function like len2 is said to written in continuation-passing style. We will now illustrate some of the
uses of continuations.
cpsfac 0 k = k 1
cpsfac n k = cpsfac (n - 1) (\x -> k(n * x))
A call to cpsfac n f will compute the value of the function f applied to the value of n!. Notice
that in fac in Program 10.1 the multiplication is outside the recursion, whereas in cpsfac in Program
12.4 the multiplication is inside the recursion. This is the typical pattern that we see when we write a
CPS function.
Suppose we call cpsfac 3 k for some arbitrary k. Evaluation will be as follows (we name some of
the expressions involved to aid readability):
cpsfac 3 k
=> cpsfac 2 (\x -> k(3 * x))
=> cpsfac 1 (\y -> k’(2 * y))
-- where k’ is \x -> k(3 * x)
=> cpsfac 0 (\z -> k’’(1 * z))
-- where k’’ is \y -> k’(2 * y)
=> k’’’ 1
-- where k’’’ is \z -> k’’(1 * z)
=> k’’ 1
=> k’ 2
=> k 6
12.2 Some simple functions
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
Notice that this function is not written in primitive recursive form. Since we are interested in functions
written in primitive recursive form we look at a primitive recursive version of Fibonacci’s function. We
define an auxiliary function:
fibs 0 = (1, 1)
fibs n = (snd(fibs (n - 1)),
snd(fibs (n - 1)) + fst(fibs (n - 1)))
fibs 0 = (1, 1)
fibs n = let (lo, hi) = fibs (n-1)
in (hi, hi + lo)
These two functions are essentially the same: the let just aids readability.2 CPS-converting these
functions is straightforward: again we supply an auxiliary argument and again the order of the opera-
tions on the right gets inverted:
cpsfibs 0 k = k (1, 1)
cpsfibs n k = cpsfibs (n - 1)
(\(lo, hi) -> k(hi, hi + lo))
Evaluation of cpsfibs 3 k for some arbitrary k will be as follows (allowing for the simplification
of arithmetic expressions):
cpsfibs 3 k
=> cpsfibs 2 (\(a, b) -> k(b, b + a))
=> cpsfibs 1 (\(p, q) -> k’(q, q + p))
-- where k’ is \(a, b) -> k(b, b + a)
=> cpsfibs 0 (\(x, y) -> k’’(y, y + x))
-- where k’’ is \(p, q) -> k’(q, q + p)
=> k’’’(1, 1) -- where k’’’ is \(x, y) -> k’’(y, y + x)
=> k’’(1, 2)
=> k’(2, 3)
=> k (3, 5)
We are now at a point where we can make some observations about the CPS versions of the functions
we have produced. The first observation is that the time complexity of cpsfibs is much better than
that of fib. The second observation is that there is a possibility to optimise the evaluation of cpsfib.
Whatever structure we build to represent the call to cpsfibs 3 k can simply be replaced by the
structure that we build to represent the call to cpsfibs 2 (\(a, b) -> k(b, b + a)). In fact
the algorithm that we have written is very similar to the C [44] function in Program 12.10.
The function repeat in Program 12.9 implements a looping construct in Haskell.
repeat 0 f a = f a
repeat n f a = repeat (n-1) f (f a)
lo = 0;
hi = 1;
So we see that the use of tail-recursive functions allows us to write algorithms which behave like
imperative loops, and which a compiler can treat in the same way as it treats an imperative loop. This
is a crucial point: using continuations lets us write imperative algorithms in a functional language. Thus
we see that we can use continuations to allow us to handle computations which involve state. The use
of continuations which we discuss later in this chapter are essentially the application of this observation
to various specific problems.
12.3 Uses of continuations
hcf n 0 = n
hcf n m = hcf m (n ‘rem‘ m)
x → λk.kx
λx.M → λk.k(λx.M )
M N → λk.M (λm.N (λn.mnk))
Where k, m and n are chosen to avoid variable capture in the usual way. Plotkin proves some
theorems relating the value of a term M to the value of its CPS-conversion when call-by-name and
call-by-value evaluation strategies are used.
The effect of Plotkin’s CPS conversion on some terms is shown in Figure 12.4. We have named the
newly introduced bound variables kn .
x → λk1 .k1 x
λk.kx → λk1 .k1 (λk.(λk2 .(λk3 .k3 k)(λk4 .(λk5 .k5 x)(λk6 .k4 k6 k2 ))))
λx.x → λk1 .k1 (λx.(λk2 .k2 x))
xy → λk1 .(λk2 .k2 x)(λk3 .(λk4 .k4 y)(λk5 .k3 k5 k1 ))
• CPS-conversion can produce terms which tell us useful things about how they will be evaluated;
• such terms likely to tell us a lot of things which are not really very useful, so they need to be
optimised.
There is a large literature on the use of CPS in compilation: [2, 24] provide a good start.
Another direction that we can follow from the use of CPS-conversions leads us to the extraction of
constructive content from classical proofs. Notoriously, classical proofs need not contain any construc-
tive content. However, we know that we can use the double-negation transformation to produce a
intuitionistic theorem from a classical one. In [26] Friedman introduced a related technique, called
A-translation, which allowed him to show that Peano arithmetic is a conservative extension of Heyting
arithmetic, for Π02 sentences. The constructive content of Friedman’s proof is that we can convert a
classical proof of a Π02 sentence into a constructive one. The constructed proof can be interpreted
as the application of a non-local control operator applied to a CPS-conversion of the classical proof.
The control operator allows us to replace the current evaluation context with a different one, just as
goto allows us to make non-local jumps in imperative programs. There is an extensive literature on
control operators: [13, 71] are a beginning. [33, 57] provide much more detail on the extraction of
constructive content from classical proofs, and the relation with control operators.
Just as in imperative programming we can make jumps available to the programmer by providing
goto we can make the control operator available to the programmer. The is done in Scheme [43]
and Standard ML of New Jersey [2] using a callcc (call-with-current-continuation) primitive. [74]
describes the representation of jumps with continuations in more detail. Just as goto allows the
programmer to invent control structures, so does callcc, with all that this entails. Continuations
allow us to implement threads, as discussed in, for example, [2] and [22].
Again, connected with their rôle of representing control in a functional setting continuations have
applications in denotational semantics as discussed in, for example, [70].
• we can stop;
• we can read a value, and continue by performing some computation with it;
At this level the world is almost as simple: it is just a pair of lists of natural numbers. We see that
CPS functions let us thread a state or world value through our programs.
12.4 Further examples
cpsid a c = c a
Notice that cpsid is just apply, with its argument supplied the other way around.
apply f x = f x
The function split, given in Program 12.14, is the basic function to split pairs (2-tuples) into their
components.
split f (a, b) = f a b
cpssplit f (a, b) k = k (f a b)
The use of append is a bit unfortunate: we should define a CPS version of append:
cpsappend [] a k = k a
cpsappend (h:t) a k = cpsappend t a (\r -> k (h:r))
Now we have imposed an order on the evaluation that was not there explicitly before.
We can define CPS versions of addition, multiplication, and exponentiation:
infixr .+.
infixr .*.
infixr .^.
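The definitions themselves are missing; a hedged sketch, where each operator takes its two arguments and a continuation to pass the result to:
(.+.), (.*.), (.^.) :: Int -> Int -> (Int -> a) -> a
(x .+. y) k = k (x + y)
(x .*. y) k = k (x * y)
(x .^. y) k = k (x ^ y)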
Example: cpsreduce
There is a general pattern to the CPS functions that we write. In the case of lists we have the following
basic form:
f [] k = k D
f (h:t) k = f t (\r -> k (E h r))
Having identified this basic patttern we can produce a CPS version of reduce:
cpsreduce d _ [] k = k d
cpsreduce d e (h:t) k = cpsreduce d e t (\r -> k (e h r))
The same can be done for all the other inductively defined types we have come across.
Example: λ-terms
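The λ-term type used in this example is not shown in these notes; a plausible reconstruction, matching the constructors used below, is:
data Term = Var String
          | App Term Term
          | Abs String Term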
One obvious thing to do with λ-terms is to write a pretty-printer, or unparser. The CPS unparser will
have type Term -> (String ->a) -> a. We adopt a strategy of writing a function which we know
to be flawed, and then correct it. The deliberate error that we make is to ignore the use of brackets in
the string we are generating. The following function is a first attempt:
cpsunparse (Var x) k = k x
cpsunparse (App l m) k =
cpsunparse l (\lstring ->
cpsunparse m (\mstring ->
k(lstring ++ " " ++ mstring)))
cpsunparse (Abs x m) k =
cpsunparse m (\mstring ->
k("\\ " ++ x ++ " . " ++ mstring))
This function does not use CPS for appending the strings. We use Program 12.19 on page 106 and
define:
up2 (Var x) k = k x
up2 (App l m) k =
up2 l (\lstring ->
up2 m (\mstring ->
cpsappend lstring " " (\r ->
cpsappend r mstring k)))
up2 (Abs x m) k =
up2 m (\mstring ->
cpsappend "\\ " x (\r ->
cpsappend r " . " (\s ->
cpsappend s mstring k)))
This is beginning to look very typical of CPS code. We have pushed the continuation k as far into
the function as possible. Consequently we have named a lot of intermediate results.
This code is still not correct, as it fails to put brackets in where they are needed:
Gofer?
up2 (App (Var "x") (App (Var "y") (Var "z"))) id
x y z
Gofer?
up2 (App (App (Var "x") (Var "y")) (Var "z")) id
x y z
Gofer?
up2 (App (Abs "x" (Var "x")) (Var "y")) id
\ x . x y
Now we address this problem. We know that we will need to bracket expressions, so we define4 :
brack string k =
cpsappend "(" string (\r -> cpsappend r ")" k)
We can use Haskell’s pattern matching to solve the problem. Instead of having one clause to deal
with application terms we will have several:
The full function is given in Program 12.31 on the following page. For comparison, the same function
written in non-CPS style, is given in Program 12.32 on the next page. Now we get:
Gofer?
up2 (App (App (Var "x") (Var "y")) (Var "z")) id
x y z
Gofer?
up2 (App (Var "x") (App (Var "y") (Var "z"))) id
x (y z)
Gofer?
up2 (Abs "x" (App (Var "x") (Var "y"))) id
\ x . x y
Gofer?
up2 (App (Abs "x" (Var "x")) (Var "y")) id
(\ x . x) y
up2 (Var x) k = k x
up2 (App l m@(App _ _)) k =
up2 l (\lstring ->
up2 m (\mstring0 ->
brack mstring0 (\mstring ->
cpsappend lstring " " (\r ->
cpsappend r mstring k))))
up2 (App l@(Abs _ _ ) m) k =
up2 l (\lstring0 ->
up2 m (\mstring ->
brack lstring0 (\lstring ->
cpsappend lstring " " (\r ->
cpsappend r mstring k))))
up2 (App l m) k =
up2 l (\lstring ->
up2 m (\mstring ->
cpsappend lstring " " (\r ->
cpsappend r mstring k)))
up2 (Abs x m) k =
up2 m (\mstring ->
cpsappend "\\ " x (\r ->
cpsappend r " . " (\s ->
cpsappend s mstring k)))
up3 (Var x) = x
up3 (App l m@(App _ _)) =
up3 l ++ " " ++ (brack1 (up3 m))
up3 (App l@(Abs _ _ ) m) =
brack1 (up3 l) ++ " " ++ (up3 m)
up3 (App l m) =
up3 l ++ " " ++ (up3 m)
up3 (Abs x m) =
"\\ " ++ x ++ " . " ++ (up3 m)
13 Case study: unification
13.1 Preamble
Let a term be either:
• a variable; or
• a functor applied to a (possibly empty) list of terms.
Question 1 Given two terms t1 and t2 , is there a way to replace the variables in t1 and t2 with terms,
such that the two resulting terms are the same term?
13.1.1 Examples
In the following x, y, z are variables, f, g, h functors. In the possible replacements all variables not
mentioned are assumed to be left unchanged.
• if a pair of terms can be made the same then there are lots of ways to do this, but;
We should derive the show function automatically, of course. There are a lot of possible variations
on this type. One interesting one eschews the use of lists, and uses two mutually inductive types:
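Neither the basic Term type nor the mutually inductive variant survives in this copy; a plausible reconstruction of the basic type, matching the constructors used below, is:
type Variable = String
type Functor  = String

data Term = Var Variable
          | Compound Functor [Term]
          deriving Show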
13.3 Representing substitutions
In order to replace a single variable by a term, leaving all other variables unaffected, we define a
function called mksubst:
Program 13.5: Constructing a substitution to replace the variable x with the term t
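The body of Program 13.5 is missing from this copy; a hedged sketch, consistent with the session below, together with the Substitution synonym it needs:
type Substitution = Variable -> Term

mksubst :: Term -> Variable -> Substitution
mksubst t x v
    | v == x    = t
    | otherwise = Var v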
Now we get:
Main> (mksubst (Compound "f" []) "x") "y"
Var "y" :: Term
Main> (mksubst (Compound "f" []) "x") "x"
Compound "f" [] :: Term
idsubst :: Substitution
idsubst x = Var x
idsubst :: Substitution
idsubst = Var
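The definition of extend is missing; a hedged sketch, applying a substitution to every variable in a term:
extend :: Substitution -> Term -> Term
extend s (Var x)         = s x
extend s (Compound f ts) = Compound f (map (extend s) ts)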
Now we get:
Main> (extend (mksubst (Compound "f" []) "x")) (Var "y")
Var "y" :: Term
Main> (extend (mksubst (Compound "f" []) "x")) (Var "x")
Compound "f" [] :: Term
Main> (extend
(mksubst (Compound "f" []) "x"))
(Compound "g" [(Var "x"), (Var "y")])
Compound "g" [Compound "f" [],Var "y"] :: Term
infixr #
Definition 1 (Unifier) A unifier of two terms t1 and t2 is a substitution σ such that σ ∗ t1 and σ ∗ t2 are
equal terms.
Definition 2 (Basis) The basis of a substitution σ is the set of variables x such that σx differs from ιx.
The basis of a substitution is the set of variables which the substitution really affects. All the substitu-
tions that we are concerned with will have a finite basis.
Definition 3 (Set map) Let f be a function and S be a set. The set map of f on S is the set {f s|s ∈
S}.
Definition 4 (Cycle) Let σ be a substitution and B be its basis. σ is a cycle if the set map of σ on B
is the same as the set map of ι on B.
Informally, a cycle maps some variables onto themselves. We can define functions like the following
to construct cycles:
cycle2 (x, y) v
| v == x = Var y
| v == y = Var x
| otherwise = Var v
cycle3 (x, y, z) v
| v == x = Var z
| v == y = Var x
| v == z = Var y
| otherwise = Var v
Definition 5 (Generality) Let σ and τ be substitutions. We say that σ is more general than τ if there
is a substitution γ such that τ = γ ◦ σ.
The phrase more general than is a slight misnomer. First, every substitution is more general than
itself since every substitution is the same as itself composed with the identity substitution. Second, let γ
and δ be cycles which have the same basis. Then γ and δ are more general than each other.
If a substitution is idempotent then applying it once is enough. Notice that cycles are not, with the
exception of ι, idempotent.
Notice that this substitution is a most general, idempotent unifier. We could just as easily have chosen
mksubst (Var x) y. This too is a most general, idempotent unifier.
Unfortunately, when we apply the unifier of x and f (x) to x and f (x) we get f (x) and f (f (x)):
So, close, but no cigar. The solution is that we have to check whether the variable occurs in the
compound term with which it is being unified. We can write the occurs check:
occurs x (Var z) = x == z
occurs x (Compound _ ys) = any (occurs x) ys
Program 13.17: A less naïve attempt at the second and third clauses of Program 13.13
occurs x (Var z) = x == z
occurs x (Compound _ ys) = occurs_all x ys
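The companion function occurs_all is not shown in this extract; presumably it is just the mutual
recursion one would expect:
occurs_all :: Var -> [Term] -> Bool
occurs_all x []       = False
occurs_all x (t : ts) = occurs x t || occurs_all x ts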
In some ways this is actually a more natural way to express this function, and we will see that the
final clause of unifier follows this pattern.
Now we look at the final clause. We know that we can only unify two compound terms which have the
same functor. We also know that the functors must have the same arity. We can check that functors are
the same easily. The arity of a functor is implicit in the length of the list of terms it is applied to. We
could check this by comparing the lengths of the lists. We are working our way towards:
What do we then do with xs and ys? Perhaps we should match on the lists in clause four:
The most general, idempotent unifier of a constant with itself is the identity substitution.
Clearly the testing of the length in this function is a waste of time. Furthermore we know that we
need to apply sigma to xs and ys. So we are getting towards:
We still have to do something with xs’ and ys’. We just don’t know how to unify a list of terms.
Perhaps we should look at an analogy with Program 13.18, and implement a pair of mutually recursive
functions with types:
We then get:
and:
The function termsmap maps a function over a Terms, just like map maps a function over a list. So
we have solved the problem of unifying lists of terms. Notice also that we only test whether the functors
are the same at one place. Now if we go back to the type we previously had we get:
unifierall [] [] = idsubst
unifierall [] (_:_) = error "arities"
unifierall (_:_) [] = error "arities"
unifierall (x : xs) (y : ys) =
let sigma = unifier x y
xs’ = map (extend sigma) xs
ys’ = map (extend sigma) ys
tau = unifierall xs’ ys’
in tau # sigma
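The clauses of unifier itself are not reproduced in this extract. A hedged sketch, consistent with the
discussion above (the error messages are placeholders), is:
unifier :: Term -> Term -> Substitution
unifier (Var x) (Var y)
  | x == y    = idsubst
  | otherwise = mksubst (Var y) x
unifier (Var x) t
  | occurs x t = error "occurs check"
  | otherwise  = mksubst t x
unifier t (Var x) = unifier (Var x) t        -- symmetric case
unifier (Compound f xs) (Compound g ys)
  | f == g    = unifierall xs ys             -- arities are checked in unifierall
  | otherwise = error "functors"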
If σ is the most general unifier of x and y, τ the most general unifier of map σ ∗ xs and map σ ∗ ys
then τ ◦ σ is the most general unifier of x : xs and y : ys. Although the composition of two idempotent
substitutions need not be idempotent (for example, two idempotent substitutions may compose to form
a cycle), τ ◦ σ is, in fact, idempotent.
The full code is given in Program 13.28 on page 123.
Definition 7 (Chain) Let ≺ be a partial order. A chain is a sequence x1 , x2 , x3 , . . . such that if i < j
then xj ≺ xi .
This particular trick relies on constructing terms with a dummy functor "DUMMY". Of course we would
waste time comparing "DUMMY" with itself, and it is bad programming practice to build a structure just
to take it apart.
13.8 A theorem
In order to answer Question 1 we have had to take quite a detour into the theory of substitutions and
of well-founded orders. The program that we have developed embodies a proof of this theorem:
Theorem 4 For any two terms t1 and t2 , either t1 and t2 have a most-general, idempotent unifier,
or they have no unifier.
13.10 Yet another view of unification
It is no accident that the set of equations (13.4) to (13.6) is the solved set for the set (13.1) to (13.3).
The task of solving a set of equations is just to find replacements for the variables such that the left-
and right-hand sides of each equation are the same. So by writing a unification algorithm we have
written an equation solver.
13.11 Summary
We have developed a unification algorithm in Haskell. We have touched on a number of issues, some
to do with Haskell, and some to do with unification:
• we saw how to represent terms without using mutually inductive types by using lists and sleight-
of-hand;
• we saw how to use higher-order functions to treat substitutions as Haskell functions, saving our-
selves a lot of bookkeeping;
13.12 Questions
1. Write termsmap.
4. Write an accumulator-based unifier :: Substitution -> Term -> Term -> Substitution.
5. Use the Maybe type and write unifier :: Term -> Term -> (Maybe Substitution)
7. Write a unification algorithm for the type of terms in Program 13.27. In this type a term is either
a variable, or a constant, or the application of a term to a term. So x, f, f x, (f x)y are legitimate
terms corresponding to our terms x, f, f (x), f (x, y). Now we also have things like xy, (f x)(wz),
which don’t correspond to anything we had before.
13.13 Background
Most of the development of this follows [49], which in turn relied on [54]. The type given in Program
13.27 is used in [60]. More details on bar induction can be found in [78]. The principal-type algorithm
is described in [38].
13.14 Code summary
idsubst :: Substitution
idsubst = Var
infixr #
(#) :: Substitution -> Substitution -> Substitution
sigma # tau = (extend sigma). tau
unifierall [] [] = idsubst
unifierall [] (_:_) = error "arities"
unifierall (_:_) [] = error "arities"
unifierall (x : xs) (y : ys) =
let sigma = unifier x y
xs’ = map (extend sigma) xs
ys’ = map (extend sigma) ys
tau = unifierall xs’ ys’
in tau # sigma
14 Case study: unification in
continuation-passing style
14.1 Preamble
In Chapter 12 we discussed continuations, and in Chapter 13 we discussed unification. In this Chapter
we present a CPS unification algorithm.
Since substitutions are not in CPS we have a number of ‘direct’ functions on substitutions, exactly as
before:
• mksub :: Var -> Term -> Substitution from Program 13.6 on page 113
• extend :: Substitution -> Term -> Term from Program 13.9 on page 113
The continuation that we supply will be a function which makes use of a substitution, for example
something like: \ s -> map s ["x", "y", "z"].
As we saw in Chapter 12 the order of operations will seem to be inverted when compared to the
direct functions.
All we are doing here is applying the continuation to the substitution we have constructed.
As we expect, cpsoccs and cpsoccsl must be defined together, and they require us to define
cpsany:
cpsoccs x (Var y) k = k (x == y)
cpsoccs x (Compound _ ys) k = cpsoccsl x ys k
cpsany _ [] k = k False
cpsany f (h:t) k = cpsany f t (\b -> f h (\c -> k(c||b)))
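The definition of cpsoccsl is not shown here; presumably it simply hands the CPS test to cpsany:
cpsoccsl x ys k = cpsany (cpsoccs x) ys k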
Notice that the test we supply to cpsany is itself in CPS. Compare with any from the prelude:
any _ [] = False
any f (h:t) = (f h) || any f t
14.4 Implementing cpsunifierl
cpsunifierl [] [] k = k idsubst
cpsunifierl (_:_) [] k = k (error "arities")
cpsunifierl [] (_:_) k = k (error "arities")
The final clause is, as always, where the hard work is. Given two lists of terms (x:xs) and (y:ys)
and something to do next k we must:
• find the unifier of x and y, call this u, and next
• apply u to xs and ys, giving xs' and ys', and next
• find the unifier of xs' and ys', call this v, and next
• apply k to v ◦ u (sketched below).
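A hedged sketch of this clause, assuming cpsunifier is the CPS unifier for single terms and # is the
substitution composition of Chapter 13:
cpsunifierl (x:xs) (y:ys) k =
  cpsunifier x y (\u ->
    cpsmap (extend u) xs (\xs' ->
      cpsmap (extend u) ys (\ys' ->
        cpsunifierl xs' ys' (\v ->
          k (v # u)))))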
cpsmap _ [] k = k []
cpsmap f (h:t) k = cpsmap f t (\r -> k((f h) : r))
14.5 Summary
We presented a CPS version of a unification algorithm.
15 Case study: computing principal types
for λ-terms
15.1 Preamble
We will develop a type assignment algorithm in Haskell. We have already mentioned type assignment
in Chapter 11. The rules for type assignment were given in Rule 11.1. We will develop an algorithm
based on these rules. We begin with some definitions.
Definition 9 (Closed term) A λ-term is closed if it contains no occurrences of any free variables.
Definition 11 (Principal type) A principal type for a term is a type for the term. Furthermore every
type for the term is a substitution instance of it.
Theorem 5 (Principal type theorem) Every closed term of the pure λ calculus either has a principal
type, or has no type.
The principal-type algorithm is described in [38]. Haskell’s own type inference is just a slightly more
subtle version of this algorithm.
We give some examples of principal type assignable to λ-terms in § 15.11 on page 140.
infixr :->
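The declaration of the type of simple types is not reproduced in this extract; a sketch consistent with
the code below is:
type TyVar = String

data Type = TyVar TyVar
          | Type :-> Type
            deriving Eq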
• we have a variable;
• an abstraction;
• an application term.
Life is quite straightforward: we appeal to some proof theory that we do not discuss and rely on the
fact that if we can assign a type to a term then we can assign a type using the rules in the most obvious
way – we rely on there being principal derivations.
15.4 Type assignment itself
idsubst :: Substitution
idsubst = TyVar
infixr #
x : α ∈ Γ
Γ ↦ x : α
Γ is an environment, i.e. a collection of variable/type pairs. This rule tells us that if x : α is in the
environment then the variable x can be assigned the type α. In this rule α is an arbitrary type.
x : α, Γ ↦ y : β
Γ ↦ λx.y : α → β
This rule tells us that in order to type an abstraction we add a new binding to the environment, and
type the body of the abstraction in the new environment.
occurs x (TyVar z) = x == z
occurs x (t1 :-> t2) = (occurs x t1) || occurs x t2
Γ ↦ f : α → β    Γ ↦ x : α
Γ ↦ f x : β
This tells us that in order to type an application we should type the term being applied and the term
to which it is applied.
[f : a → b, x : a] ↦ f : a → b    [f : a → b, x : a] ↦ x : a
[f : a → b, x : a] ↦ f x : b
[f : a → b] ↦ λx.f x : a → b
↦ λf x.f x : (a → b) → a → b
15.6 Turning these rules into an algorithm
[f : a → a, x : a] ↦ f : a → a    [f : a → a, x : a] ↦ x : a
[f : a → a, x : a] ↦ f x : a
[f : a → a] ↦ λx.f x : a → a
↦ λf x.f x : (a → a) → a → a
Program 15.7 implements lookuptype, which takes an environment and a λ-variable, and returns
the type of the variable. Since we are dealing with closed terms, and the environment is added to using
Rule 15.2, every variable we need to check will have a type in the environment.
The type for lookuptype is more general than we need: Var -> [(Var, TyVar)] -> TyVar
would do.
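Program 15.7 itself is not reproduced here; a sketch at the more general type (the clause for an empty
environment is deliberately missing, since for closed terms it can never be reached):
lookuptype :: Eq a => a -> [(a, b)] -> b
lookuptype x ((y, t) : env)
  | x == y    = t
  | otherwise = lookuptype x env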
We make use of unifiers to match types when required. Later we will see that some of the unification
that we are doing is unnecessary, but for the moment, whenever we need to match types we will unify.
Initially we have an empty context, and the identity substitution. We also have a supply of arbitrary
types. Since we are accumulating a substitution, we will need to return this. We also need to return the
type of the term, and, since we may consume some of our stock of arbitrary types we need to return
the unused part of this.
We are working our way towards the program given in Program 15.8.
princtype tm =
let (_, _, typ) =
printyp idsubst [] (map TyVar (supply ’t’)) tm
in typ
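The function supply is not shown in this extract; judging by the session output later in the chapter
(type variables t1, t3, t18, ...), it is an infinite stock of numbered names, something like:
supply :: Char -> [String]
supply c = [ c : show n | n <- [1 ..] ]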
One important point to notice is that we choose only to build up information in the substitution: the
environment will consist of λ-variable/type-variable pairs only.
We pick a new arbitrary type, tv. We retrieve t, the type variable we picked for the variable. The
currently accumulated substitution tells us what new details we have about t, so we find the unifier
of TyVar tv and sigma t. This unifier is not very interesting as one of the type expressions being
unified is a type variable which is sure not to occur in the other. We will return to this observation later.
Now, we update our accumulated substitution by composing the unifier with the previously accumulated
substitution. Finally, the type we have computed for the variable is given by applying this substitution
to the new arbitrary type that we picked.
We assert, but do not prove, that this is the principal type.
15.7 Improving the code
First we compute the principal type of the body, given the current accumulated substitution, and
extending the environment with a binding of a new arbitrary type to the abstracted variable. Next we
make use of the newly accumulated substitution on the arbitrary type we chose for the variable. We
choose a second arbitrary type for the abstraction term itself, and compute the unifier of this with tv1’
:-> mtype. We then use this to update the accumulated substitution and also compute the type we
were interested in.
We assert, but do not prove, that we compute the principal type.
It is crucial that we ensure that we take account of any information we gained when typing l when
we attempt to type m. It is here that we will get the "unification would give infinite type"
error when we try to type λx.xx.
We assert, but do not prove, that we compute the principal type.
We have just chosen tv from our stock of shiny new type variables, so we know that tv does not
occur in (extend sigma t). Hence we know that the result of this call will be:
Since composition with the identity substitution does nothing, this is just:
We notice that t’ is only used once, and its definition is very small, so we put it in place.
So now we have:
Notice that we would have obtained the same result if we had thought of passing sigma into
trunifier as the accumulated unifier, rather than starting from scratch with the identity substitution.
Again we have eliminated a call to unifier, which is probably conceptually clearer as there is no
unification going on.
15.8 Even more improvements
There is nothing much that we can do about the definition of tau, except to feel that we have possibly
built (TyVar tv1) :-> (TyVar tv2) just to take it apart. The only subsequent use of tau is that
it is composed with sigma’, so we might as well exploit the accumulating nature of trunifier, and
define:
We can then use mu in place of tau # sigma’. Similarly we can dispense with delta and just
define:
We have still named some intermediate results, but this code (and that in Programs 15.13 and
15.14) is cleaner than the code we originally wrote.
This is silly. We should instead re-write our code so that we return a triple of the form:
• (gamma, tyvars, v)
:: (Substitution, [TyVar], TyVar)
princtype tm =
let (sigma, _, v) = printyp idsubst [] (supply ’t’) tm
in sigma v
On reflection, the previous definition of princtype in Program 15.8 really is a bit odd looking.
However, it seemed plausible at the time that we wrote it.
The final version of printyp is given in Program 15.17.
This code, although much shorter than the code we originally wrote, is clearer and easier to under-
stand. If only we could have written this in the first place!
15.9 Tidying up
We are now more-or-less finished, except for some tidying up that we might want to do to the way
results are printed. The code that we have written so far behaves like this:
PT> i
^ x . x :: LTerm
PT> princtype i
t1 -> t1 :: Type
PT> k
^ x y . x :: LTerm
PT> princtype k
t1 -> t3 -> t1 :: Type
PT> one
((^ n m z . z n m) ^ x y . y) (^ x . x) :: LTerm
PT> princtype one
((t18 -> t20 -> t20) -> (t23 -> t23) -> t12) -> t12
:: Type
PT> one’
^ z . z (^ x y . y) (^ x . x) :: LTerm
PT> princtype one’
((t8 -> t10 -> t10) -> (t13 -> t13) -> t4) -> t4 :: Type
PT> two
(^ n . ((^ n m z . z n m) ^ x y . y) n)
(((^ n m z . z n m) ^ x y . y) (^ x . x)) :: LTerm
PT> princtype two
((t22 -> t24 -> t24) -> (((t45 -> t47 -> t47) ->
(t50 -> t50) -> t39) -> t39) -> t16) -> t16 :: Type
PT> two’
^ z . z (^ x y . y) (^ z . z (^ x y . y) (^ x . x))
:: LTerm
PT> princtype two’
((t8 -> t10 -> t10) -> (((t20 -> t22 -> t22) ->
(t25 -> t25) -> t16) -> t16) -> t4) -> t4 :: Type
These types are correct, but don’t look very pretty. It is also a bit unfortunate that the type of one
and one’ is not obviously the same. We can correct this flaw by defining a way to prettify types, as in
Program 15.18 on the next page.
Now we get:
PT> (prettytypes.princtype) i
a -> a :: Type
PT> (prettytypes.princtype) k
a -> b -> a :: Type
PT> (prettytypes.princtype) one
((a -> b -> b) -> (c -> c) -> d) -> d :: Type
PT> (prettytypes.princtype) one’
((a -> b -> b) -> (c -> c) -> d) -> d :: Type
PT> (prettytypes.princtype) two
((a -> b -> b) -> (((c -> d -> d) ->
(e -> e) -> f) -> f) -> g) -> g :: Type
PT> (prettytypes.princtype) two’
((a -> b -> b) -> (((c -> d -> d) ->
(e -> e) -> f) -> f) -> g) -> g :: Type
And finally, we can ask Haskell for a type for one:
PT> \z -> z (\ x y -> y)(\x -> x)
ERROR - Cannot find "show" function for:
*** Expression : \z -> z (\x y -> y) (\x -> x)
*** Of type : ((a -> b -> b) -> (c -> c) -> d) -> d
And it agrees with us, even down to the names of the type variables.
15.10 Summary
In this note we have developed a type inference algorithm. This algorithm will take a closed λ-term
and return either a principal type for the term, if the term has a type; or the information that the term
has no type.
The algorithm was motivated by considering the rules for type assignment. Its correctness is justified
by appeal to some proof theory that we have not had space to cover. Unification is crucial to the
computation of a principal type.
We could extend this algorithm to deal with any term (not just closed terms). Free variables are
allocated types when we first meet them, and we have to be careful to pass an environment around.
We can also extend the algorithm to deal with other types (e.g. products) if we can give the appro-
priate proof rules, and give similar proof theory to justify their use.
prettytypes ty =
let (_, thetype, _) = prettify [] prettynames ty
in thetype
15.11 Examples
λxy.x : a→b→a
λxyz.xz(yz) : (a → b → c) → (a → b) → a → c
λx.x : a→a
λf x.x : a→b→b
λf x.f x : (a → b) → a → b
λf x.f (f x) : (a → a) → a → a
λf x.f (f (f x)) : (a → a) → a → a
λf x.f (f (f (f x))) : (a → a) → a → a
λf x.f (f (f (f (f x)))) : (a → a) → a → a
λf x.f (f (f (f (f (f x))))) : (a → a) → a → a
λf x.f (f (f (f (f (f (f x)))))) : (a → a) → a → a
λf x.f (f (f (f (f (f (f (f x))))))) : (a → a) → a → a
λf x.f (f (f (f (f (f (f (f (f x)))))))) : (a → a) → a → a
λf x.f (f (f (f (f (f (f (f (f (f x))))))))) : (a → a) → a → a
λxypq.xp(ypq) : (a → b → c) → (a → d → b) → a → d → c
λf gx.f (gx) : (a → b) → (c → a) → c → b
λxy.yx : a → (a → b) → b
λf gx.f (gx) : (a → b) → (c → a) → c → b
λnmz.znm : a → b → (a → b → c) → c
λxy.x : a→b→a
λxy.y : a→b→b
λp.p(λxy.x) : ((a → b → a) → c) → c
λp.p(λxy.y) : ((a → b → b) → c) → c
λx.x : a→a
λn.n(λxy.x) : ((a → b → a) → c) → c
Notice that Y and Θ are untypeable.
16 Case study: SKI-ing
16.1 Preamble
In this Chapter we will explain how to compile λ-terms to Combinatory Logic terms.
Section 16.2 introduces Combinatory Logic (CL), a theory closely related to the λ-calculus.
Section 16.3 asks you to write a function to reduce CL terms.
Section 16.4 explains how we can mimic abstraction in CL.
Section 16.5 explains how to translate λ-terms to CL.
clunparse (CLVar x) = x
clunparse S = "S"
clunparse K = "K"
clunparse (CLApp l n@(CLApp _ _))
= (clunparse l) ++ bracket(clunparse n)
clunparse (CLApp l n)
= (clunparse l) ++ clunparse n
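The declaration of the type of CL terms, and the helper bracket, are not reproduced here; sketches
consistent with clunparse are:
data CLTerm = CLVar String
            | S
            | K
            | CLApp CLTerm CLTerm

bracket :: String -> String
bracket s = "(" ++ s ++ ")"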
S =def λxyz.xz(yz)
K =def λxy.x
Pedantic note: S and K are (names of) terms of the λ-calculus, S and K are terms of combinatory
logic.
λxyz.xz(yz) : (a → b → c) → (a → b) → a → c
λxy.x : a → b → a
Instantly, we recognise that if we replace → by ⊃ and write the letters in upper case the types of S
and K can be written:
(A ⊃ B ⊃ C) ⊃ (A ⊃ B) ⊃ A ⊃ C
A⊃B⊃A
⊢ A ⊃ B    ⊢ A
MP
⊢ B
And, of course, if we treat the two types as axiom schemata we have, in combination with Modus
Ponens, a system which is complete for minimal implicational logic.
Now, we look at the rules that we saw for type checking, and we see the rule:
x:α→β y:α
(→ E)
xy : β
Rule 16.2: → E
S : (γ → (γ → γ) → γ) → (γ → γ → γ) → γ → γ    K : γ → (γ → γ) → γ
(→ E)
SK : (γ → γ → γ) → γ → γ    K : γ → γ → γ
(→ E)
SKK : γ → γ
⊢ (A ⊃ (A ⊃ A) ⊃ A) ⊃ (A ⊃ A ⊃ A) ⊃ A ⊃ A    ⊢ A ⊃ (A ⊃ A) ⊃ A
(⊃ E)
⊢ (A ⊃ A ⊃ A) ⊃ A ⊃ A    ⊢ A ⊃ A ⊃ A
(⊃ E)
⊢ A ⊃ A
16.4 Mimicking abstraction using SKI
So
SKKp
▷1 Kp(Kp)
▷1 p
Note: In the λ calculus SKK reduces to λx.x; in CL SKK does not simplify.
The CL term SKK does not have an abstraction in it (it is a CL term and abstractions do not figure in
CL terms), and yet it can mimic a λ-term which does have an abstraction. It turns out that this is not a
fluke: we can use S and K to give ourselves the power of λ.
Since the term SKK is just the identity function we can freely extend our theory of CL terms, without
affecting anything very much, by adding a new primitive combinator I, with the reduction rule:
Ix ▷1 x
16.4.1 λw x.x
We begin with:
λw x.x =def I
Now, λw mimics λ (the abstraction operator of the λ-calculus) in the sense that (λw x.x)M reduces
in a single step to M , and (λx.x)P also reduces in a single step to P . We have only shown that we
can mimic λx.x, and we need to extend the definition.
16.4.3 λw x.U V
Now suppose we have (λx.U V )M . This reduces in a single step to:
• ([M/x]U )([M/x]V )
Now, (λx.U )M reduces in a single step to [M/x]U , so what we have is:
• (λx.U )M ((λx.V )M )
Hence:
• (λx.U V )M = (λx.U )M ((λx.V )M )
Let’s switch to looking at S. Recall:
• Sf gy ▷1 f y(gy)
So:
This tells us that S(λw x.U )(λw x.V )M has the same behaviour as (λx.U V )M , and hence we make
the definition:
16.4.4 λw
The full definition of λw is:
λw x.x =def I
λw x.P =def KP,    if x ∉ FV(P)
λw x.U V =def S(λw x.U)(λw x.V)
S(KK)Iuv
▷ (KK)u(Iu)v
▷ K(Iu)v
▷ Iu
▷ u
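The translation itself is not reproduced in this extract. A hedged sketch, reading the definition of λw
as a program (the names compile, abstract and occursCL, and the shape of the λ-term type, are our
assumptions; λx.x is compiled to SKK rather than to the primitive I):
-- We assume a λ-term type like the one used elsewhere in these notes.
data LTerm = Var String
           | Abs String LTerm
           | App LTerm LTerm

-- Does the variable x occur in the CL term t?
occursCL :: String -> CLTerm -> Bool
occursCL x (CLVar y)   = x == y
occursCL x (CLApp u v) = occursCL x u || occursCL x v
occursCL _ _           = False

-- abstract x t mimics λw x.t, clause by clause.
abstract :: String -> CLTerm -> CLTerm
abstract x t
  | not (occursCL x t) = CLApp K t                  -- λw x.P = K P
abstract x (CLVar y)   = CLApp (CLApp S K) K        -- λw x.x: SKK behaves as I
abstract x (CLApp u v) =
  CLApp (CLApp S (abstract x u)) (abstract x v)     -- λw x.UV = S (λw x.U)(λw x.V)

-- compile a λ-term to a CL term, abstracting away each λ as we meet it.
compile :: LTerm -> CLTerm
compile (Var x)   = CLVar x
compile (App m n) = CLApp (compile m) (compile n)
compile (Abs x m) = abstract x (compile m)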
16.5.1 Summary
Now we can reduce λ-terms, by compiling them to CL terms and then reducing the CL term. As so
often we have only looked at the first steps. For example there are lots of optimisations that we can
make in both the compilation and reduction phases.
16.5.2 Background
This material is covered in greater detail in [39], [4] and [64]. The treatments in [39] and [4] focus
on technical aspects of λw , and other similar operators. On the other hand, [64] concentrates on
implementation issues. The first practical use of combinators in compiling functional languages was
by David Turner in SASL. The definition of λw goes back at least as far as the late 1950’s.
17 Reasoning about functions
At the end of Chapter 11 we touched on the issue of reasoning about programs. In this Chapter we
will continue this work, although not in exactly the same formal framework as we used in Chapter 11.
In the main we will be concerned about reasoning about program correctness, rather than program
efficiency. We will follow the treatment from Chapter 6 of [61].
Reasoning about functions can be difficult and expensive: not reasoning about them can be fatal.
17.1 Preliminaries
We assume that we have the ‘usual’ logical language: ∧, ∨, ¬, ⊃, ∀, ∃.
The specification for a program is, typically, a proposition of the form:
For all inputs there is some output which has such-and-such a property.
Formally this is a proposition of the form (∀x)(∃y)S(x, y), where S formalises the relationship between
input and output. Although this is the typical form of a
program specification, not all specifications have this form. For example we might specify a parser for
a grammar g as:
(∀s : String)(G(g, s) ∨ ¬G(g, s))
i.e. a parser is a function which takes a string and a grammar and tells whether or not the string is
generated by the grammar. Specifications of this form can be massaged into the ∀∃ form.
If we consider the specification as a proposition we will see that there is (usually) a close connection
between a proof of the proposition and a program which meets the specification. In fact, in cer-
tain logics proofs and programs are just the same thing. Even in logics where we cannot make this
identification there is usually much to be learned about algorithms from studying proofs.
17.2 Proof by induction
The type Nat of natural numbers is given by an inductive definition: Zero is a natural number, and if
n is a natural number then so is Succ(n):
Zero : Nat

n : Nat
------------
Succ(n) : Nat
This inductive definition justifies an induction rule, to allow us to prove C(n) for arbitrary n : Nat:
               [C(m)]
                  ·
                  ·
                  ·
C(Zero)       C(Succ(m))
------------------------
          C(n)
Paulson [61] calls this rule ‘mathematical induction’. It is probably better called ‘structural induction
on Nat’. Informally we can read it as:
If we can show that C holds of Zero, and we can show that if C holds of m then it holds
of Succ(m), then C holds of arbitrary n.
We can write a similar rule for lists:
               [C(t)]
                  ·
                  ·
                  ·
C([])       C(Cons(h, t))
-------------------------
          C(l)
We have been a bit cavalier here: we really should say what h, t, and l are. Rule 17.3 bears a
strong resemblance to Rule 11.8 on page 96.
Structural induction is not the only form of induction we can have. We can use total induction, where
we are allowed to assume that C holds for all m < n:
Although total induction looks more powerful than structural induction, this is not the case.
We are assuming here that < is the ‘obvious’ order on Nat, but we can write an induction rule using
any well-founded order ≺. An order is well-founded if every chain is finite, where a chain is a sequence
x1 ≻ x2 ≻ x3 ≻ x4 ≻ . . ..
The induction step in the proof corresponds to recursion in the function definition. In the cases of
structural and mathematical induction we do not need to perform a separate proof that our function
terminates, as this is guaranteed by <. When we use well-founded induction the proof that the rela-
tion ≺ is well-founded is exactly the proof that our function terminates. Thus quicksort terminates
because the ordering of lists on length is well-founded.
17.2.2 An example
To illustrate proof by structural induction we prove that every natural number is either even or odd. First
of all we need to clarify what we mean by ‘even’ and ‘odd’. A number n is even if there is a number
m such that n = 2m, and it is odd if there is a number m such that n = 2m + 1.1 So now we are trying
to prove:
(∀n : Nat)(∃m : Nat)(n = 2m ∨ n = 2m + 1)
The first step in the proof is ∀ introduction, leaving us to prove:
(∃m : Nat)(k = 2m ∨ k = 2m + 1)
1 We are assuming that we understand ‘1’, ‘2’, equality, addition and multiplication!
for arbitrary k. Now we use the induction rule, to give ourselves two sub-problems:
(∃m : Nat)(Zero = 2m ∨ Zero = 2m + 1)
and:
[(∃m : Nat)(p = 2m ∨ p = 2m + 1)]
·
·
·
(∃q : Nat)(Succ(p) = 2q ∨ Succ(p) = 2q + 1)
The base case is easy:
Arithmetic
Zero = 2 ∗ Zero
∨ I left
Zero = 2 ∗ Zero ∨ Zero = 2 ∗ Zero + 1
∃I
(∃m : Nat)(Zero = 2m ∨ Zero = 2m + 1)
[p = 2m ∨ p = 2m + 1] Π1 Π2
∨E
[(∃m : Nat)(p = 2m ∨ p = 2m + 1)] (∃q : Nat)(Succ(p) = 2q ∨ Succ(p) = 2q + 1)
∃E
(∃q : Nat)(Succ(p) = 2q ∨ Succ(p) = 2q + 1)
[p = 2m]
=
Succ(p) = Succ(2m)
Arithmetic
Succ(p) = 2m + 1
∨ I right
Succ(p) = 2m ∨ Succ(p) = 2m + 1
∃I
(∃q : Nat)(Succ(p) = 2q ∨ Succ(p) = 2q + 1)
and:
[p = 2m + 1]
=
Succ(p) = Succ(2m + 1)
Arithmetic
Succ(p) = 2m + 2
Arithmetic
Succ(p) = 2(m + 1)
∨ I left
Succ(p) = 2(m + 1) ∨ Succ(p) = 2(m + 1) + 1
∃I
(∃q : Nat)(Succ(p) = 2q ∨ Succ(p) = 2q + 1)
So these two subproofs complete Proof 17.6. Thus both the base case and the induction step have
been established.
From this proof we observe a number of points:
• formal proofs can be tedious, and we should use a mechanical proof assistant;
• the structure of the proof reflects the structure of a recursive algorithm which will compute m
where n = 2m ∨ n = 2m + 1;
data OR a b = Inl a
| Inr b
when d e (Inl l) = d l
when d e (Inr r) = e r
evenorodd 0 = Inl 0
evenorodd n = when (\l -> Inr l)
(\r -> Inl (r + 1))
(evenorodd (n - 1))
The first clause is generated from the base case, Proof 17.5. The second clause is generated from
the induction step. The function when arises from the use of the rule for ∨ elimination in Proof 17.6.
The uses of Inr and Inl arise from the uses of ∨ introduction on the right and left in the sub-proofs
Π1 and Π2 respectively. The call to evenorodd (n - 1) corresponds to the assumption made in the
induction step.
evenorodd behaves like this:
Gofer?
evenorodd 0
Inl 0 :: OR Int Int
Gofer?
evenorodd 1
Inr 0 :: OR Int Int
Gofer?
evenorodd 13
Inr 6 :: OR Int Int
Thus the algorithm does not merely tell us that the number is even or odd, but also provides us with
a witness. In this case we are probably not terribly bothered about the witness, but, for example, in the
case of a parser the witness is the parse tree for the string, or the error message that informs us that
the string is not generated by the grammar.
In the current case we might look at the function we have constructed and decide to throw away the
witnessing information. We might decide that evenorodd should only return either Inl "even" or
Inr "odd". Thinking for a little bit allows us to see that we are returning one or other of two distinct
values. We already have a perfectly good type with two distinct values in it, called Bool, so let’s use it.
Now our program will look like:
evenorodd 0 = True
evenorodd n = if (evenorodd (n - 1))
then False
else True
The second clause of evenorodd is a clumsy way to write not (evenorodd (n - 1)), so we
have now:
evenorodd 0 = True
evenorodd n = not (evenorodd (n - 1))
17.3 Summary
In this Chapter we presented a quick introduction to reasoning about functional programs. The steps
involved were:
• a problem was expressed informally;
• this was formalised as a specification, which we treated as a proposition to prove;
• we proved the proposition, using induction;
• we inspected the proof, and from it extracted an algorithm;
• we performed some transformations on this algorithm, mostly to throw some information away
that we did not consider important.
(The original notes display here, on a full sideways page, the assembled derivation: the base case
from Proof 17.5 and the two subproofs Π1 and Π2 are combined, by ∨ elimination and ∃ elimination,
into a single proof of (∃q : Nat)(Succ(p) = 2q ∨ Succ(p) = 2q + 1) from the induction hypothesis.)
18 Monads
In this Chapter we will look at monads in functional programming. The notion of a monad1 comes to
us from category theory (see, for example, [46]) where they are also known as ‘standard constructions’
or ‘triples’. The precise definition of a monad that computer scientists use is very slightly different to that
employed by category theorists. Monads escaped from category theory into computer science via the
work of Eugenio Moggi, an Italian then working at Edinburgh University. Monads were then adopted
with enthusiasm by Phil Wadler (see, for example, [83, 82]), an American then working at Glasgow
University.
• a functor T ;
• a natural transformation η : 1 → T ;
• a natural transformation µ : T² → T ;
satisfying the laws:
µ ◦ T η = 1T
1T = µ ◦ ηT
µ ◦ µT = µ ◦ T µ
So, a monad is a functor and two natural transformations which obey some laws. The relevance of
this notion to computer science is not immediately obvious.
The functor T will turn out to be a type, and the natural transformations η and µ will turn out to be
functions. The monad laws will turn out to be properties of the functions.
The insight that Moggi had was that a monad allows us to treat computations as things to manipulate.
At this point we decide to omit the details of the original path from category theory to computer
science, and turn immediately to a computer scientist’s view of monads.
18.2.2 IO
The basic intuition behind the IO types is that a value of IO a is an action (a program) which performs
some I/O and returns a value of type a.
The type ()
Haskell has a type () which has exactly one value, confusingly also called (). As a datatype () is
pretty impoverished as there is not much data in the value (). However a value of type IO () is an
I/O action which returns only () as its value.
Reading input
We can read input using the standard functions getLine :: IO String and getChar :: IO
Char. These behave as we would expect them to.
Writing output
We can write output using putStr :: String -> IO ().
Recall that Haskell provides a class Show, and show is defined as:
return
There is a builtin function return :: a -> IO a, which does no I/O, but simply returns a value
of type a. return x simply returns x. This is not quite pointless, as we will see later.
The do notation
We now introduce the do notation. do allows us to compose I/O actions in a way very reminiscent of
writing imperative programs.4
We can use do to put a string on a line:
<-
We can use <- to name the result of an I/O action. For example we can write a program to echo lines:
echo :: IO ()
We can think of this process of naming as a form of assignment. Crucially <- only permits single
assignment: this is not the same as the destructive assignment that we (you!) are familiar with from
imperative programming.
>>= and >>
We can define the do notation using an operator >>= :: IO a -> (a -> IO b) -> IO b. The
operator >>= is a sequencing operator. It is called bind. It passes the result of the first operation to the
second one. We can use >>= in place of do:
echo2 :: IO ()
The do notation is often more convenient to use, but we can think of it as being implemented by >>=.
We can define the operator (>>) :: IO a -> IO b -> IO b as:
Program 18.7: >>
The effect of >> is to throw the value returned by its first operand away.
Further discussion
For a much fuller treatment of I/O in functional programming see [31].
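The class declaration itself is not reproduced in this extract; it is the standard Haskell 98 declaration:
class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
  fail   :: String -> m a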
This defines a class of type constructors. The default definition of >> is as in Program 18.7, and that
for fail is:
fail s = error s
We still have to present the monad laws. These are most easily expressed using a derived composition
operator >@>, called Kleisli composition:
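The definition of >@> is not shown here; it is the usual Kleisli composition:
(>@>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >@> g = \ x -> f x >>= g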
return >@> f = f
f >@> return = f
When we declare a monad we have to prove the monad laws for ourselves – Haskell is incapable of
checking these automatically.
x >>= f = f x
return = id
Program 18.13: >>=, return and fail for the list monad
mparse (MParser p) = p
Program 18.14: >>=, return and fail for the MParser monad
Program 18.15: >>=, return and fail for the Maybe monad
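The bodies of Programs 18.13 and 18.15 are not reproduced in this extract; the list and Maybe
instances are the standard prelude ones:
instance Monad [] where
  return x = [x]
  xs >>= f = concat (map f xs)
  fail _   = []

instance Monad Maybe where
  return         = Just
  Nothing  >>= _ = Nothing
  Just x   >>= f = f x
  fail _         = Nothing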
18.4 Summary
In this Chapter we have introduced and discussed monads from two perspectives: as a concept from
category theory, and as a way to generalise the treatment of I/O to the structuring of other computa-
tions.
19 Monad example: an evaluator
19.1 Introduction
We build an evaluator, following Chapter 10 of [6].
Bird’s example illustrates how we can use monads to help structure computations.
eval (Con x) = x
eval (Div n d) = (eval n) ‘div‘ (eval d)
Birdeval> eval (Div (Div (Con 66) (Con 3)) (Con 11))
2
Birdeval> eval (Div (Con 1) (Con 0))
We will follow Bird and add extra functionality to our evaluator, first directly, and second by using a
monad.
We will add:
• exception handling
data Error a = OK a
| Error Mess
Birdeval> eval (Div (Div (Con 66) (Con 3)) (Con 11))
ERROR - Cannot find "show" function for:
*** Expression : eval (Div (Div (Con 66) (Con 3)) (Con 11))
*** Of type : Exc Int
Birdeval> eval (Div (Div (Con 66) (Con 3)) (Con 11))
Value: 2
Birdeval> eval (Div (Con 1) (Con 0))
Exception: division by 0
Birdeval> eval (Div (Div (Con 12) (Con 0)) (Con 99))
Exception: division by 0
A value of St a is a state transformer, i.e. a function which takes a state and returns a value paired
with a (new) state. In order to make use of a value of type St a we define
applyst (MkSt f) s = f s
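The declaration of St is not reproduced here; following Bird, it is presumably:
type State = Int

newtype St a = MkSt (State -> (a, State))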
Birdeval> eval (Div (Div (Con 66) (Con 3)) (Con 11))
Value: 2 Count: 2
Birdeval> eval (Div (Div (Con 12) (Con 0)) (Con 99))
Value:
Program error: {primDivInt 12 0}
Then we define a function to format some output (after remembering to declare Term as an instance
of Show):
cr = "\n"
19.3 Using monads
newtype Id a = MkId a
evalID = eval
Birdeval> evalID (Div (Div (Con 66) (Con 3)) (Con 11))
Value: 2
raise = Raise
In this case raise is a trivial function, but for other monads we may define less trivial functions.
Now we need to define a new monadic evaluator, evalEx :: Term -> Exc Int
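The program itself is not reproduced in this extract; a sketch consistent with the session below,
assuming raise :: Mess -> Exc a with Mess a synonym for String:
evalEx :: Term -> Exc Int
evalEx (Con x)   = return x
evalEx (Div n d) = do
  x <- evalEx n
  y <- evalEx d
  if y == 0
    then raise "division by 0"
    else return (x `div` y)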
Birdeval> evalEx (Div (Div (Con 66) (Con 0)) (Con 11))
Exception: division by 0
Birdeval> evalEx (Div (Div (Con 66) (Con 3)) (Con 11))
Value: 2
Birdeval> evalEx (Div (Div (Con 66) (Con 3)) (Con 0))
Exception: division by 0
Next we define a function specific to the state monad, which just increments a counter:
tick :: St ()
-- Bird has
-- tick = MkSt f where f s = ((), s + 1)
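Our own definition is not reproduced here; presumably it is Bird's, written with a λ-expression:
tick = MkSt (\ s -> ((), s + 1))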
or
Birdeval> evalSt (Div (Div (Con 66) (Con 3)) (Con 11))
Value: 2 Count: 2
evalOut (Con i) =
do
out (line (Con i) i)
return i
evalOut (Div n d) =
do
x <- evalOut n
y <- evalOut d
out (line (Div n d) (x ‘div‘ y))
return (x ‘div‘ y)
evalOut (Con i) =
out (line (Con i) i) >>
return i
evalOut (Div n d) =
evalOut n >>= (\x ->
evalOut d >>= (\y ->
out (line (Div n d) (x ‘div‘ y)) >>
return (x ‘div‘ y)))
19.4 Summary
We have used monads to implement different variants of an evaluator.
20 Worked Example: writing an
evaluator for λ terms
20.1 β reduction
For the pure, untyped λβ -calculus we have the following (big step) reduction rules:
x ▷β x

M ▷β M′
----------------
λx.M ▷β λx.M′

M ▷β P Q    N ▷β N′
----------------
M N ▷β (P Q) N′

M ▷β x    N ▷β N′
----------------
M N ▷β x N′

M ▷β λx.P    [N/x]P ▷β P′
----------------
M N ▷β P′
We will write a function called reduce :: Term a -> Term a. We expect that reduce will
have three clauses, and that the third clause will itself be analysed into three cases, which we will
handle by using an auxiliary function red’ :: (Term a) -> (Term a) -> Term a.
The function red’ lets us deal with the case of reducing a possible redex.
[N/x]x −→ N

[N/x]y −→ y    (x ≠ y)

[N/x]P −→ P′    [N/x]Q −→ Q′
----------------
[N/x](P Q) −→ P′ Q′

[N/x](λx.P) −→ λx.P

[N/x]P −→ P′
----------------    (x ≠ y, y ∉ FV(N))
[N/x](λy.P) −→ λy.P′

[z/y]P −→ P′    [N/x]P′ −→ P″
----------------    (x ≠ y, y ∈ FV(N), z fresh)
[N/x](λy.P) −→ λz.P″

20.4 Solution 1: following our noses
subst n x (Var y)
| x == y = n
| otherwise = (Var y)
subst n x (App p q) = App (subst n x p) (subst n x q)
subst n x (Abs y p)
| x == y = (Abs y p)
| not (y ‘freein‘ n) = Abs y (subst n x p)
| otherwise = ?????
The red function has to return a pair consisting of a stock of fresh variables and a term. This function
is a lot more readable if we make use of lets. There is now a lot of bookkeeping involved in our code:
only in one place in the substitution function do we actually make use of the stock of fresh variables.
red’ :: Eq a => [a] -> Term a -> Term a -> ([a], Term a)
red’ frees n (Var x) = let (frees’, n’) = red frees n
in
(frees’, App (Var x) n’)
red’ frees n (App p q) = let (frees’, n’) = red frees n
in
(frees’, App (App p q) n’)
red’ frees n (Abs x p) =
let (frees’, p’) = subst frees n x p
in red frees’ p’
We could, of course, have used Eq a => ([a], Term a) -> ([a] , Term a) for the type
of red.
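red itself is not reproduced in this extract; a hedged sketch consistent with red' above:
red :: Eq a => [a] -> Term a -> ([a], Term a)
red frees (Var x)   = (frees, Var x)
red frees (Abs x m) = let (frees', m') = red frees m
                      in (frees', Abs x m')
red frees (App m n) = let (frees', m') = red frees m
                      in red' frees' n m'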
20.4.1 Comments
This code works, but a lot of it is concerned solely with passing around the stock of fresh variables.
As we might have anticipated, only in the subst function do we make use of the stock of variables.
We have also made very extensive use of lets, to aid program readability.
20.5 Solution 2: Use CPS
We do a little bit of re-arrangement of the arguments to get the type of cpssubst to look like we
want it to.
20.5.1 Comments
This code suffers from a real burden of notation. We could, perhaps, have improved readability by
using let expressions to get code like Program 20.15. Once again we have spent a lot of syntax on
something which gets passed around unaltered for most of the code. The structure of this code is,
basically, what we will get for the monad-based code. The big win of the monad-based code is that
we almost never need to mention the state, once we have wrapped it up in the monad.
20.6 Solution 3: Or, we could use a monad
In Chapter 19 we saw a simple evaluator which used a state variable to count the number of divisions it
had performed. We can use the same technique to deal with the stock of fresh variables. For simplicity
we will just use an integer for the state. The code which deals with setting the monad up is:
applyst (MkSt f) s = f s
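The rest of the set-up is not reproduced here. A hedged sketch, in the Haskell 98 style used in these
notes (the decision to use Term Int, so that a fresh variable is just the next integer, is our assumption):
newtype St a = MkSt (Int -> (a, Int))

instance Monad St where
  return x     = MkSt (\ s -> (x, s))
  MkSt f >>= g = MkSt (\ s -> let (x, s') = f s
                              in applyst (g x) s')

-- fresh hands out the next unused variable.
fresh :: St Int
fresh = MkSt (\ s -> (s, s + 1))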
One significant difference is that red has a different type from before. For each of the functions we
will give a version using bind (»=), and then a version defined using the do notation.
red’ (Var x) n =
red n >>= (\n’ ->
return (App (Var x) n’))
red’ (App p q) n =
red n >>= (\n’ ->
return (App (App p q) n’))
red’ (Abs x p) n =
subst n x p >>= (\p’ ->
red p’)
red’ (Var x) n = do
n’ <- red n
return (App (Var x) n’)
red’ (App p q) n = do
n’ <- red n
return (App (App p q) n’)
red’ (Abs x p) n = do
p’ <- subst n x p
red p’
subst n x (Var y)
| x == y = return n
| otherwise = return (Var y)
subst n x (App p q) =
subst n x p >>= (\p’ ->
subst n x q >>= (\q’ ->
return (App p’ q’)))
subst n x (Abs y p)
| x == y = return (Abs y p)
| not (y ‘freein‘ n) =
subst n x p >>= (\p’ ->
return (Abs y p’))
| otherwise =
fresh >>= (\z ->
subst (Var z) y p >>= (\p’ ->
subst n x p’ >>= (\p’’ ->
return (Abs z p’’))))
subst n x (Var y)
| x == y = return n
| otherwise = return (Var y)
subst n x (App p q) = do
p’ <- subst n x p
q’ <- subst n x q
return (App p’ q’)
subst n x (Abs y p)
| x == y = return (Abs y p)
| not (y ‘freein‘ n) = do
p’ <- subst n x p
return (Abs y p’)
| otherwise = do
z <- fresh
p’ <- subst (Var z) y p
p’’ <- subst n x p’
return (Abs z p’’)
20.6.1 Comments
For the monad-based code there is certainly a burden of setting up the monad, and we have been
less abstract (any type with a next function defined on it should have been suitable for the state) than
with the other definitions, but there can be little doubt that the code written with the do notation is very
much closer to Programs 20.2, 20.3, 20.4 than any of the other code is. This must count as a very
strong argument in favour of monads.
21 An SECD machine
21.1 Introduction
This Chapter presents some example code to implement a version of the SECD machine of Landin
[47]. The code is in 3 modules:
D the dump, where we can copy the current state of the machine
The operation of the SECD machine can be given in terms of a while loop. If there is no more work
to do we stop, and return the result on the stack; otherwise we inspect the top of the control stack to
see what we must push onto and pop from the various stacks to make progress.
21.2.1 Evaluation
while not finished(S, E, C, D) do
{if empty(C)
then do
{resume(S, E, C, D)}
else do
{ case top(C) of
...
}
}
21.4 Imports
import Stack
import Terms
21.5 Functions
We start execution by calling load of a term. The term gets loaded, and evaluation begins.
The evaluation function inspects the control stack. If it is empty, and the dump is also empty then
we are finished, and we stop and return a value. If the dump is not empty, we resume the suspended
computation. If the control stack is not empty we pop it and proceed as appropriate.
When we must resume, the result stack will have a single value in it. We will crash with a pattern-
matching error if the result stack is not a singleton. We resume by putting the value on the top of the
stack that we moved to the dump, and restoring the old environment, control stack and dump.
If there is still work to do we look at the item on top of the control stack, and take the appropriate
action. All of these auxiliary functions are designed to give a pattern-matching error if computation
cannot proceed. The general pattern is as follows:
• if we have a term, we break the term up and push items onto the stacks. If the term is evaluable
we will leave a marker on the control stack to indicate that, when we reach it again, we can
perform some evaluation;
• if we have a marker we perform the appropriate action, such as making a primitive function call.
21.6 Some miscellaneous definitions
--add s, k, i, theta
mktm op n m = PrimBinTm op n m
(Abs "p"
(Abs "q" (App (App (Var "x") (Var "p"))
(App (App (Var "y") (Var "p")) (Var "q"))))))
ch2int ch = App (App ch (Abs "v" (PrimBinTm Plus (Var "v") one)))
zero
module Terms
(parsetm,
Var,
NumVal,
TruVal,
Value(NumVal, TruVal),
UnOp(Negative, Not, IsZero),
BinOp(Plus, Minus, Times, Divide, Remainder),
Term(Const, Var, PrimUnTm, PrimBinTm, ITE, Abs, App)
)
where
import ParserLib
21.8 Types
First some synonyms:
21.9 Functions
21.9.1 Top-level call to be exported
Then Not:
Then IsZero:
21.9 Functions
name,
plussym, minussym, timessym, divsym, modsym, negativesym,
truesym, falsesym, notsym, isZerosym,
ifsym, thensym, elsesym,
lambda, dot, bra, ket :: Parser Char String
name = sp (first (plus (satisfy ( \ c -> ('a' <= c && c <= 'z' )))))
21.10 Module: Stack
k :: a-> b -> a
k x y = x
module Stack
( Stack,
emptyStack,
isEmptyStack,
push,
pop
) where
emptyStack = St []
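The remaining definitions of the Stack module are not reproduced in this extract; a hedged sketch
(the exact type of pop is an assumption):
newtype Stack a = St [a]

isEmptyStack :: Stack a -> Bool
isEmptyStack (St xs) = null xs

push :: a -> Stack a -> Stack a
push x (St xs) = St (x : xs)

pop :: Stack a -> (a, Stack a)
pop (St (x : xs)) = (x, St xs)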
Bibliography
[1] Hassan Aït-Kaci. Warren’s Abstract Machine: A Tutorial Reconstruction. MIT Press, Cam-
bridge, Massachusetts, USA, 1991. Also available from http://www.isg.sfu.ca/~hak/
documents/wam.html.
[2] Andrew Appel. Compiling with Continuations. Cambridge University Press, Cambridge, England,
1992.
[3] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press,
Cambridge, England, 1998.
[4] Henk Barendregt. The Lambda Calculus Its Syntax and Semantics, volume 103 of Studies in
Logic and the Foundations of Mathematics. North-Holland, Amsterdam, The Netherlands, revised
edition, 1984.
[5] Henk Barendregt. The impact of the lambda calculus in logic and computer science. Bulletin of
Symbolic Logic, 3(2):181–215, 1997. Also available at http://www.math.ucla.edu/~asl/
bsl/0302/0302-003.ps.
[6] Richard Bird. Introduction to Functional Programming using Haskell. Prentice-Hall, second edition,
1998.
[7] Ivan Bratko. Prolog Programming for Artificial Intelligence. International Computer Science Series.
Addison-Wesley, Wokingham, England, 1986.
[8] Luca Cardelli and Peter Wegner. On understanding types, data abstraction, and polymor-
phism. Computing Surveys, 17(4):471–522, 1985. Available from http://www.research.
microsoft.com/Users/luca/Papers/OnUnderstanding.A4.ps.
[9] Alonzo Church. The Calculi of Lambda Conversion. Princeton University Press, Princeton, NJ, USA,
1941.
[10] Alonzo Church. Introduction to Mathematical Logic, volume 1. Princeton University Press, Prince-
ton, New Jersey, USA, second, enlarged edition, 1956.
[11] W F Clocksin. Clause and Effect Prolog Programming for the Working Programmer. Springer-
Verlag, Berlin, 1997.
[12] W F Clocksin and C S Mellish. Programming in Prolog. Springer-Verlag, Berlin, third, revised and
extended edition, 1987.
[13] Olivier Danvy and Andrzej Filinski. Representing control: a study of the CPS transformation. Mathemat-
ical Structures in Computer Science, 1992. Also Tech Report CIS-91-2, Kansas State University.
[14] Anthony J T Davie. An Introduction to Functional Programming Systems Using Haskell, volume 27
of Cambridge Computer Science Texts. Cambridge University Press, Cambridge, England, 1992.
[15] Anthony J T Davie and Ronald Morrison. Recursive Descent Compiling. Ellis Horwood, 1981.
[16] Martin Davis, editor. The Undecidable. Raven Press, Hewlett, New York, USA, 1965.
[17] N G de Bruijn. Lambda calculus notation with nameless dummies, a tool for automatic formula
manipulation. Indag. Math., 34:381–392, 1972.
[18] N G de Bruijn. A survey of the project AUTOMATH. In J Roger Hindley and Jonathan P Seldin,
editors, To H. B. Curry: Essays in Combinatory Logic, Lambda-Calculus and Formalism, pages
579–607. Academic Press, New York, NY, USA, 1980.
[19] Phillipe de Groote. The Curry-Howard Isomorphism, volume 8 of Cahiers du Centre de Logique.
Academia, Louvain-la-Neuve, Belgium, 1995.
[20] Paul de Mast, Jan-Marten Jansen, Dick Bruin, Jeroen Fokker, Pieter Koopman, Sjaak Smetsers,
Marko van Eekelen, and Rinus Plasmeijer. Functional Programming in Clean. Unpublished draft,
2000. Available from http://www.cs.kun.nl/~clean/Manuals/Clean_Book/clean_
book.html.
[21] Edsger Wybe Dijkstra. A Primer of ALGOL 60 Programming. APIC Studies in Data Processing.
Academic Press, London, England, 1962.
[22] Richard P Draves, Brian N Bershad, Richard F Rashid, and Randall W Dean. Using continua-
tions to implement thread management and communication in operating systems. In 13th ACM
Symposium on Operating Systems Principles, pages 122–136. ACM Press, 1991.
[23] Michael J Fischer. Lambda calculus schemata. Sigplan Notices, 7:104–109, 1972.
[24] Cormac Flanagan, Amr Sabry, Bruce F Duba, and Matthias Felleisen. The essence of compiling
with continuations. In Conference on Programming Language Design and Implementation, 1993.
[25] Jeroen Fokker. Functional parsers. In Johan Jeuring and Erik Meijer, editors, Advanced Functional
Programming, Tutorial Text of the First International Spring School on Advanced Functional Pro-
gramming Techniques, volume 925 of Lecture Notes in Computer Science, pages 1–23. Springer,
1995. Also available from http://www.cs.uu.nl/staff/IDX/sds.html.
[26] Harvey Friedman. Classically and intuitionistically provably recursive functions. In G.H. Müller
and D. S. Scott, editors, Higher Set Theory, pages 21–27. Springer, 1977.
[27] Gerhard Gentzen. Investigations into logical deduction. In M E Szabo, editor, The Collected Papers
of Gerhard Gentzen, pages 68–131. North-Holland, Amsterdam, The Netherlands, 1969.
[28] Carlo Ghezzi and Mehdi Jazayeri. Programming Language Concepts. John Wiley & Sons, New
York, NY, USA, third edition, 1998.
[29] Hugh Glaser, Chris Hankin, and David Till. Principles of Functional Programming. Prentice/Hall
International, London, England, 1984.
[30] Kurt Gödel. On formally undecidable propositions of Principia Mathematica and related systems
i. In Davis [16], pages 5–38. Originally published in German as Uber formale unentschiedbare
Sätze der Principia Mathematica und verwandter Systeme I Monatschefte für Mathematik und
Physik, Vol. 38 pp. 173–198, 1931.
[31] Andrew Donald Gordon. Functional Programming and Input/Output. PhD, University of Cam-
bridge, 1992. Also Technical Report 285, Cambridge University Computer Laboratory, and pub-
lished as a Distinguished Dissertation in Computer Science, Cambridge University Press.
[32] J S Green. ALGOL Programming for KDF9. An English Electric Leo mini-manual. English Electric-
Leo Computers Ltd, Stoke-on-Trent, England, 1963.
[33] Timothy G Griffin. A formulae-as-types notion of control. In Seventeenth Annual ACM Symposium
on Principles of Programming Languages (POPL 17), pages 47–58. ACM Press, 1990.
[34] John Hatcliff and Olivier Danvy. A generic account of continuation-passing styles. In Popl 94
: ACM Symposium on Principles of Programming Languages, pages 458–471. Association for
Computing Machinery, 1994.
[35] David Hilbert and W Ackermann. Principles of Mathematical Logic. Chelsea Publishing Company,
New York, USA, 1950. A translation, with corrections, of the second German edition of Grundzüge
der theoretischen Logik of 1938.
[36] David Hilbert and Paul Bernays. Grundlagen der Mathematik, volume 1. Springer, Berlin, Ger-
many, 1934. In German.
[37] J Roger Hindley. The principal type-scheme of an object in combinatory logic. Transactions of the
American Mathematical Society, 146(12):29–60, 1969.
[38] J Roger Hindley. Basic Simple Type Theory, volume 42 of Cambridge Tracts In Theoretical Computer
Science. Cambridge University Press, Cambridge, 1997.
[39] J Roger Hindley and Jonathan P Seldin. Introduction to Combinators and λ-Calculus, volume 1
of London Mathematical Society Student Texts. Cambridge University Press, Cambridge, England,
1986.
[40] Paul Hudak. The Haskell School of Expression Learning Functional Programming Through Multi-
media. Cambridge University Press, Cambridge, England, 2000.
[41] John Hughes. Why functional programming matters. In David Turner, editor, Research Topics in
Functional Programming. Addison-Wesley, Reading, Massachusetts, USA, 1990. Also appeared in
The Computer Journal 32(2), 1989.
[42] Graham Hutton. Higher-order functions for parsing. Journal of Functional Programming,
2(3):323–343, 1992. Also available from http://www.cs.nott.ac.uk/~gmh/bib.html#
parsing.
[43] Richard Kelsey, William Clinger, and Jonathan Rees. Revised⁵ report on the algorithmic language
Scheme. Journal of Higher Order and Symbolic Computation, 11(1):7–105, 1998. Also appears
in ACM SIGPLAN Notices 33(9), September 1998.
[44] Brian W Kernighan and Dennis M Ritchie. The C Programming Language. Prentice Hall Software
Series. Prentice Hall, Englewood Cliffs, New Jersey, USA, second edition, 1988.
[45] Stephen C Kleene. General recursive functions of natural numbers. In Davis [16], pages 237–
253. Originally published in Mathematische Annalen 112(5):727–742, 1936.
[46] J Lambek and P J Scott. Introduction to higher order categorical logic, volume 7 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge, England, 1986.
[47] P J Landin. The mechanical evaluation of expressions. The Computer Journal, 6:308–320, 1964.
[48] Leonardo of Pisa (Fibonacci). Liber Abacci. 2 edition, 1228. In Latin.
[49] Neil Leslie. Specification And Implementation Of A Unification Algorithm In Martin-Löf’s Type
Theory. MSc, St Andrews, 1993.
[50] Neil Leslie. Continuations and Martin-Löf’s Type Theory. PhD, Massey University, 2000.
[51] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Java series. Addison-
Wesley, second edition, 1999. Available from http://java.sun.com/docs/books/
vmspec/index.html.
[52] Kenneth C Louden. Programming Languages Principles and Practice. PWS-KENT Series in Com-
puter Science. PWS Publishing Company, Boston, Massachusetts, USA, 1993.
[53] John McCarthy. History of LISP. In Wexelblat [85], pages 173–197.
[54] Zohar Manna and Richard Waldinger. Deductive synthesis of the unification algorithm. Science
of Computer Programming, 1:5–48, 1981.
[55] Per Martin-Löf. Intuitionistic Type Theory, volume 1 of Studies in Proof Theory Lecture Notes.
Bibliopolis, Napoli, Italy, 1984. Notes taken by Giovanni Sambin from a series of lectures given
in Padua, June 1980.
[56] Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System
Sciences, 17(3):348–375, 1978.
[57] Chetan R Murthy. Extracting Constructive Content From Classical Proofs. PhD thesis, Cornell University, 1990. Also available as TR 89-1151 from Dept. of Computer Science, Cornell University.
[58] Bengt Nordström, Kent Petersson, and Jan M Smith. Programming in Martin-Löf’s Type Theory: An Introduction. Clarendon Press, Oxford, England, 1990.
[59] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, Cambridge, Eng-
land, 1998.
[60] Lawrence C Paulson. Verifying the unification algorithm in LCF. Science of Computer Programming, 5:143–169, 1985.
[61] Lawrence C Paulson. ML for the Working Programmer. Cambridge University Press, Cambridge, England, 1991.
[62] Alan J Perlis. The American side of the development of ALGOL. In Wexelblat [85], pages 75–91.
[63] Rózsa Péter. Recursive Functions in Computer Theory. The Ellis Horwood Series in Computers and
Their Applications. Ellis Horwood, Chichester, England, 1981. Originally published in German
as Rekursive Funktionen in der Komputer-Theorie, 1977.
[64] Simon L Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall
International Series on Computer Science. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, USA,
1987.
[65] Rinus Plasmeijer and Marko van Eekelen. Functional Programming and Parallel Graph Rewriting.
International Computer Science Series. Addison-Wesley, Wokingham, England, 1993.
[66] Gordon D Plotkin. Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science,
1:125–159, 1975.
[67] Emil L Post. Finite combinatory processes, formulation I. In Davis [16], pages 288–292. Originally published in The Journal of Symbolic Logic 1:103–105, 1936.
[68] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. Prentice Hall, Upper Saddle River, New Jersey, USA, 1995.
[69] Giovanni Sambin and Jan Smith, editors. Twenty-Five Years of Constructive Type Theory, vol-
ume 36 of Oxford Logic Guides. Clarendon Press, Oxford, England, 1998.
[70] David A Schmidt. Denotational Semantics: A Methodology for Language Development. Wm. C. Brown, Dubuque, Iowa, USA, 1986.
[71] Helmut Schwichtenberg. Proofs, lambda terms and control operators. In Helmut Schwichtenberg,
editor, Logic of Computation, pages 309–348. Springer, Heidelberg, Germany, 1997. Proceed-
ings of the NATO Advanced Study Institute on Logic of Computation, held at Marktoberdorf,
Germany, July 25 – August 6, 1995.
[72] Brian J Shelburne and Christopher P Burton. Early programs on the Manchester Mark I prototype. IEEE Annals of the History of Computing, 20(3):4–15, July-September 1998. See also http://www.computer50.org.
[73] J C Shepherdson and H E Sturgis. Computability of recursive functions. Journal of the Association for Computing Machinery, 10:217–255, 1963.
[75] Hayo Thielecke. Categorical Structure of Continuation Passing Style. PhD thesis, University of Edinburgh, 1997. Also available as technical report ECS-LFCS-97-376.
[76] Simon Thompson. Type Theory and Functional Programming. International Computer Science
Series. Addison-Wesley, Wokingham, England, 1991.
[77] Simon Thompson. Haskell: The Craft of Functional Programming. International Computer Science Series. Addison-Wesley, Wokingham, England, second edition, 1999.
[78] Anne Sjerp Troelstra and Dirk van Dalen. Constructivism in Mathematics: An Introduction, Volume 1, volume 121 of Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, The Netherlands, revised edition, 1988.
[79] Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. In Davis [16], pages 115–153. Originally published in Proceedings of the London Mathematical Society, Series 2, 42:230–265, 1936–1937.
[80] Nikolai Nikolaevich Vorobev. The Fibonacci Numbers. Heath, Boston, Massachusetts, USA, 1963.
Translation of Chisla Fibonachchi, published in Russian, 1951.
[81] Philip Wadler. How to replace failure by a list of successes. In 2nd International Conference on Functional Programming Languages and Computer Architecture. Springer-Verlag, 1985.
[82] Philip Wadler. The essence of functional programming. In 19th Symposium on Principles of Programming Languages. ACM Press, 1992.
[83] Philip Wadler. Monads for functional programming. In M Broy, editor, Marktoberdorf
Summer School on Program Design Calculi, volume 118 of NATO ASI Series F: Computer
and systems sciences, Berlin, Germany, 1992. Springer. This paper is available from
http://www.cs.bell-labs.com/who/wadler/topics/monads.html.
[84] Philip Wadler and Stephen Blott. How to make ad-hoc polymorphism less ad hoc. In 16th
Symposium on Principles of Programming Languages. ACM Press, 1989. Also available from
http://cm.bell-labs.com/who/wadler/papers/class/class.ps.gz.
[85] Richard L Wexelblat, editor. History of Programming Languages. ACM Monograph. Academic
Press, New York, New York, USA, 1981.
[86] F.C. Williams and T. Kilburn. Electronic digital computers. Nature, 162:487, September 1948.
See also http://www.computer50.org.