Lambda PDF
β-reduce.
Normal Order. Under this strategy, the leftmost outermost redex is always reduced first.
The term above would be reduced as follows:
id (id (λz. id z))
→ id (λz. id z)
→ λz. id z
→ λz. z
Normal order is a normalising strategy in the sense that if a term has a normal form, this strategy
will find it.
Applicative Order. Under this strategy, the leftmost innermost redex is always reduced first.
The term above would be reduced as follows:
id (id (λz. id z))
→ id (λz. id z)
→ λz. id z
→ λz. z
In contrast with normal order, under this strategy arguments are evaluated before being passed
to functions. It turns out that applicative order is not a normalising strategy. For example,
suppose that M does not have a normal form (that is, reducing M does not terminate). In that
case, applicative order evaluation of (λx.λy.y) M will never find the normal form of the term,
(λy.y).
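The two strategies can be compared with a small evaluator. The following is only a sketch: the
Term type and the names freeVars, subst, normalStep, applicativeStep, omega and example are
illustrative assumptions, not part of these notes or of the course workbench. It takes omega =
(λx.x x) (λx.x x) as the term M with no normal form.

```haskell
-- Untyped lambda terms. All names in this sketch are illustrative.
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- Free variables of a term (duplicates are harmless here).
freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (Lam x m) = filter (/= x) (freeVars m)
freeVars (App m n) = freeVars m ++ freeVars n

-- subst x n m computes m[x := n], renaming binders to avoid capture.
subst :: String -> Term -> Term -> Term
subst x n (Var y)
  | y == x    = n
  | otherwise = Var y
subst x n (App m p) = App (subst x n m) (subst x n p)
subst x n (Lam y m)
  | y == x                 = Lam y m
  | y `notElem` freeVars n = Lam y (subst x n m)
  | otherwise              = Lam z (subst x n (subst y (Var z) m))
  where
    used = x : freeVars n ++ freeVars m
    z    = head [v | i <- [0 :: Int ..], let v = "v" ++ show i, v `notElem` used]

-- One step of normal order: the leftmost outermost redex is reduced first.
normalStep :: Term -> Maybe Term
normalStep (App (Lam x m) n) = Just (subst x n m)
normalStep (App m n) =
  case normalStep m of
    Just m' -> Just (App m' n)
    Nothing -> App m <$> normalStep n
normalStep (Lam x m) = Lam x <$> normalStep m
normalStep (Var _)   = Nothing

-- One step of applicative order: the leftmost innermost redex is reduced first,
-- so the function part and the argument are reduced before the application itself.
applicativeStep :: Term -> Maybe Term
applicativeStep (Var _)   = Nothing
applicativeStep (Lam x m) = Lam x <$> applicativeStep m
applicativeStep (App m n) =
  case applicativeStep m of
    Just m' -> Just (App m' n)
    Nothing -> case applicativeStep n of
      Just n' -> Just (App m n')
      Nothing -> case m of
        Lam x b -> Just (subst x n b)
        _       -> Nothing

-- omega reduces to itself, so it has no normal form.
omega :: Term
omega = App w w
  where w = Lam "x" (App (Var "x") (Var "x"))

-- The term (\x.\y.y) omega from the text.
example :: Term
example = App (Lam "x" (Lam "y" (Var "y"))) omega
```

In this sketch normalStep example discards omega in a single step, whereas applicativeStep example
steps back to example itself, so repeating it never reaches λy.y.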
Call-by-Name. This is a restriction of normal order, where no reductions are allowed inside
abstractions. For our example term, the reduction is the same but stops earlier:
id (id (λz. id z))
→ id (λz. id z)
→ λz. id z
Algol-60 introduced call-by-name parameter passing, and Haskell's semantics is a more efficient
variant known as call-by-need where, instead of re-evaluating an argument each time it is used,
all occurrences of the argument are overwritten with its value the first time it is evaluated. This
requires that we have a run-time representation of terms that allows sharing of subexpressions,
leading to terms becoming graphs rather than trees.
Call-by-Value. Most languages, including Java and ML, use this strategy, where again no
reductions are allowed inside abstractions and a redex is only reduced when its argument part
has already been reduced to a value (a term which cannot be reduced any further).
id (id (λz. id z))
→ id (λz. id z)
→ λz. id z
Call-by-value is related to applicative order, in exactly the same way that call-by-name relates
to normal order. The call-by-value strategy is strict in the sense that the arguments to a
function are evaluated whether or not they are used by the function. In contrast, non-strict
(or lazy) strategies such as call-by-name and call-by-need only evaluate the arguments that are
actually used. The difference between strict and non-strict becomes clearer when we consider
the possibility of non-termination. We will see a more formal definition of these concepts later
in the course.
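The effect of non-termination on the two kinds of strategy can be observed directly in Haskell.
This is a minimal sketch; the names second, loop, lazyResult and strictResult are made up for
illustration, and seq is used only to imitate strict argument evaluation.

```haskell
-- A function that ignores its first argument, like (\x.\y.y).
second :: a -> b -> b
second _ y = y

-- A computation that never produces a value, standing in for a term
-- with no normal form.
loop :: Int
loop = loop

-- Non-strict evaluation: the unused argument is never evaluated,
-- so this expression has a value.
lazyResult :: Int
lazyResult = second loop 42

-- seq evaluates its first argument before returning its second, which
-- imitates call-by-value. Evaluating strictResult would therefore never
-- produce a value, so it is defined here but should not be demanded.
strictResult :: Int
strictResult = loop `seq` second loop 42
```

Asking GHCi for lazyResult gives 42, because the argument loop is never demanded.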
2.4 Substitution
The idea of substituting a term for all occurrences of a variable, as in the definition of β-reduction,
is intuitively appealing but it turns out that a formal definition is quite a delicate matter. A
naïve (and ultimately incorrect) attempt might be:
x[x := N] = N
y[x := N] = y
(λy.M)[x := N] = λy.(M[x := N])
(M P)[x := N] = (M[x := N]) (P[x := N])
For most examples this seems to work. For example
(λy.x)[x := (λz.z w)] = λy.λz.z w
which matches our intuitions about how substitution should behave. However
(λx.x)[x := y] = λx.y
conflicts with a basic understanding that the names of bound variables (i.e. parameters) don't
matter. The identity function is the same whether we write it as λx.x or λz.z or λfred.fred.
If these don't behave the same way under substitution they won't behave the same way under
evaluation and that seems wrong. The mistake is that the substitution should only apply to
free variables and not bound ones. In the example, x is bound in the term so we should not
substitute it. That seems to give us what we want:
(λx.x)[x := y] = λx.x
But that's still not quite enough, as the following example demonstrates:
(λz.x)[x := z] = λz.z
This has changed a constant function into the identity function; in some sense this is the
dual of the problem identified above. Once again, the choice of z as the binder in λz.x should
be completely arbitrary. This phenomenon is known as free variable capture as the z being
substituted is free, but in the result it is bound. Our solution will be to allow renaming of
bound variables (a process traditionally known as α-conversion). We are now in a position to
give a formal definition of substitution which accounts for the issues explored here.
x[x := N] = N
y[x := N] = y
(λx.M)[x := N] = λx.M                    (∗)
(λy.M)[x := N] = λy.M[x := N]            if y ∉ FV(N)
(λy.M)[x := N] = λz.M[y := z][x := N]    if y ∈ FV(N), z a fresh variable    (∗∗)
(M P)[x := N] = (M[x := N]) (P[x := N])
assuming that x and y are different variables.
Notice in (∗) that the substitution does not apply to bound occurrences of x. In (∗∗) notice the
replacement of bound variables to avoid capture of free variables.
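The formal definition translates almost clause by clause into Haskell. The sketch below is one
possible rendering, not the workbench's implementation: the Term type, the argument order of
subst and the fresh-name scheme (v0, v1, ...) are all assumptions made for illustration.

```haskell
-- Untyped lambda terms; constructor names are illustrative.
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- FV(M): the free variables of a term (duplicates are harmless here).
freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (Lam x m) = filter (/= x) (freeVars m)
freeVars (App m n) = freeVars m ++ freeVars n

-- A name of the form v0, v1, ... that does not occur in the given list.
-- This naming scheme is an assumption of the sketch.
fresh :: [String] -> String
fresh used = head [v | i <- [0 :: Int ..], let v = "v" ++ show i, v `notElem` used]

-- subst x n m computes m[x := n].
subst :: String -> Term -> Term -> Term
subst x n (Var y)
  | y == x    = n                        -- x[x := N] = N
  | otherwise = Var y                    -- y[x := N] = y
subst x n (App m p) =
  App (subst x n m) (subst x n p)        -- (M P)[x := N]
subst x n (Lam y m)
  | y == x                 = Lam y m     -- bound occurrences of x are untouched
  | y `notElem` freeVars n = Lam y (subst x n m)
  | otherwise              =             -- rename the binder to avoid capture
      let z = fresh (x : freeVars n ++ freeVars m)
      in  Lam z (subst x n (subst y (Var z) m))
```

With these definitions, subst "x" (Var "z") (Lam "z" (Var "x")) renames the binder and yields
Lam "v0" (Var "z"), a constant function, rather than the identity Lam "z" (Var "z").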
That all seems quite complicated, and it is, so it is common to abide by a convention rather
than keep dealing with this machinery. Typically we work with lambda terms up to renaming
of bound variables (or up to α-conversion) and don't allow variable capture. In fact many
compilers, including ghc, rename all variables to be unique, so as to circumvent such problems.
Exercise 2.1
The λ-calculus reduction workbench, available through the course web site, automatically
generates fresh variables to avoid free variable capture. Try evaluating some terms on the workbench,
such as (\ x . \ y .x) y and (\ x . \ x . x) y to see it in action.
3 Lambda Calculus as a Model of Computation
So far this may all seem ungrounded, almost vacuous; it all looks like symbol pushing and
nothing more. There are none of the data values which we are used to having at the basis
of computation, such as numbers and booleans. Without them, what can we do with the
λ-calculus?
In this section we will see how to represent such data types in the λ-calculus. This is not such
an unfamiliar idea: we are used to the idea of interpreting sequences of bits as integers, floats,
characters, instructions and so on. Here we will interpret certain lambda terms as representing
boolean and numeric values, in a manner that can be extended to other classes of values and
data structures. What arises from defining a sufficient set of representations is a model of
computation equivalent in power to Turing machines, recursive function theory and every other
standard model of computation.
3.1 Multiple Arguments
First we will deal with a straightforward matter. Many functions of interest (addition, for
example) take more than one argument, yet λ-calculus abstractions only have a single binder.
There are two solutions: either pass a single argument consisting of a tuple of values, or curry
the functions. The first solution requires that we find a way to represent tuples in the λ-calculus,
which turns out to be not too difficult. The second approach, which we will adopt here, treats
a function of several arguments as taking each argument in turn, one at a time, and should be
familiar from your Haskell programming experience.[1]
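As a reminder of how the two solutions look in Haskell, here is a small sketch. The names
addPair, addCurried and increment are invented for this example; curry and uncurry are the
standard Prelude functions.

```haskell
-- First solution: a single argument that is a tuple of values.
addPair :: (Int, Int) -> Int
addPair (m, n) = m + n

-- Second solution: a curried function, taking its arguments one at a time.
-- Written with explicit single-binder abstractions, as in the lambda calculus.
addCurried :: Int -> Int -> Int
addCurried = \m -> \n -> m + n

-- Supplying only the first argument yields another function.
increment :: Int -> Int
increment = addCurried 1
```

The Prelude functions curry and uncurry convert between the two forms, so curry addPair and
addCurried behave identically.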
3.2 Booleans
It may be that we naturally consider boolean values to somehow be primitive or atomic:
they are what they are. In fact, what we are interested in (and this is a rather algebraic
attitude that extends to all data types and structures) is how they behave. At the heart of
our understanding of boolean values is that they guide a choice: if something is true choose
this, otherwise choose that. So what we want is to come up with a way of representing true,
false and choice (i.e. conditional) as lambda terms, such that they work together as we wish;
that is, their evaluation leads to results that correspond to (representations of) our intuitive
computational expectations.
tt = λxy.x
ff = λxy.y
cond = λabc.a b c
where tt represents true, ff represents false and cond represents if-then-else.
Notice that tt is a function that takes two arguments and returns the first, while ff also takes
two arguments and returns the second. The cond combinator takes three arguments and applies
the first to the second and the third. The key observation is that the first argument to cond will
be either tt or ff, thus choosing either the second or the third argument respectively.
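These three combinators can be tried out as ordinary Haskell functions. This is only a sketch:
the types are one possible choice, and the name ff for the false combinator is assumed here.

```haskell
-- tt takes two arguments and returns the first.
tt :: a -> a -> a
tt x _ = x

-- ff takes two arguments and returns the second.
ff :: a -> a -> a
ff _ y = y

-- cond applies its first argument (expected to be tt or ff)
-- to the second and the third.
cond :: (a -> a -> a) -> a -> a -> a
cond a b c = a b c
```

For instance, cond tt "M" "N" selects "M", while cond ff "M" "N" selects "N".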
So for example, if true then M else N is represented by the term:
cond tt M N = (λabc.a b c) (λxy.x) M N
which evaluates as follows:
(λabc.a b c) (λxy.x) M N
→ (λbc.(λxy.x) b c) M N
→ (λc.(λxy.x) M c) N
→ (λxy.x) M N
→ (λy.M) N
→ M
as we would wish.
[1] The expression curry is after Haskell Brooks Curry, a contemporary of Church and one of the
λ-calculus pioneers.
Exercise 3.1
Complete a similar evaluation of cond ff M N.
A little thought should convince you that other logical operators such as conjunction and
disjunction can be defined in terms of these three notions, so we have constructed three combinators
that form the basis of a representation of booleans in the λ-calculus.
Exercise 3.2
Using logical equivalences like
x and y = if x then y else false
define combinators and, or and not representing logical conjunction, disjunction and negation.
3.3 Church Numerals
One way of representing the natural numbers in the λ-calculus is as Church numerals. They
have a similar flavour to Peano arithmetic where the natural numbers are defined inductively
as follows:
zero is a natural number;
if n is a natural number, then succ(n) is also a natural number.
So for example
3 = succ(succ(succ(zero)))
The Church numerals c_0, c_1, c_2, c_3, ... are rather similarly defined:
c_0 = λsz.z
c_1 = λsz.s z
c_2 = λsz.s (s z)
c_3 = λsz.s (s (s z))
...
We have used s and z as suggestive names for the bound variables, but of course they are quite
arbitrary. The idea is that c_n takes two arguments s and z (for successor and zero) and applies s,
n times to z. Once again, we are not so interested in representation but rather in the interaction
with the combinators representing primitive operations on numbers, such as succ, addition and
so on.
A successor function on Church numerals needs to take the c_n term apart to get the body, add
another s application, then rewrap it in the binders:
succ = λn.λsz.s (n s z)
For example:
succ c_2 = (λn.λsz.s (n s z)) (λab.a (a b))
→ λsz.s ((λab.a (a b)) s z)
→ λsz.s ((λb.s (s b)) z)
→ λsz.s (s (s z))
= c_3
Addition can be performed by a term plus that takes two Church numerals, m and n, and yields
another Church numeral (i.e. a function) that accepts arguments s and z, applies s iterated n
times to z (by passing arguments s and z to n), and then applies s iterated m more times to
the result:
plus = λm.λn.λsz.m s (n s z)
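Church numerals, the successor and addition can be transcribed into Haskell as follows. This is
a sketch under assumptions: the type synonym Church, the name succC (chosen to avoid the
Prelude's succ) and the helper toInt for reading a numeral back are not part of the notes.

```haskell
-- A Church numeral takes a "successor" s and a "zero" z.
type Church a = (a -> a) -> a -> a

c0, c1, c2, c3 :: Church a
c0 _ z = z
c1 s z = s z
c2 s z = s (s z)
c3 s z = s (s (s z))

-- succ = \n.\s z. s (n s z): add one more application of s.
succC :: Church a -> Church a
succC n = \s z -> s (n s z)

-- plus = \m.\n.\s z. m s (n s z): apply s n times to z, then m more times.
plus :: Church a -> Church a -> Church a
plus m n = \s z -> m s (n s z)

-- Read a numeral back by interpreting s as (+ 1) and z as 0.
toInt :: Church Int -> Int
toInt n = n (+ 1) 0
```

Here toInt (succC c2) gives 3, matching the hand evaluation of succ c_2 above.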
Exercise 3.3
Convince yourself that this definition is correct by evaluating plus c_2 c_3. Do it by hand and try
some examples using the workbench.
Exercise 3.4
Define combinators for multiplying and exponentiating Church numerals. A natural approach
to defining multiplication is to use the plus combinator, but you may also find another way.
Inventing subtraction and predecessor combinators may be too challenging, but they are among
the predefined combinators that come with the λ-calculus workbench. Look at their definitions
and try to understand and describe how they work.
Example 3.5
Here is an example reduction to normal form. The expression is cond tt (succ c_2) c_0:
(λbxy.b x y) (λtf.t) ((λnfx.f (n f x)) (λsz.s (s z))) (λsz.z)
→ (λxy.(λtf.t) x y) ((λnfx.f (n f x)) (λsz.s (s z))) (λsz.z)
→ (λy.(λtf.t) ((λnfx.f (n f x)) (λsz.s (s z))) y) (λsz.z)
→ (λtf.t) ((λnfx.f (n f x)) (λsz.s (s z))) (λsz.z)
→ (λf.(λnfx.f (n f x)) (λsz.s (s z))) (λsz.z)
→ (λnfx.f (n f x)) (λsz.s (s z))
→ λfx.f ((λsz.s (s z)) f x)
→ λfx.f ((λz.f (f z)) x)
→ λfx.f (f (f x))
which is c_3, as expected.
Another class of operators on numbers are the relational operators, bringing the representation of
booleans and numbers together. Consider testing whether a Church numeral is zero. To achieve
this we must find some appropriate pair of arguments that will give us back this information.
Specifically we want to apply our numeral to a pair of terms ss and zz (so ss is substituted
for s and zz is substituted for z) such that applying ss to zz one or more times yields ff while
not applying it at all yields tt. The constant function λx.ff serves as ss (always discarding its
argument and returning ff) and tt as zz, leading to:
iszero = λm.m (λx.ff) tt
For example
iszero c_2 = (λm.m (λx.ff) tt) c_2
→ c_2 (λx.ff) tt
= (λsz.s (s z)) (λx.ff) tt
→ (λz.(λx.ff) ((λx.ff) z)) tt
→ (λx.ff) ((λx.ff) tt)
→ ff
and
iszero c_0 = (λm.m (λx.ff) tt) c_0
→ c_0 (λx.ff) tt
= (λsz.z) (λx.ff) tt
→ (λz.z) tt
→ tt
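The zero test can be checked in Haskell as well. This is a sketch only: the type synonyms
CBool and Church, the name ff for the false combinator and the helper toBool are assumptions
made so that the results can be inspected.

```haskell
-- Church booleans and numerals as Haskell functions.
type CBool a  = a -> a -> a
type Church a = (a -> a) -> a -> a

tt, ff :: CBool a
tt x _ = x
ff _ y = y

c0, c2 :: Church a
c0 _ z = z
c2 s z = s (s z)

-- iszero = \m. m (\x.ff) tt
-- The constant function (\_ -> ff) plays the role of ss and tt the role of zz.
isZero :: Church (CBool a) -> CBool a
isZero m = m (\_ -> ff) tt

-- Convert a Church boolean to a Haskell Bool so it can be printed.
toBool :: CBool Bool -> Bool
toBool b = b True False
```

With these definitions toBool (isZero c0) is True and toBool (isZero c2) is False, in agreement
with the two evaluations above.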
3.4 Combinators
In the two sample evaluations above, we worked with a mixture of combinators and pure terms,
expanding the combinators as necessary before performing reduction. This was important
because our intention was to informally verify that our choice of combinators to represent the
various functions and values was correct. However, taking a more abstract approach, we can
work simply with higher-level rules dealing directly with the combinators. For example, the
following rules can be derived from their definitions as lambda terms:
iszero c_0 = tt
iszero c_n = ff    for all n ≠ 0
succ c_n = c_{n+1}
plus c_m c_n = c_{m+n}
and so on. This kind of conceptual abstraction is very familiar to computer scientists. For
example, even when coding in assembler language we think in terms of data values passing
between registers and memory locations, rather than the bit pattern of the instruction and
the micro-code running on the processor. Taking that idea one step further, notice that the
lambda term for plus only behaves as addition under a particular interpretation involved with
Church numerals; in another context it's just another lambda term. The same idea of different
interpretation of the same data object is familiar in computer science: the same bit pattern may
represent a character, a number, an instruction and so on.
3.5 Recursion
Every reasonable model of computation must allow for some form of repetition, but so far we
have only talked about simple values and primitive operations. If we consider the definition of
the factorial function mentioned in section 2.1:
fact n = if n == 0 then 1 else n * fact (n - 1)
We have combinators for most components of this definition, e.g.
fact = λn. cond (iszero n) c_1 (mul n (fact (pred n)))
but the function name persists within its own definition. To complete the job we want to find
a lambda term that (anonymously) represents the fact function. At the moment we still rely
on naming the function and giving an equation that we have learned a way to interpret
computationally. As mentioned in section 2.1 we will look at this in more depth in a later section
of the course.
Recall that a term that cannot take a step under the evaluation relation is called a normal
form. It is interesting that some terms do not have a normal form, for example the divergent
combinator:
omega = (λx.x x) (λx.x x)
contains exactly one redex (the whole term) and reducing it yields exactly the same term omega
again.
There are various combinators, related to omega, known as fixpoint operators. A well-known
one is the Y combinator:
Y = λf.(λx.f (x x)) (λx.f (x x))
Just looking at the term isn't very helpful, but notice this important fact:
Y f = f (Y f) = f (f (Y f)) = f (f (f (Y f))) = ...
To get some better intuition of the behaviour of the Y combinator we will explore a specific
example. When we write recursive function definitions like
f = ⟨body containing f⟩
the intention is that the definition should be unfolded inside the body. For example, the
intuition about the factorial function definition in section 2.1 is an infinite unfolding:
if n == 0 then 1
else n * (if n-1 == 0 then 1
else (n-1) * (if n-2 == 0 then 1
else (n-2) * (if n-3 == 0 then 1
else (n-3) * ...
Or, using Church numerals:
cond (iszero n) c_1
  (mul n (cond (iszero (pred n)) c_1
    (mul (pred n) (cond (iszero (pred (pred n))) c_1
      (mul (pred (pred n)) (cond (iszero (pred (pred (pred n)))) c_1
        (mul (pred (pred (pred n))) ...
The Y combinator can give us this unfolding effect. First we define:
g = λh.⟨body containing h⟩
and then:
f = Y g
Doing this to the factorial function gives:
g = λf.λn. cond (iszero n) c_1 (mul n (f (pred n)))
fact = Y g
or simply:
fact = Y (λf.λn. cond (iszero n) c_1 (mul n (f (pred n))))
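The same construction can be sketched in Haskell. The term Y itself is not accepted by
Haskell's type checker as written, because of the self-application x x, so this sketch instead
defines a fixpoint operator directly by the equation Y f = f (Y f). The name fix' and the use of
Integer arithmetic in place of Church numerals are assumptions of the example.

```haskell
-- A fixpoint operator defined by the unfolding equation  Y f = f (Y f).
fix' :: (a -> a) -> a
fix' f = f (fix' f)

-- g = \f.\n. if n == 0 then 1 else n * f (n - 1)
-- Note that g is not recursive: the recursive call goes through its parameter f.
g :: (Integer -> Integer) -> Integer -> Integer
g f n = if n == 0 then 1 else n * f (n - 1)

-- fact = Y g
fact :: Integer -> Integer
fact = fix' g
```

Because Haskell is non-strict, fix' g unfolds only as far as each call to fact requires.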
Let's see what happens as we begin evaluating (fact c_3). Rather than writing out the whole
term in λ-notation, we will leave in the combinator names, expanding them as necessary.
fact c_3 = (Y g) c_3
= ((λf.(λx.f (x x)) (λx.f (x x))) g) c_3
→ (λx.g (x x)) (λx.g (x x)) c_3
→ g ((λx.g (x x)) (λx.g (x x))) c_3
Notice that the first argument to g, the subterm (λx.g (x x)) (λx.g (x x)), is just (Y g). Observe
that fact has a self-replicating aspect, so that when it is applied to an argument (say n), it also
supplies itself to g. Let's continue, expanding g to the term it names:
g ((λx.g (x x)) (λx.g (x x))) c_3
= g (Y g) c_3
= (λf.λn. cond (iszero n) c_1 (mul n (f (pred n)))) (Y g) c_3
→ (λn. cond (iszero n) c_1 (mul n ((Y g) (pred n)))) c_3
→ cond (iszero c_3) c_1 (mul c_3 ((Y g) (pred c_3)))
→ cond ff c_1 (mul c_3 ((Y g) c_2))
→ mul c_3 ((Y g) c_2)
Now notice that the subterm redex (Y g) c_2 is just (fact c_2), so we have calculated that
fact c_3 = mul c_3 (fact c_2)
which is what we were aiming for.
Exercise 3.6
Complete the calculation of (fact c_3), both by hand and on the lambda calculus workbench.
We will talk more about fixpoints later in the Fixpoint Theory of Recursive Function Definitions
component of the course. For now just be convinced that we can cope with repetition or recursion
in the λ-calculus.
4 The Typed Lambda Calculus
In Mathematics and Computer Science, especially in programming languages, we usually find it
helpful to classify values and expressions according to some notion of having a particular type.
The pure λ-calculus as we have seen it so far could be considered to have a degenerate type
system where every term has the same type, say D, and we could write
M : D
Alone that isn't very interesting, but considering that every term is a function (and every
argument and result is too) we can also write
M : D → D
to indicate M's functional characteristic. An immediate consequence is that D and (D → D)
need to be equal (or at least isomorphic), but unfortunately they can't be: they don't even
have the same cardinality. This caused some concern: what could a model of the λ-calculus be
and in particular is there a function space model? The affirmative answer was provided by
Dana Scott, who took (D → D) to be the continuous functions from D to D.
4.1 Extending the Typed Lambda Calculus
Let's begin by attempting to classify the boolean values as having type Bool. In the previous
section we represented the boolean value false as the combinator ff = λxy.y. It is tempting to
say ff has type Bool but that just doesn't work: we may be able to interpret the behaviour of
ff as being like false in certain contexts, but in another context the same term is the Church
numeral c_0 which we would like to classify as being of type Int. Furthermore, in general λxy.y
is just a function that takes two arguments and returns the second. A similar point can be made
regarding all the combinators we introduced in our discussion of the pure λ-calculus as a model
of computation.
Instead of trying to classify the combinators, what we will do is to introduce a type Bool and
new values to the calculus:
true, false : Bool
We will also need to add (at least) a primitive conditional operation as the cond combinator no
longer serves that purpose. Similarly we can introduce a type Int along with primitive values
and operators to further extend the λ-calculus.
0, 1, 2, 3, . . . : Int
4.2 The Typing Relation
We know the type of the primitive Bool and Int values, but our aim is to construct a type
system for the λ-calculus syntactic categories (variable, abstraction and application) that is
well-behaved: for example, if M : t and M reduces to N then N : t. The system should also not
be too conservative, in the sense that most useful terms will have a type.[2]
Consider how to go about assigning a type to λx.M. We want it to have a function type which
we will write as
λx.M : t_1 → t_2
where t_2 is the result type and t_1 is the argument type. The result type is therefore the type
assigned to M, but to work out that type, there is a dependency on the type assumed for the
binder x. An obvious example is the identity function:
λx.x : t → t
In general, the collection of type assumptions for all the free variables is needed as part of the
type assignment process. The typing relation is thus a triple:
Γ ⊢ M : t
where M is a term, t is the type being assigned to M and Γ is a collection of type assumptions
(or type bindings) of the form x : t. We call Γ a context or an environment. For simplicity we
will gloss over details and require that a context contains at most one binding for any variable.
We will write Γ, x : t to indicate a context containing the binding x : t.
[2] We could completely evaluate a term and then classify its result, but typing is intended to be a
static analysis: we want to classify the behaviour of terms prior to evaluation.
The type assignment system is as follows:

Γ, x : t ⊢ x : t    (Var)

Γ, x : t_1 ⊢ M : t_2
----------------------    (Abstr)
Γ ⊢ λx.M : t_1 → t_2

Γ ⊢ M : t_1 → t_2    Γ ⊢ N : t_1
----------------------------------    (App)
Γ ⊢ M N : t_2

The Var rule says that the type of a free variable x is determined by the corresponding context
assumption. It is important to keep in mind that in an abstraction λx.M the bound occurrences
of the binder x are the free occurrences of x in term M. The Var rule is an axiom and the other
two are inference rules. As in your earlier studies of natural deduction, the components above
the line are premises and below the line is the consequence. In other words, if we can deduce
the premises then the rule allows us to deduce the consequence. As with natural deduction we
can present type assignments as derivation trees.
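The three rules can be read as a recursive type-checking function. The following sketch makes
one simplifying assumption that is not in the rules above: each binder carries an explicit type
annotation, so that the argument type t_1 in the Abstr rule does not have to be guessed. All
names (Type, Term, Context, typeOf) are illustrative.

```haskell
-- Types: the primitives Bool and Int, and function types t1 -> t2.
data Type = TBool | TInt | Fun Type Type
  deriving (Eq, Show)

-- Terms, with type-annotated binders (an assumption of this sketch).
data Term
  = Var String
  | Lam String Type Term
  | App Term Term
  | BoolLit Bool
  | IntLit Int
  deriving Show

-- A context is a list of bindings x : t; the most recent binding for a
-- variable is found first.
type Context = [(String, Type)]

-- typeOf ctx m returns Just t when ctx |- m : t is derivable, else Nothing.
typeOf :: Context -> Term -> Maybe Type
typeOf ctx (Var x)      = lookup x ctx                          -- (Var)
typeOf ctx (Lam x t1 m) = Fun t1 <$> typeOf ((x, t1) : ctx) m   -- (Abstr)
typeOf ctx (App m n)    = do                                    -- (App)
  tm <- typeOf ctx m
  tn <- typeOf ctx n
  case tm of
    Fun t1 t2 | t1 == tn -> Just t2   -- argument type agrees
    _                    -> Nothing   -- not a function, or a mismatch
typeOf _ (BoolLit _) = Just TBool
typeOf _ (IntLit _)  = Just TInt
```

In this sketch the identity function on Int, Lam "x" TInt (Var "x"), is assigned Fun TInt TInt,
while applying it to BoolLit True is rejected with Nothing.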
The Abstr rule can be read as follows: if, under the assumption that x has type t_1 we can
determine the type of M to be t_2, then the abstraction λx.M has functional type t_1 → t_2.
Notice that the assumption about the type of x is expressed by the context Γ, x : t_1.
The App rule says that if we can determine that M has a functional type t_1 → t_2 and that
N's type agrees with the argument type of M then the application M N has type t_2, as
given by the result type of function M. Explicit in this rule is a requirement that the type of the
argument provided agrees with the type the function is expecting. If not, we may reject the
term as invalid. Essentially, this is the point of static typing. Another thing to notice about the
App rule is that the contexts for typing M and N are both the same Γ. If that were not the
case we would be able to have different type assumptions for the same free variable. If that were
permitted for example x : t in M and x : t