Continuation-Passing, Closure-Passing Style

Andrew W. Appel*
Trevor Jim†

CS-TR-183-88

July 1988
Revised September 1988

ABSTRACT
We implemented a continuation-passing style (CPS) code generator for ML. Our CPS language is
represented as an ML datatype in which all functions are named and most kinds of ill-formed expres-
sions are impossible. We separate the code generation into phases that rewrite this representation
into ever-simpler forms. Closures are represented explicitly as records, so that closure strategies can
be communicated from one phase to another. No stack is used. Our benchmark data shows that the
new method is an improvement over our previous, abstract-machine based code generator.

To appear in POPL ’89.

* Supported in part by NSF Grant DCR-8603543 and by a Digital Equipment Corp. Faculty Incentive Grant.
† AT&T Bell Laboratories, Murray Hill, NJ. Current address: Laboratory for Computer Science, MIT, Cambridge, Mass.
1. Overview

Standard ML of New Jersey[1] is a compiler for ML written in ML. Its first code generator, based on an abstract stack machine, produced code with acceptable but not stunning performance. Examination of the code revealed that the greatest source of inefficiency seemed to be that each value went on and off the stack too many times.

Rather than hack a register allocator into the abstract stack machine, we decided to try a continuation-passing style (CPS)[2] code generator. Kranz's ORBIT compiler[3][4] shows how CPS provides a natural context for register allocation and representation decisions.

The beauty of continuation-passing style is that control flow and data flow can be represented in a clean intermediate language with a known semantics, rather than being hidden inside a "black box" code generator. The ORBIT compiler translates CPS into efficient machine code, making representation decisions for each function and each variable. ORBIT does an impressive set of analyses in its back end, but they're all tangled together into a single phase. We have a series of phases, each of which rewrites and simplifies the representation of the program, culminating in a final instruction-emission phase that's never presented with complications.

The phases are:

1. Lexical analysis, parsing, typechecking, producing an annotated abstract syntax tree.
2. Translation into lambda-calculus (producing a simple representation described in [1]).
3. Optimization of the lambda-calculus (present here for historical reasons; this phase duplicates some of the effort done by our CPS optimizer).
4. Conversion into continuation-passing style, producing a CPS representation described in the next section.
5. Optimization of the CPS expression.
6. Closure conversion, producing a CPS expression in which each function is closed (i.e. has no free variables).
7. Elimination of nested scopes, producing a CPS expression with one global set of mutually-recursive, non-nested function definitions.
8. "Register spilling," producing a CPS expression in which no sub-expression has more than n free variables, where n is related to the number of registers on the target machine.
9. Generation of target-machine instructions.
10. Backpatching and jump-size optimization.

Where the ORBIT compiler has one black box covering phases 6 through 9, we have four smaller black boxes. The interfaces between the phases are semantically well-defined, making it easier to isolate individual parts of the analysis to one phase.

This paper describes phases 4 through 9, and then presents an analysis based on profiling and benchmarks. Because of space limitations, we must assume that the reader is familiar with continuation-passing style.

2. Continuation-passing style

Our back-end representation language is a continuation-passing style (CPS) representation similar in spirit to Steele's, but with a few important differences: we use the ML datatype feature to prohibit ill-formed expressions; we want every function to have a name; and we have n-tuple operators which make modelling closures convenient.

An important property of well-formed CPS expressions in Steele's representation is that a function-application can never be the direct child of another application. We can express this restriction directly in the ML datatype cexp (for continuation-expression):

    datatype cexp
      = RECORD of var list * var * cexp
      | SELECT of int * var * var * cexp
      | APP of var * var list
      | FIX of (var * var list * cexp) list * cexp
      | SWITCH of var * cexp list
      | PRIMOP of int * var list * var list * cexp list
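For experimentation outside the compiler, the same grammar can be mirrored as a small Python AST (a hypothetical sketch for illustration only; the compiler's own definition is the ML datatype cexp):

```python
from dataclasses import dataclass
from typing import Union

Var = str  # all atoms are variables; constants live in an auxiliary table

@dataclass
class Record:            # RECORD of var list * var * cexp
    fields: list
    bind: Var
    cont: "Cexp"

@dataclass
class Select:            # SELECT of int * var * var * cexp
    index: int
    record: Var
    bind: Var
    cont: "Cexp"

@dataclass
class App:               # APP of var * var list
    func: Var
    args: list           # arguments are variables, never expressions

@dataclass
class Fix:               # FIX of (var * var list * cexp) list * cexp
    funcs: list          # (name, formals, body) triples
    cont: "Cexp"

@dataclass
class Switch:            # SWITCH of var * cexp list
    test: Var
    branches: list

@dataclass
class Primop:            # PRIMOP of int * var list * var list * cexp list
    op: int
    args: list
    results: list
    conts: list

Cexp = Union[Record, Select, App, Fix, Switch, Primop]
```

Because App holds only variable names, an application can never be the direct child of another application; the ill-formed term is simply unrepresentable, just as in the ML datatype.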

The italicized var's are binding occurrences, and the others are uses of the variables.

All of Steele's "atoms"[2] are represented in our cexp's by variables. Constants are represented by globally-free variables entered in an auxiliary table. This means that a function application (APP) can be represented by the name of the function (var) and a series of arguments (var list). The constraint that an APP can't be a child of an APP is enforced by the fact that the arguments of the APP constructor are variables, not continuation-expressions.

One of the useful properties of CPS is that every intermediate value of a computation is given a name. In Steele's representation, however, functions can still be anonymous, making it difficult for a code generator to keep track of them. Therefore, we eliminate LAMBDA from our CPS datatype, in favor of FIX, a general-purpose mutually recursive function definition in which names are explicitly bound to functions:

    FIX([(f,[x,y,z],
          (...x...g...y...z...f...)),
         (g,[i,j],
          (...j...i...g...f...g...))],
        ...g...f...)

(In ML notation, [x,y,z] is a list of three elements and (a,b,c) is a tuple of three elements.) This example defines two mutually recursive functions f and g of three and two arguments, respectively; the ellipses are other expressions, and the entire example is an expression. The binding is roughly equivalent to the ML expression:

    let fun f(x,y,z) =
            ...x...g...y...z...f...
        and g(i,j) =
            ...j...i...g...f...g...
    in ...g...f...
    end

Function definitions can be nested in other expressions; expressions (and functions) can, of course, have free variables bound at outer levels.

RECORD and SELECT are used to manipulate n-tuples. RECORD([a,b,c,d],r,cont) means "let r = (a,b,c,d) in cont", and SELECT(i,r,v,cont) means "let v be the i th field of r in cont".

We have a constructor for indexed jumps (SWITCH), and a constructor (PRIMOP) for miscellaneous in-line primitive operations like integer and floating arithmetic, array subscript, etc. The expression

    PRIMOP(i,[a,b,c],[d,e],[F,G,H])

means to apply operator i to the arguments (a,b,c) yielding the results d and e, and then branch to one of the continuations F, G, or H. Each primitive operator has its own "signature;" for example

    PRIMOP(plus,[a,b],[d],[F])

takes two arguments, returns one result, and continues in only one way, whereas

    PRIMOP(lessthan,[a,b],[],[F,G])

takes two arguments, produces no result, and branches to F or G.

One goal in choosing our representation is to give each object a name (i.e. a variable). Why did we not represent the control-flow branches of a SWITCH or PRIMOP by variables standing for continuation functions? The problem is with the free variables of the different control-flow branches. In the CPS language as shown in this section, any sub-expression and any (nested) function may have free variables (that are bound at an outer level of nesting). However, in one of our code generation phases we will rewrite the CPS graph to eliminate free variables from all functions. If the control-flow branches were represented as functions, they could not have free variables without creating extra closures; in fact, it would be necessary to create closures for all branches even though only one branch would be taken. Therefore we compromise and leave the branches as unnamed continuation expressions instead of named continuation variables.

3. Conversion into CPS

The front end of our compiler produces a lambda-calculus intermediate representation (described in [1]). This must be translated into continuation-passing style; the conversion algorithm is similar to Steele's and won't be described in detail here.

The conversion process doesn't do many optimizations; it's simpler to do that in a separate phase. The converter has its hands full just with the semantics of the two languages (lambda-calculus and continuation-passing style) that it is translating between. It does make these representation decisions:

- Makes control flow explicit by the use of continuations.
- "Lowers" typed constructs like ML's disjoint union constructors into untyped constructs like RECORDs with integer tags.
- Optimizes the representation of case statements (arising from ML pattern-matching) into jump-tables or binary trees of comparisons.[5]
- The pattern (λx. M)(N), which has the effect of let x = N in M, is treated specially. This is an optimization that could be left for the next phase, but it is convenient and cost-effective to recognize it here.

4. Reduction of the CPS

The next phase is a CPS "reducer" that performs a variety of optimizations. They are listed here, each with an indication of how often it is applicable for each 1000 operands* of CPS graph:

205: Replace SELECT(i,r,...) with the i th field of r, when r is a statically determinable record.
181: Perform beta-reduction (inline expansion) on any function that is called only once, or whose body is not too large.
72: Merge sets of mutually recursive function definitions (FIXes) in the hope that they will later share the same closure. Merging can be done if one FIX is the immediate child of another, and each has the same set of free variables.
66: Perform eta-reduction (where f(x,y)=g(x,y), replace all uses of f with g).
47: Perform constant-folding on SWITCHes and PRIMOPs.
26: Hoist (un-nest, or enlarge the scope of visibility of) function definitions to enable the merging of FIXes.
4: Remove unused arguments of functions.
2: Flatten the arguments of (nominally single-argument) ML functions that are always called with a tuple of actual parameters.
0.1: Remove the definitions of variables that aren't used.

Our optimizer makes several passes (typically half a dozen) before no (or few) redexes remain. The test that produced the frequencies above counted all passes on a 16,000-line ML program that had a graph of size 118514. If the optimizer were to stop at module boundaries, the numbers would be somewhat different. Our compiler, for historical reasons, also has an optimizer at the (non-CPS) lambda-calculus level, which has some overlap with the CPS reducer and also will affect the counts given here.

* This is a larger and more useful quantity than the number of nodes in the graph.

5. Closure conversion

When one function is nested inside another, the inner function may refer to variables bound in the outer function. A compiler for a language where function nesting is permitted must have a mechanism for access to these variables. The problem is more complicated in languages (like ML) with higher-order functions, where the inner function can be called after the outer function has returned.

The usual implementation technique uses a "closure" data structure: a record containing the free variables of the inner function as well as a pointer to its machine code. A pointer to this record is made available to the machine code while it executes so that the free variables are accessible. By putting the code-pointer at a fixed offset (e.g. 0) of the record, users of the function need not know the format of the record or even its size in order to call the function.

In fact, several functions can be represented by a single closure record containing the union of their free variables and code pointers. A closure record is necessary only for a function that "escapes": some of its call sites are unknown because it is passed as an argument, stored into a data-structure, or returned as a result of a function-call. A call of an escaping function is implemented by extracting the code pointer from the closure record and jumping to the function with the closure record as one of its arguments.
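This calling protocol is easy to model. The sketch below (hypothetical Python, not the compiler's own code, with a tuple standing in for the closure record) keeps the code pointer in field 0 and passes the record itself to every call:

```python
def call_escaping(clos, *args):
    # Callers know only that field 0 is the code pointer: extract it
    # and jump, passing the closure record along as an extra argument.
    return clos[0](clos, *args)

def make_adder(n):
    # Code for the inner function: it recovers its free variable n
    # from the closure record it is handed.
    def adder_code(clos, x):
        return x + clos[1]
    # Closure record: code pointer at offset 0, free variable after it.
    return (adder_code, n)

add3 = make_adder(3)   # the inner function escapes: returned as a result
```

Since only field 0 is fixed, the record may carry any number of free variables, or the union of several functions' free variables, without the caller's knowledge.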
The closure of a "known" function (whose every call site is known at compile-time) need not be implemented as a record. Instead, the free variables can be added as extra arguments to the function. A call of a known function must arrange to pass along the appropriate variables to the function. This implementation of closures is intended to produce efficient code for loops.

Some functions escape and are also called from known sites. These can be split into two functions, where the escaping function is defined in terms of the known one, in the hope that the known calls will execute more efficiently.

Our closure converter rewrites a CPS expression, making function-closure representation explicit; we call this explicit representation "closure-passing style." After a pass to gather free-variable information,* the converter traverses the CPS expression. At every FIX that binds an escaping function, a RECORD is inserted to create an explicit closure record, and a new argument corresponding to the closure record is added to the argument list of each function.† Free variables accessed within the body of the function are rewritten as SELECTs from this argument.

* Mutually-recursive functions complicate the free-variable analysis, but this turns out to be a classical dataflow problem (live-variable analysis) that can be solved by classical techniques.[6]
† This technique has been used in the Categorical Abstract Machine.[7]

Known-function bindings are rewritten by adding an argument for each free variable in the function's body. Applications of the function are rewritten as the application to the arguments and the necessary free variables.

The following code fragment shows a sample ML function and the transformations applied to it. In rewriting a function f, our convention is to call the new, closed function f', its closure record (if any) f, and the formal parameter corresponding to the closure record f''. The first element of a closure record f will be f', so that if f escapes to some context where f' is not known, the code-pointer f' can be SELECTed from the closure record. All other references to escaping functions become references to closure records.

    fun f x =
        let fun g y = x+y
        in g z
        end

    ...f 3...

in CPS:

    FIX([(f,[x,c1],
          FIX([(g,[y,c2],
                PRIMOP(+,[x,y],[a],
                  [APP(c2,a)]))],
              APP(g,[z,c1])))],
        ...APP(f,[3,c0])...)

after closure conversion:

    FIX([(f',[f'',x,c1],
          FIX([(g',[y,c2,x],
                PRIMOP(+,[x,y],[a],
                  [APP(c2,a)]))],
              SELECT(1,f'',z,
                APP(g',[z,c1,x])))],
        RECORD([f',z],f,
          ...SELECT(0,f,f',
              APP(f',[f,3,c0]))...)

The function g is known, so its free variable x has been added to its argument list; function f escapes, and requires a closure record. See the appendix for a larger example written in a more readable notation.

In more complicated examples, involving many variables from differing scopes, there can be a number of possible closure representations for a function.[8] One simple strategy is to use a flat closure containing all the free variables. At the other extreme, a number of closure records already exist when a new closure must be created, and a pointer or link to one can provide access to several of the necessary variables. Combinations of the two allow us to trade off time of closure creation, size of closures, and ease of access to variables from closures. The tradeoffs can be subtle: for example, linked closures can take up more space than flat closures because they hold
on to closures which might otherwise be reclaimed by the garbage collector. Several strategies have been implemented in Standard ML of New Jersey.

6. Flattening and spilling phases

After closure conversion, functions no longer refer to non-constant free variables; therefore, nesting of function definitions is not necessary. A simple flattening pass gathers all the function definitions of a compilation unit into a single set of mutually-recursive function declarations; each declaration will correspond to a code fragment in the final machine code. Gathering the fragments lets us generate code for functions in any order, which helps make calls of known functions more efficient (see section 7).

Next, a register-spilling phase rewrites the CPS expression so that there are no more than n free variables at any subexpression, where n is related to the number of general-purpose registers on the target machine. Wherever the number of free variables at a subexpression is larger than n, a RECORD containing some of the free variables is inserted. The appropriate SELECT is inserted at uses of those variables in the rest of the expression.

For example, consider a program that fetches six values from an array and adds them:

    PRIMOP(subscr,[arr,1],a,
    PRIMOP(subscr,[arr,2],b,
    PRIMOP(subscr,[arr,3],c,
    PRIMOP(subscr,[arr,4],d,
    PRIMOP(subscr,[arr,5],e,
    PRIMOP(+,[a,b],[g],[
    PRIMOP(+,[g,c],[h],[
    PRIMOP(+,[h,d],[i],[
    PRIMOP(+,[i,e],[j],[ ...

There are five variables free in the first plus-expression. If compiling for a machine with only four registers, we can limit the number of simultaneous free variables by packing a,b,c into a record r and d,e into a record s:

    PRIMOP(subscr,[arr,1],a,
    PRIMOP(subscr,[arr,2],b,
    PRIMOP(subscr,[arr,3],c,
    RECORD([a,b,c],r,
    PRIMOP(subscr,[arr,4],d,
    PRIMOP(subscr,[arr,5],e,
    RECORD([d,e],s,
    SELECT(0,r,a',
    SELECT(1,r,b',
    PRIMOP(+,[a',b'],[g],[
    SELECT(2,r,c',
    PRIMOP(+,[g,c'],[h],[
    SELECT(0,s,d',
    PRIMOP(+,[h,d'],[i],[
    SELECT(1,s,e',
    PRIMOP(+,[i,e'],[j],[ ...

Now each sub-expression of this continuation-expression has only three variables free. The discerning reader will note that we could have rearranged the additions and the subscripts and avoided the use of records entirely, but some primops have side-effects that prevent rearrangements. Furthermore, spilling is rarely needed, so improvements to the spilling algorithm won't affect overall performance much.

The spiller could have been combined with the closure converter; this would let us apply the closure-representation analysis to spill records. However, spilling's rarity and the extra complexity required to combine the phases convinced us to implement spilling separately.

Our method of spilling has two important consequences. First, PRIMOPs and APPs must have no more than n arguments (where n is the number of registers on the machine). Thus we make sure that the optimizer never flattens the arguments of a known function if it would cross the limit, and that the closure converter implements known functions with more than n arguments and free variables by packaging the free variables into a closure record.

Second, it means that the CPS datatype described in section 2 does not allow the creation of records from more than n free variables. In practice, we use a RECORD constructor with (var * path) elements. The var can be a spill record, in which case the path specifies an element to select from it. Large records are made by computing some of the elements, spilling them, computing more, and eventually constructing the final record from variables in both spill records and registers. Although this might seem expensive, profiling shows that all spill records take up only one or two percent of the total heap allocation in our Vax implementation, where n is 8.

7. Generation of target-machine instructions

Since modern garbage collectors are so cheap[9][10] we have dispensed with the stack. This simplifies the code generator, which doesn't need to do the analysis[2][4][11] necessary to decide which closure records can be allocated on the stack; it also simplifies the runtime system, making it easier to add multiple threads or state-saving operators to the programming environment.

Eliminating the stack is advantageous not only because it makes the compiler simpler. Operations like call-with-current-continuation (which, though not in ML, is compatible with the ML type system) are more efficient if there is no stack; a generational garbage collector can traverse just the newest call-frames, whereas it would have to traverse all the call frames on a stack; and in a multi-thread environment with stacks, a large stack space must be allocated for each thread even if it won't all be used.

The expression handed to the target-machine instruction generator has a very simple form indeed. Procedures never return (as a result of CPS conversion), procedures don't have non-constant free variables (as a result of closure analysis), scopes aren't nested, and there are never more live variables than registers to hold them.

Since all representation decisions have been made in previous phases, the decisions made by the instruction generator have mostly to do with register allocation. As an example, consider the fragment SELECT(3,v,w,cexp), which requires that the third field of the record v be fetched into the (newly-defined) variable w, and execution to continue with the expression cexp. Both v and w can be allocated to machine registers, as a result of the (previous) spilling analysis. The variable v will have already been allocated at the time it was bound in the enclosing expression. The variable w must be allocated at this time to a register. Several heuristics are used.

Two-address instructions:
    Some instructions on some machines prefer to have their result argument in the same register as a source argument (this preference doesn't typically apply to fetches, so it probably wouldn't be used for this SELECT).

Targeting:
    Sometimes there's an opportunity to avoid a move instruction later on. If the variable w is used (in cexp) as the n th argument to a function f whose calling sequence requires the n th argument in register r, and if r is not bound to any other live variable, then r should be used for w to save the cost of a move instruction when f is called.

Anti-targeting:
    If there is a call in cexp to a function f, one of whose arguments (which is not w) is to be passed in register r, then r is given less preference than another register, to avoid the cost of moving w out of the way when f is called.

Default:
    Otherwise, any register not already allocated to a live variable may be used. The work done by the spiller ensures that there will always be a register available.

As in the ORBIT compiler, the instruction generator treats "known" functions (those whose call sites are all known statically) specially. The parameters of a known function can be allocated to registers in a way that optimizes at least one of the calls to the function. Specifically, the code for a known function is not generated until a call site is found and generated. Then, the formal parameters of the function can be allocated to the same registers where the call's actual parameters are already sitting; and the transfer of control can be by "falling through" without a jump. Thus, at least one call to each known function can be at no cost (except for actual parameters that are constants, and must be fetched into registers).

8. Benchmarks

We ran five different compilers on five different benchmark programs, all on a VAX 8650. The compilers were:

Pascal    Berkeley Pascal with -O option.
ORBIT     Version 3.0 of the T system from Yale, with the ORBIT code generator.
Old       Our old code generator for ML, based on an abstract stack machine.
CPS       Our new code generator, described in this paper.
CPS'      Our new code generator with aggressive cross-module optimization enabled.

Of course, comparisons between compilers for different programming languages may tell us more about the languages than about the compilers.

The programs were:

Hanoi     The towers of Hanoi benchmark from Kranz's thesis[4].
Puzz      A compute-bound program from Forest Baskett[4].
LenL      A tail-recursive function (or, in Pascal, a while loop) to compute the length of a list.
LenR      A recursive function (not tail-recursive) to compute the length of a list.
Comp      A 16000-line compilation job in Standard ML. This is intended to measure the performance of real systems, not just artificial benchmarks.

              Hanoi   Puzz   LenL   LenR   Comp
    Pascal      .42   2.02   1.43   7.52
    ORBIT      ~.4   ~2.1     .9    3.6
    Old        1.28   8.81   5.62   5.71   1613
    CPS         .72   2.63   1.18   3.89   1432
    CPS'        .21   2.87   1.09   4.53   1224

The table above gives execution times in seconds, not including garbage-collection overhead (which can be arbitrarily large or small depending on memory size[10]).

9. Results

By separating the code generation into easily-understood phases with clean interfaces, we make it easier to produce robust optimizing compilers. Our method is not difficult to implement, and works well in practice.

Our CPS code generator produces code that runs up to four times faster than our old, stack-based code generator on small benchmarks, and seems comparable to Pascal and ORBIT for the examples we tested. But on the large, "real world" benchmark, our CPS code generator did only 25% better than the old one. The reason seems to be that though the new code generator produces very efficient code for tight, tail-recursive loops, big programs tend to have more function calls requiring saving of state.

It might be argued that since we save state by making continuation closure records, it is our stackless strategy that slows performance. However, we estimate that even if every closure record were stack allocated we would save only 6% to 10%. Furthermore, the stackless strategy tends to use less memory (typically on the order of 20%, but sometimes a much greater savings) than the old, stack-based code, because objects tend to be retained on the stack after their last use. And the stackless strategy has other advantages, like a simpler runtime system and garbage collector.

Acknowledgements

David MacQueen was co-designer and co-implementor of the front end of the ML compiler. David Kranz made many useful suggestions about benchmarks.

Appendix: An example in detail

To illustrate the several phases of the code generation, we show the transformations made to a fragment of an ML program. The program has a function, count, that takes a predicate (a function from α to boolean, for some type α) as an argument, and returns a function that counts how many elements of a list (of α) satisfy that predicate. Then a function countZeros is made by applying count to a predicate that returns true on 0 and false on other integers:

    fun count(pred) =
        let fun f(x::rest) =
                  if pred(x)
                  then 1+f(rest)
                  else f rest
              | f nil = 0
        in f
        end

    val countZeros =
        count (fn 0 => true
                | _ => false)

This function will be translated by the compiler into lambda-calculus and then into CPS, but for illustrative purposes we will show all the transformations using the syntax of ML. The first transformation is in the front end of the compiler, where pattern-matches are converted into decision trees (i.e. if-expressions):

    fun count(pred) =
        let fun f(a) =
                if null(a)
                then 0
                else if pred(a.0)
                     then 1+f(a.1)
                     else f a.1
        in f
        end

    val countZeros =
        count (fn i => if i=0
                       then true
                       else false)

The next transformation is the conversion into continuation-passing style: each function gets an additional continuation argument and "returns" by calling the continuation:

    fun count(pred,c1) =
        let fun f(a,c2) =
                if null(a)
                then c2(0)
                else
                    let fun c3(b) =
                            if b
                            then
                                let fun c4(i)=c2(1+i)
                                in f(a.1,c4)
                                end
                            else
                                let fun c5(i)=c2(i)
                                in f(a.1,c5)
                                end
                    in pred(a.0, c3)
                    end
        in c1(f)
        end

    val countZeros = . . .

The next phase is the optimization phase. Here there is one η-reduction (the removal of the function c5) and several β-reductions that can be done. In particular, the function count is not recursive (even though it contains a recursive function nested within it), and is "small" enough that the optimizer decides to put it "inline" inside countZeros. The function pred can then be expanded inside the copy of count.

    fun count (pred,c1) = . . .

    fun countZeros(m,c0) =
        let fun f(a,c2) =
                if null(a)
                then c2(0)
                else if a.0 = 0
                     then
                         let fun c4(i)=c2(1+i)
                         in f(a.1,c4)
                         end
                     else f(a.1,c2)
        in f(m,c0)
        end

Now comes the closure conversion phase. The function countZeros "escapes": it can be called by functions that don't know its structure, so it must use the standard calling sequence. The standard calling sequence has three arguments: closure record, user-argument, and continuation. Here, the function has no free variables, but a closure record must still be built, since the function is first-class (i.e. escapes).

The function f doesn't escape, and doesn't happen to need a closure. The function c4 escapes (it is an argument to a function), and needs a closure anyhow because it has a free variable c2 (renamed c7 here). Continuations have a two-argument standard calling sequence (different from the three-argument standard for user-functions, since continuations don't themselves have a continuation argument). The closure for c4 will be called c4, while the function that implements it will be c4', and the argument in which the closure record is supposed to be passed is called c4''.

    fun countZeros(e0,m,c0) =
        let fun f(a,c2) =
                if null(a)
                then c2.0(c2,0)
                else if a.0 = 0
                     then let fun c4'(c4'',i) =
                                  let val c7=c4''.1
                                  in c7.0(c7,1+i)
                                  end
                              val c4 = [c4',c2]
                          in f(a.1,c4)
                          end
                     else f(a.1,c2)
        in f(m,c0)
        end
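The behaviour of this closure-passing program can be checked by transliterating it into Python (a hypothetical sketch, not compiler output: tuples play the role of closure records with the code pointer in field 0, and ML lists become nested pairs ending in None):

```python
def from_list(xs):
    # build an ML-style list: [0, 1, 0] becomes (0, (1, (0, None)))
    r = None
    for x in reversed(xs):
        r = (x, r)
    return r

def countZeros(e0, m, c0):
    # e0 is the (empty) closure record; c0 is the continuation closure
    def f(a, c2):
        if a is None:
            return c2[0](c2, 0)        # c2.0(c2,0): invoke the continuation
        elif a[0] == 0:
            def c4_code(c4pp, i):      # c4': reads its closure record c4''
                c7 = c4pp[1]           # free variable c2, fetched back out
                return c7[0](c7, 1 + i)
            c4 = (c4_code, c2)         # closure record [c4',c2]
            return f(a[1], c4)
        else:
            return f(a[1], c2)
    return f(m, c0)

# top-level continuation: a closure whose code just returns the answer
halt = (lambda c, v: v,)
```

Python performs no tail-call elimination, so unlike the compiled code this sketch is limited to short lists, but it exercises the same closure-record protocol as the ML above.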
The next phase is trivial: since functions no longer have free variables, they can be un-nested.

    fun countZeros(e0,m,c0) = f(m,c0)
    and f(a,c2) =
            if null(a)
            then c2.0(c2,0)
            else if a.0 = 0
                 then let val c4=[c4',c2]
                      in f(a.1,c4)
                      end
                 else f(a.1,c2)
    and c4'(c4'',i) =
            let val c7 = c4''.1
            in c7.0(c7,1+i)
            end

The spilling phase won't change this program, since there are never very many variables free at the same point. The next phase, register allocation and code generation (for the VAX, in this example), has more to do. Since countZeros escapes, it must be given a standard calling sequence using registers 0, 1, and 2 for the closure, argument, and continuation. Two-argument standard functions (e.g. the continuation c4') have a different calling sequence using registers 2 and 1 for closure and argument; we arrange it this way because the continuation of a function will typically be the same as the closure of the next continuation called, and it will already be sitting in the right register. The function f may have any calling sequence, and this is chosen on the first call so that countZeros won't have to shuffle any registers.

As described in [1], integers are represented with a low-order 1 bit, pointers with a low-order zero. Allocation (e.g. for the closure after L1 below) is in-line, relying on a page fault to tell it when it needs to garbage-collect; register 12 points to the next available allocable space.

This program would be more efficient (and would not allocate closures) if it had been written tail-recursively, but even as it is, it will run very quickly:

    countZeros:                 e0 in r0, m in r1, c0 in r2
                                fall through: f(m,c0)
    f:                          a in r1, c2 in r2
        blbc r1,L1              branch if a is not null
        clrl r1                 arg0 in r2, arg1(=0) in r1
        movl (r2),r0            r0 = c2.0
        jmp (r0)                c2.0(c2,0)
    L1:                         else clause
        movl (r1),r0            r0 = a.0
        cmpl $1,r0              test for "zero"
        jne L2                  branch to else clause
        movl r2,4(r12)          second field of new record
        moval c4',(r12)         make record [c4',c2]
        movl $0x21,-4(r12)      descriptor of record
        movl r12,r2             r2 = c4
        addl2 $12,r12           allocation bookkeeping
        movl 4(r1),r1           r1 = a.1
        jbr f                   f(a.1,c4)
    L2:                         second else clause
        movl 4(r1),r1           r1 = a.1
        jbr f                   f(a.1,c2)
    c4':                        r1 = i, r2 = c4''
        addl2 $2,r1             r1 = 1+i
        movl 4(r2),r2           r2 = c7
        movl (r2),r0            r0 = c7.0
        jmp (r0)                c7.0(c7,1+i)

References

1. Andrew W. Appel and David B. MacQueen, "A Standard ML compiler," in Functional Programming Languages and Computer Architecture (LNCS 274), pp. 301-324, Springer-Verlag, 1987.
2. Guy L. Steele, "Rabbit: a compiler for Scheme," AI-TR-474, MIT, 1978.
3. D. Kranz, R. Kelsey, J. Rees, P. Hudak, J. Philbin, and N. Adams, "ORBIT: An optimizing compiler for Scheme," Proc. Sigplan '86 Symp. on Compiler Construction, vol. 21 (Sigplan Notices), no. 7, pp. 219-233, July 1986.
4. David Kranz, "ORBIT: An Optimizing Compiler for Scheme," PhD Thesis, Yale University, 1987.
5. Andrew W. Appel, Christopher W. Fraser, David R. Hanson, and Arthur H. Watson, "Generating code for the Case statement," in preparation.
6. V. Vyssotsky and P. Wegner, A Graph Theoretical Fortran Source Language Analyzer, AT&T Bell Laboratories, Murray Hill, NJ, 1963.
7. G. Cousineau, P. L. Curien, and M. Mauny, "The Categorical Abstract Machine," in Functional Programming Languages and Computer Architecture, LNCS Vol. 201, ed. J. P. Jouannaud, pp. 50-64, Springer-Verlag, 1985.
8. Andrew W. Appel and Trevor Jim, "Optimizing closure environment representations," CS-TR-168-88, Princeton University, 1988.
9. David Ungar, "Generation scavenging: a non-disruptive high performance storage reclamation algorithm," SIGPLAN Notices (Proc. ACM SIGSOFT/SIGPLAN Software Eng. Symp. on Practical Software Development Environments), vol. 19, no. 5, pp. 157-167, ACM, 1984.
10. A. W. Appel, "Garbage collection can be faster than stack allocation," Information Processing Letters, vol. 25, no. 4, pp. 275-279, 1987.
11. David R. Chase, "Safety considerations for storage allocation optimizations," SIGPLAN '88 Conf. on Prog. Lang. Design and Implementation, pp. 1-10, ACM, 1988.
