0% found this document useful (0 votes)
31 views

Types Are Calling

Uploaded by

delta
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views

Types Are Calling

Uploaded by

delta
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Types are calling conventions

Max Bolingbroke Simon Peyton Jones


University of Cambridge Microsoft Research
[email protected] [email protected]

Abstract In this paper we take a more systematic approach. We outline


It is common for compilers to derive the calling convention of a a new intermediate language for a compiler for a purely functional
function from its type. Doing so is simple and modular but misses programming language, that is designed to encode the most impor-
many optimisation opportunities, particularly in lazy, higher-order tant aspects of a function’s calling convention directly in the type
functional languages with extensive use of currying. We restore the system of a concise lambda calculus with a simple operational se-
lost opportunities by defining Strict Core, a new intermediate lan- mantics.
guage whose type system makes the missing distinctions: laziness • We present Strict Core, a typed intermediate language whose
is explicit, and functions take multiple arguments and return multi- types are rich enough to describe all the calling conventions
ple results. that our experience with GHC has convinced us are valuable
(Section 3). For example, Strict Core supports uncurried func-
1. Introduction tions symmetrically, with both multiple arguments and multiple
In the implementation of a lazy functional programming language, results.
imagine that you are given the following function: • We show how to translate a lazy functional language like
Haskell into Strict Core (Section 4). The source language,
f :: Int → Bool → (Int, Bool )
which we call FH, contains all the features that we are inter-
How would you go about actually executing an application of f to ested in compiling well – laziness, parametric polymorphism,
two arguments? There are many factors to consider: higher-order functions and so on.
• We show that the properties captured by the intermediate lan-
• How many arguments are given to the function at once? One at
a time, as currying would suggest? As many are as available at guage expose a wealth of opportunities for program optimiza-
the application site? Some other answer? tion by discussing four of them – definition-site and use-site
arity raising (Section 6.1 and Section 6.2), thunk speculation
• How does the function receive its arguments? In registers? On (Section 5.5) and deep unboxing (Section 5.6). These optimi-
the stack? Bundled up on the heap somewhere? sations were awkward or simply inaccessible in GHC’s earlier
• Since this is a lazy language, the arguments should be evaluated Core intermediate language.
lazily. How is this achieved? If f is strict in its first argument,
can we do something a bit more efficient by adjusting f and its Although our initial context is that of lazy functional programming
callers? languages, Strict Core is a call-by-value language and should also
be suitable for use in compiling a strict, pure, language such as Tim-
• How are the results returned to the caller? As a pointer to a ber [1], or a hybrid language which makes use of both evaluation
heap-allocated pair? Or in some other way? strategies.
No single part of our design is new, and we discuss related work
The answers to these questions (and others) are collectively called
in Section 7. However, the pieces fit together very nicely. For exam-
the calling convention of the function f . The calling convention of
ple: the symmetry between arguments and results (Section 3.1); the
a function is typically determined by the function’s type signature.
use of n-ary functions to get thunks “for free”, including so-called
This suffices for a largely-first-order language like C, but it imposes
“multi-thunks” (Section 3.4); and the natural expression of algo-
unacceptable performance penalties for a language like Haskell,
rithms and data structures with mixed strict/lazy behaviour (Sec-
because of the pervasive use of higher-order functions, currying,
tion 3.5).
polymorphism, and laziness. Fast function calls are particularly
important in a functional programming language, so compilers for
these languages – such as the Glasgow Haskell Compiler (GHC) – 2. The challenge we address
typically use a mixture of ad hoc strategies to make function calls In GHC today, type information alone is not enough to get a defini-
efficient. tive specification of a function’s calling convention. The next few
sections discuss some examples of what we lose by working with
the imprecise, conservative calling convention implied by the type
system as it stands.

2.1 Strict arguments


Consider the following function:
f :: Bool → Int
[Copyright notice will appear here once ’preprint’ option is removed.] f x = case x of True → . . . ; False → . . .

1 2009/5/14
This function is certainly strict in its argument x . GHC uses
this information to generate more efficient code for calls to f , Shorthand Expansion
using call-by-value to avoid allocating a thunk for the argument. xn , hx1 , . . . , xn i (n > 0)
However, when generating the code for the definition of f , can we x , hx1 , . . . , xn i (n > 0)
really assume that the argument has already been evaluated, and x , hxi Singleton
hence omit instructions that checks for evaluated-ness? Well, no. x, y , hx1 , . . . , xn , y1 , . . . , ym i Concatenation
For example, consider the call
map f [fibonacci 10, 1234] Figure 1: Notation for sequences
Since map is used with both strict and lazy functions, map will not
use call-by-value when calling f . So in GHC today, f is conserva- (Cons x xs 0 ) →
tive, and always tests its argument for evaluated-ness even though case ys of
in most calls the answer is ‘yes’. Nil → Nil
An obvious alternative would be to treat first-order calls (where
(Cons y ys 0 ) → Cons (f x y) (zipWith f xs 0 ys 0 )
the call site can “see” the definition of f , and you can statically see
that your use-site has as at least as many arguments as the definition The functional argument f is always applied to two arguments,
site demands) specially, and generate a wrapper for higher-order and it seems a shame that we cannot somehow communicate that
calls that does the argument evaluation. That would work, but it is information to the functions that are actually given to zipWith
fragile. For example, the wrapper approach to a map call might do so that they might be compiled with a less pessimistic calling
something like this: convention.
map (λx . case x of y → f y) [. . .] 2.3 Optionally-strict source languages
Here, the case expression evaluates x before passing it to f , Leaving the issue of compilation aside, Haskell’s source-level type
to satisfy f ’s invariant that its argument is always evaluated1 . system is not expressive enough to encode an important class of
But, alas, one of GHC’s optimising transformations is to rewrite invariants about how far an expression has been evaluated. For
case x of y → e to e[x /y], if e is strict in x . This transfor- example, you might like to write a function that produces a list
mation would break f ’s invariant, resulting in utterly wrong be- of certainly-evaluated Ints, which we might write as [!Int ]. We do
haviour or even a segmentation fault – for example, if it lead to not attempt to solve the issues of how to expose this functionality
erroneously treating part of an unevaluated value as a pointer. GHC to the user in this paper, but we make a first step along this road by
has a strongly-typed intermediate language that is supposed to be describing an intermediate language which is able to express such
immune to segmentation faults, so this fragility is unacceptable. types.
That is why GHC always makes a conservative assumption about
evaluated-ness. 2.4 Multiple results
The generation of spurious evaluated-ness checks represents an In a purely functional language like Haskell, there is no direct
obvious lost opportunity for the so-called “dictionary” arguments analogue of a reference parameter, such as you would have in
that arise from desugaring the type-class constraints in Haskell. an imperative language like C++. This means that if a function
These are constructed by the compiler so as to be non-bottoming, wishes to return multiple results it has to encapsulate them in a
and hence may always be passed by value regardless of how a data structure of some kind, such as a tuple:
function uses them. Can we avoid generated evaluated-ness checks
for these, without the use of any ad-hocery? splitList :: [Int ] → (Int, [Int ])
splitList xs = case xs of (y : ys) → (y, ys)
2.2 Multiple arguments
Unfortunately, creating a tuple means that you need to allocate
Consider these two functions: a blob of memory on the heap – and this can be a real performance
f x y =x +y drag, especially when functions returning multiple results occur in
tight loops.
g x = let z = factorial 10 inλy → x + y + z How can we compile functions which – like this one – return
They have the same type (Int → Int → Int), but we evaluate multiple results, efficiently?
applications of them quite differently – g can only deal with being
applied to one argument, after which it returns a function closure, 3. Strict Core
whereas f can and should be applied to two arguments if possible.
GHC currently discovers this arity difference between the two We are now in a position to discuss the details of our proposed
functions statically (for first-order calls) or dynamically (for higher- compiler intermediate language, which we call Strict CoreANF 2 .
order calls). However, the former requires an apparently-modest but Strict CoreANF makes extensive use of sequences of variables,
insidiously-pervasive propagation of ad-hoc arity information; and types, values, and terms, so we pause to establish our notation
the latter imposes a performance penalty [2]. for sequences. We use angle brackets hx1 , x2 , . . . , xn i to denote
For the higher-order case, consider the well-known list-combining a possibly-empty sequence of n elements. We often abbreviate
combinator zipWith, which we might write like this: such a sequence as xn or, where n is unimportant, as x. When no
ambiguity arises we abbreviate the singleton sequence hxi to just
zipWith = λf :: (a → b → c). λxs :: List a. λys :: List b. x. All this notation is summarised in Figure 1.
case xs of We also adopt the “variable convention” (that all names are
Nil → Nil unique) throughout this paper, and assume that whenever the en-
vironment is extended, the name added must not already occur in
1 In Haskell, a case expression with a variable pattern is lazy, but in GHC’s
current compiler intermediate language it is strict, and that is the semantics 2 ANF stands for A-normal form, which will be explained further in Sec-
we assume here. tion 3.6

2 2009/5/14
the environment – α-conversion can be used as usual to get around
this restriction where necessary.
Variables x, y, z
3.1 Syntax of Strict CoreANF
Type Variables α, β
Strict CoreANF is a higher-order, explicitly-typed, purely-functional,
call-by-value language. In spirit it is similar to System F, but it is Kinds
slightly more elaborate so that its types can express a richer variety κ ::= ? Kind of constructed types
of calling conventions. The key difference from an ordinary typed | κ→κ Kind of type constructors
lambda calculus, is this:
A function may take multiple arguments simultaneously, Binders
and (symmetrically) return multiple results. b ::= x:τ Value binding
| α:κ Type binding
The syntax of types τ , shown in Figure 2, embodies this idea:
a function type takes the form b → τ , where b is a sequence Types
of binders (describing the arguments of the function), and τ is a τ, υ, σ ::= T Type constructors
sequence of types (describing its results). Here are three example | α Type variable references
function types: | b→τ Function types
f1 : Int → Int | τυ Type application
: h : Inti → hInti
Atoms
id : hα : ?, αi → α
a ::= x Term variable references
: hα : ?, : αi → hαi
| ` Literals
f3 : hα : ?, Int, αi → hα, Inti
: hα : ?, : Int, : αi → hα, Inti Atoms In Arguments
f4 : α : ? → Int → α → hBool , Inti g ::= a Value arguments
: hα : ?i → hh : Inti → hh : αi → hBool , Intiii | τ Type arguments
In each case, the first line uses simple syntactic abbreviations, Multi-value Terms
which are expanded in the subsequent line. The first, f1 , takes e ::= a Return multiple values
one argument and returns one result3 . Strict CoreANF expresses | let x : τ = e in e Evaluation
functions using the notation of dependent products. For example, | valrec x : τ = v in e Allocation
in System F the identity function id has type ∀α : ?. α → α, but in | ag Application
Strict CoreANF it has the type hα : ?, : αi → hαi (although another | case a of p → e Branch on values
possibility would be hα : ?i → hh : αi → hαii, reflecting the fact
that there is a choice of calling convention). As is conventional, Heap Allocated Values
the term binder “ ” may be omitted since it cannot be mentioned.
v ::= λb. e Closures
The next example, f3 , illustrates a polymorphic function that takes
| C τ , a Constructed data
a type argument and two value arguments, and returns two results.
Finally, f4 gives a curried version of the same function.
Patterns
Admittedly, this uncurried notation is more complicated than
p ::= Default case
the unary notation of conventional System F, in which all functions
| ` Matches exact literal value
are curried. The extra complexity is crucial because, as we will see
| C x:τ Matches data constructor
in Section 3.3, it allows us to express directly that a function takes
several arguments simultaneously, and returns multiple results.
Data Types
The syntax of terms (also shown in Figure 2) is driven by the
d ::= data T α : κ = c | . . . | c Data declarations
same imperatives. For example, Strict CoreANF has n-ary applica-
c ::= C τ Data constructors
tion a g; and a function may return multiple results a. A possibly-
recursive collection of heap values may be allocated with valrec,
where a heap value is just a lambda or constructor application. Fi- Programs d, e
nally, evaluation is performed by let; since the term on the right-
hand side may return multiple values, the let may bind multiple Typing Environments
values. Here, for example, is a possible definition of f3 above: Γ ::=  Empty environment
| Γ, x : τ Value binding
f3 = λhα : ?, x : Int, y : αi. hy, x i | Γ, α : κ Type binding
In support of the multi-value idea, terms are segregated into | Γ, C : b → hT αi Data constructor binding
three syntactically distinct classes: atoms a, heap values v, and | Γ, T : κ Type constructor binding
multi-value terms e. An atom a is a trivial term – a literal, variable
reference, or (in an argument position) a type. A heap value v is Syntactic sugar
a heap-allocated constructor application or lambda term. Neither Shorthand Expansion
atoms nor heap values require evaluation. The third class of terms Value binders τ , :τ
is much more interesting: a multi-value term (e) is a term that Thunk types {τ1 , . . . , τn } , hi → hτ1 , . . . , τn i
either diverges, or evaluates to several (zero, one, or more) values Thunk terms {e} , λ hi . e
simultaneously.
Figure 2: Syntax of Strict CoreANF
3 Recall Figure 1, which abbreviates a singleton sequence hInti to Int

3 2009/5/14
Γ `κ τ : κ Γ `a a : τ

T:κ ∈ Γ B(T) = κ x:τ ∈ Γ L(`) = τ


T Y C ON DATA T Y C ON P RIM VAR L IT
Γ `κ T : κ Γ `κ T : κ Γ `a x : τ Γ `a ` : τ

α:κ ∈ Γ Γ ` b : Γ0 ∀i.Γ0 `κ τi : ? Γ`e : τ


T Y VAR T Y F UN
Γ `κ α : κ κ
Γ` b→τ : ?
∀i.Γ `a ai : τi
M ULTI
Γ `κ τ : κ1 → κ2 Γ `κ υ : κ1 Γ`a : τ
T Y C ONA PP
Γ ` κ τ υ : κ2
Γ ` e1 : τ Γ, x : τ ` e2 : σ
L ET
Figure 3: Kinding rules for Strict CoreANF Γ ` let x : τ = e1 in e2 : σ

∀j.Γ, x : τ `v vj : τj Γ, x : τ ` e2 : σ
VAL R EC
3.2 Static semantics of Strict CoreANF Γ ` valrec x : τ = v in e2 : σ
The static semantics of Strict CoreANF is given in Figure 3, Figure 4
and Figure 5. Despite its ineluctable volume, it should present few Γ `a a : b → τ Γ `app b → τ @ g : υ
A PP
surprises. The term judgement Γ ` e : τ types a multi-valued term Γ`ag : υ
e, giving it a multi-type τ . There are similar judgements for atoms
a, and values v, except that they possess types (not multi-types). An Γ `a a : τscrut ∀i.Γ `alt pi → ei : τscrut ⇒ τ
important invariant of Strict CoreANF is this: variables and values C ASE
Γ ` case a of p → e : τ
have types τ , not multi-types τ . In particular, the environment Γ
maps each variable to a type τ (not a multi-type). Γ `v v : τ
The only other unusual feature is the tiresome auxiliary judge-
ment Γ `app b → τ @ g : υ, shown in Figure 5, which computes
Γ ` b : Γ0 Γ0 ` e : τ
the result type υ that results from applying a function of type b → τ L AM
to arguments g. Γ `v λb.e : b → τ
The last two pieces of notation used in the type rules are for
introducing primitives and are as follows: C : b → hT αi ∈ Γ
L Maps literals to their built-in types Γ `app b → hT αi @ τ , a : hυi
B Maps built-in type constructors to their kinds – the do- DATA
Γ `v C τ , a : υ
main must contain at least all of the type constructors
returned by L
Γ `alt p → e : τscrut ⇒ τ
3.3 Operational semantics of Strict CoreANF
Γ`e : τ
Strict CoreANF is designed to have a direct operational interpreta- D EFA LT
Γ `alt → e : τscrut ⇒ τ
tion, which is manifested in its small-step operational semantics,
given in Figure 7. Each small step moves from one configuration
L(`) = τscrut Γ ` e : τ
to another. A configuration is given by hH; e; Σi, where H repre- L ITA LT
sents the heap, e is the term under evaluation, and Σ represents the Γ `alt ` → e : τscrut ⇒ τ
stack – the syntax of stacks and heaps is given in Figure 6.
We denote the fact that a heap H contains a mapping from x to Γ, x : τ `v C σ, x : hT σi
a heap value v by H[x 7→ v]. This stands in contrast to a pattern Γ, x : τ ` e : τ
such as H, x 7→ v, where we intend that H does not include the C ONA LT
Γ `alt C x : τ → e : T σ ⇒ τ
mapping for x
The syntax of Strict Core is carefully designed so that there is a Γ`d : Γ
1–1 correspondence between syntactic forms and operational rules:
• Rule EVAL begins evaluation of a multi-valued term e1 , pushing Γ0 = Γ, T : κ1 → . . . → κm → ?
onto the stack the frame let x : τ = • in e2 . Although it is a pure ∀i.Γi−1 ` ci : T α : κm in Γi
language, Strict CoreANF uses call-by-value and hence evaluates DATA D ECL
Γ ` data T α : κm = c1 | . . . | cn : Γn
e1 before e2 . If you want to delay evaluation of e1 , use a thunk
(Section 3.4).
Γ ` c : T α : κ in Γ
• Dually, rule RET returns a multiple value to the let frame, bind-
ing the x to the (atomic) returned values a. In this latter rule, the ∀i.Γ `κ τi : ?
simultaneous substitution models the idea that e1 returns mul- DATAC ON
Γ ` C τ : T α : κ in (Γ, C : α : κ, τ → hT αi)
tiple values in registers to its caller. The static semantics (Sec-
tion 3.2) guarantees that the number of returned values exactly
matches the number of binders. ` d, e : τ
• Rule ALLOC performs heap allocation, by allocating one or
Γ0 =  ∀i.Γi−1 ` di : Γi Γn ` e : τ
more heap values, each of which may point to the others. We n P ROGRAM
model the heap address of each value by a fresh variable y that ` d ,e : τ

Figure 4: Typing rules for Strict CoreANF


4 2009/5/14
EVAL hH; let x : τ = e1 in e2 ; Σi hH; e1 ; let x : τ = • in e2 . Σi
D E
RET hH; a; let x : τ = • in e2 . Σi H; e2 [a/x]; Σ
D E
ALLOC hH; valrec x : τ = v in e; Σi H, y 7→ v[y/x]; e[y/x]; Σ y 6∈ dom(H)
D n
E D n
E
BETA H[x 7→ λb . e]; x an ; Σ H; e[a/b ]; Σ (n > 0)
ENTER hH, x 7→ λ hi . e; x hi ; Σi hH, x 7→ ; e; update x. Σi
UPDATE hH; a; update x. Σi hH[x 7→ IND a]; a; Σi
IND hH[x 7→ IND a]; x hi ; Σi hH; a; Σi
˙ ¸
CASE - LIT H; case ` of . . . , ` → e, . . .; Σ hH; e; Σi
D n
E D n
E
CASE - CON H[x 7→ C τ , an ]; case x of . . . , C b → e, . . .; Σ H; e[a/b ]; Σ
CASE - DEF hH; case a of . . . , → e, . . .; Σi hH; e; Σi If no other match

Figure 7: Operational semantics of Strict CoreANF

Heap values h ::= λb. e Abstraction


Γ`b : Γ
| C τ, a Constructor
| IND a Indirection
Γ, α : κ ` b : Γ0 | Black hole
B NDRS E MPTY B NDRS T Y
Γ ` hi : Γ Γ ` α : κ, b : Γ0
Heaps H ::=  | H, x 7→ h
κ 0
Γ` τ : ? Γ, x : τ ` b : Γ
B NDRS VAL Stacks Σ ::= 
Γ ` x : τ, b : Γ0 | update x. Σ
| let x : τ = • in e. Σ
Γ `app b → τ @ g : υ
Figure 6: Syntax for operational semantics of Strict CoreANF
A PP E MPTY
Γ `app hi → τ @ hi : τ
If we only cared about call-by-name, we could model a thunk
Γ `a a : σ Γ `app b → τ @ g : υ as a nullary function (a function binding 0 arguments) with type
A PP VAL
Γ `app ( : σ, b) → τ @ a, g : υ hi → Int. Then we could thunk a term e by wrapping it in a
nullary lambda λ hi . e, and force a thunk by applying it to hi. This
Γ `κ σ : κ call-by-name approach would unacceptably lose sharing, but we
` ´
Γ `app b → τ [σ/α] @ g : υ
A PP T Y can readily turn it into call-by-need by treating nullary functions
Γ `app (α : κ, b) → τ @ σ, g : υ (henceforth called thunks) specially in the operational semantics
(Figure 7), which is what we do:
Figure 5: Typing rules dealing with multiple abstraction and appli-
cation • In rule ENTER, an application of a thunk to hi pushes onto
the stack a thunk update frame mentioning the thunk name. It
also overwrites the thunk in the heap with a black hole ( ), to
is not already used in the heap, and freshen both the v and e to express the fact that entering a thunk twice with no intervening
reflect this renaming. update is always an error [3]. We call all this entering, or
• Rule BETA performs β-reduction, by simultaneously substitut-
forcing, a thunk.
ing for all the binders in one step. This simultaneous substitu- • When the machine evaluates to a result (a vector of atoms a),
tion models the idea of calling a function passing several ar- UPDATE overwrites the black hole with an indirection IND a,
guments in registers. The static semantics guarantees that the pops the update frame, and continues as if it had never been
number of arguments at the call site exactly matches what the there.
function is expecting. • Finally, the IND rule ensures that, should the original thunk be
Rules CASE - LIT, CASE - CON, and CASE - DEF deal with pattern entered to again, the value saved in the indirection is returned
matching (see Section 3.5); while ENTER, UPDATE, and IND deal directly (remember – the indirection overwrote the pointer to
with thunks (Section 3.4) the thunk definition that was in the heap), so that the body of
the thunk is evaluated at most once.
3.4 Thunks We use thunking to describe the process of wrapping a term e in
Because Strict CoreANF is a call-by-value language, if we need to a nullary function λ hi . e. Because thunking is so common, we
delay evaluation of an expression we must explicitly thunk it in use syntactic sugar for the thunking operation on both types and
the program text, and correspondingly force it when we want to expressions – if something is enclosed in {braces} then it is a
actually access the value. thunk. See Figure 2 for details.

5 2009/5/14
An unusual feature is that Strict CoreANF supports multi-valued worry that it might increase code size. The same is not true in
thunks, with a type such as hi → hInt, Bool i, or (using our syntac- a compiler using ANF, because the ability to do β-reduction
tic sugar) {Int, Bool }. Multi-thunks arose naturally from treating without code bloat depends on your application site being the
thunks as a special kind of function, but this additional expressive- sole user of the function – a distinctly non-local property!
ness turns out to allow us to do at least one new optimisation: deep • Non-ANFed terms are often much more concise, and tend to be
unboxing (Section 5.6). more understandable to the human reader.
Arguably, we should not conflate the notions of functions and
thunks, especially since we have special cases in our operational In the remainder of the paper we will adopt a non-ANFed
semantics for nullary functions. However, the similarity of thunks variant of Strict CoreANF which we simply call Strict Core, by
and nullary functions does mean that some parts of the compiler making use of the following simple extension to the grammar and
can be cleaner if we adopt this conflation. For example, if the type rules:
compiler detects that all of the arguments to a function of type
hInt, Bool i → Int are absent (not used in the body) then the
Γ ` e : hτ i Γ `v v : τ
function can be safely transformed to one of type hi → Int, S ING VAL
but not one of type Int – as that would imply that the body is a ::= . . . | e | v Γ `a e : τ Γ `a v : τ
always evaluated immediately. Because we conflate thunks and The semantics of the new form of atom are given by a stan-
nullary functions, this restriction just falls out naturally as part of dard ANFing transformation into Strict CoreANF . Note that there
the normal code for discarding absent arguments rather than being are actually several different choices of ANF transformation, cor-
a special case (as it is in GHC today). responding to a choice about whether to evaluate arguments or
functions first, and whether arguments are evaluated right-to-left
3.5 Data types or vice-versa. The specific choice made is not relevant to the se-
We treat Int and Char as built-in types, with a suitable family of mantics of a pure language like Strict Core.
(call-by-value) operations. A value of type Char is an evaluated
character, not a thunk (ie. like ML, not like Haskell), and similarly 3.7 Types are calling conventions
Int. To allow a polymorphic function to manipulate values of these Consider again the example with which we began this paper. Here
built-in types, they must be boxed (ie. represented by a heap pointer are several different Strict Core types that express different calling
like every other value). A real implementation, however, might conventions:
have additional unboxed (not heap allocated) types, Char#, Int#,
which do not support polymorphism [4], but we ignore these issues f1 : Int → Bool → (Int, Bool )
here. f2 : hInt, Bool i → (Int, Bool )
All other data types are built by declaring a new algebraic f3 : (Int, Bool ) → hInt, Bool i
data type, using a delcaration d, each of which has a number of f4 : h{Int}, Bool i → (Int, Bool )
constructors (c). For example, we represent the (lazy) list data type Here f1 is a curried function, taking its arguments one at a time; f2
with a top-level definition like so: takes two arguments at once, but returns a heap-allocated pair; f3
data List a : ∗ = Nil | Cons h{a}, {List a}i takes a heap-allocated pair and returns two results (presumably in
registers); while f4 takes two arguments at once, but the first is a
Applications of data constructors cause heap allocation, and hence thunk. In this way, Strict CoreANF directly expresses the answers to
(as we noted in Section 3.3), values drawn from these types can the questions posed in the Introduction.
only be allocated by a valrec expression. By expressing all of these operational properties explicitly in
The operational semantics of case expressions are given in our intermediate language we expose them to the wrath of the
rules CASE - LIT, CASE - CON, and CASE - DEF, which are quite con- optimiser. Section 5 will show how we can use this new information
ventional (Figure 7). Notice that, unlike Haskell, case does not about calling convention to cleanly solve the problems considered
perform evaluation – that is done by let in EVAL. The only subtlety in the introduction.
(present in all such calculi) is in rule CASE - CON: the a constructor
C must be applied to both its type and value arguments, whereas 3.8 Type erasure
a pattern match for C binds only its value arguments. For the sake
Although we do not explore it further in this paper, Strict CoreANF
of simplicity we restrict ourselves to vanilla Haskell 98 data types,
has a simple type-erased counterpart, where type binders in λs,
but there is no difficulty with extending Strict Core to include exis-
type arguments and heaps values have been dropped. A natural
tentials, GADTs, and equality constraints [5].
consequence of this erasure is that functions such as ha : ∗i →
3.6 A-normal form and syntactic sugar hInti will be converted into thunks (like hi → hInti), so their
results will be shared.
The language as presented is in so-called A-normal form (ANF),
where intermediate results must all be bound to a name before
they can be used in any other context. This leads to a very clear 4. Translating laziness
operational semantics, but there are at least two good reasons to We have defined a useful-looking target language, but we haven
avoid the use of ANF in practice: not yet shown how we can produce terms it in from those of a
more traditional lazy language. In this section, we present a simple
• In the implementation of a compiler, avoiding the use of ANF
source language that captures the essential features of Haskell, and
allows a syntactic encoding of the fact that an expression occurs show how we can translate it into Strict Core.
exactly once in a program. For example, consider the following Figure 8 presents a simple, lazy, explicitly-typed source lan-
program: guage, a kind of featherweight Haskell, or FH. It is designed to be
(λhα : ∗, x : αi. x ) hInt, 1i a suitable target language for the desugaring of programs written in
Haskell, and is deliberately similar to GHCs current intermediate
The compiler may manifestly see, using purely local informa- language (which we call Core). Due to space constraints, we omit
tion, that it can perform β-reduction on this term, without the the type rules and dynamic semantics for this language – suffice to

6 2009/5/14
Variables x, y, z [[τ : κ]] : κ

Type Variables α, β [[T]] = T


[[α]] = α
Kinds [[τ1 → τ2 ]] = {[[τ1 ]]} → [[τ2 ]]
κ ::= ? Kind of constructed types [[∀α : κ.τ ]] = α : κ → [[τ ]]
| κ→κ Kind of type constructors [[τ1 τ2 ]] = [[τ1 ]] [[τ2 ]]
Types Figure 9: Translation from FH to Strict Core types
τ, υ, σ ::= T Type constructors
| α Type variables
| τ → τ Function types
| ∀α : κ.τ Quantification [[e : τ ]] : h[[τ ]]i
| ττ Type application
[[`]] = `
Expressions [[C]] = Cwrap
e ::= ` Unlifted literals [[x]] = x hi
| C Built-in data constructors [[e τ ]] = [[e]] [[τ ]]
| x Variables [[Λα : κa . e]] = λα : κ. [[e]]
| ee Value application [[e1 e2 ]] = [[e1 ]] {[[e2 ]]}
| eτ Type application [[λx : τ. e]] = λx : {[[τ ]]} . [[e]]
| λx : τ. e Functions binding values
[[let x : τ = e in eb ]] = valrec x : {[[τ ]]} = {[[e]]} in [[eb ]]
| Λα : κ. e Functions binding types
| let x : τ = e in e Recursive name binding [[case es of p → e]] = case [[es ]] of [[p]] → [[e]]
| case e of p → e Evaluation and branching
[[p]]
Patterns
p ::= Default case / ignores eval. result [[`]] = `
| ` Matches exact literal value [[C x : τ ]] = C x : {[[τ ]]}
| C x:τ Matches data constructor [[ ]] =
Data Types Figure 10: Translation from FH to Strict Core expressions
d ::= data T α : κ = c | . . . | c Data declarations
c ::= C τ Data constructors
CoreANF is highly verbose. For example, the translation for appli-
Programs d, e cations into Strict CoreANF would look like this:

Figure 8: The FH language [[e1 e2 ]] let hf i = [[e1 ]] in


=
valrec x = λ hi . [[e2 ]] in
f hxi
say that they are perfectly standard for a typed lambda calculus like The job of the term translation is to add explicit thunks to the
System Fω [6]. Strict Core output wherever we had implicit laziness in the FH
input program. To this end, we add thunks around the result of the
4.1 Type translation translation in “lazy” positions – namely, arguments to applications
The translation from FH to Strict Core types is given by Figure 9. and in the right hand side of let bindings. Dually, when we need
The principal interesting feature of the translation is the way it deals to access a variable, it must have been the case that the binding
with function types. Function arguments are thunked, reflecting the site for the variable caused it to be thunked, and hence we need to
call-by-need semantics of application in FH, but result types are explicitly force variable accesses by applying them to hi.
left unthunked. This means that after being fully applied, functions Bearing all this in mind, here is the translation for a simple
eagerly evaluate to get their result. If a use-site of that function application of a polymorphic identity function to 1:
wants to delay the evaluation of the application it must explicitly [[(Λα : ?. λx : α. x ) Int 1]] = (λα : ?. λx : {α}. x hi) Int {1}
create a thunk.
Furthermore, both ∀ and function types translate to 1-ary func- 4.3 Data type translation
tions returning a 1-ary result in Strict Core. In any translation from FH to Strict Core we must account for
(a) the translation of data type declarations themselves, (b) the
4.2 Term translation translation of constructor applications, and (c) the translation of
The translation from FH terms to those in Strict Core becomes pattern matching. We begin with (a), using the following FH data
almost inevitable given our choice for the type translation, and is type declaration for lists:
given by Figure 10. It satisfies the invariant: data List α : ∗ = Nil | Cons α (List α)
x : τ `FH e : υ =⇒ x : {[[τ ]]} ` [[e]] : h[[υ]]i The translation D, shown in Figure 11 yields this Strict Core dec-
laration:
The translation makes extensive use of our syntactic sugar and
ability to write non-ANFed terms, because the translation to Strict data List α : ∗ = Nil | Cons h{α}, {List α}i

7 2009/5/14
5.1 Routine optimisations
D [[d]] Strict Core has a number of equational laws that have applications
to program optimisation. We present a few of them in Figure 12.
D [[data T α : κ = C1 τ 1 | . . . | Cn τ n ]] The examples we present in this section will usually already
= data T α : κ = C1 {[[τ ]]}1 | . . . | Cn {[[τ ]]}n have had these equational laws applied to them, if the rewrite
represents an improvement in their efficiency or readability. For
an example of how they can improve programs, notice that in the
W [[d]]
translation we give from FH, variable access in a lazy context (such
as the argument of an application) results in a redundant thunking
W8[[data T α : κr = C1 τ m 1
1
| . . . | Cn τ m n
n ]] and forcing operation. We can remove that by applying the η law:
> ...
< Cwrap = λα1 : κ1 . . . λαr : κr .
>
[[f ]] hλ hi . [[y]]i
>
k [[f y]] =
= λx1 : {[[τ1,k ]]} . . . λxmk : {[[τmk ,k ]]} . = f hi hλ hi . y hii
>
>
>
: Ck (αr , xmk ) = f hi hyi
...
ˆ
d, e
˜ 5.2 Expressing the calling convention for strict arguments
ˆ ˜ Let’s go back to the first example of a strict function from Section 1:
d, e = D [[d]], valrec W [[d]] in [[e]]
f :: Bool → Int
Figure 11: Translation from FH to Strict Core programs f x = case x of True → . . . ; False → . . .
We claimed that we could not, while generating the code for f ,
The arguments are thunked, as you would expect, but the construc- assume that the x argument was already evaluated, because that is a
tor is given an uncurried type of (value) arity 2. So the types of the fragile property that would be tricky to guarantee for all call-sites.
data constructor Cons before and after translation are: In Strict Core, the evaluated/non-evaluated distinction is apparent
in the type system, so the property becomes robust. Specficically,
FH Cons : ∀α.α → List α → List α we can use the standard worker/wrapper transformation [7, 8] to f
Strict Core Cons : hα : ?, {α} , {List α}i → hList αi as follows:
We give Strict Core data constructors an uncurried type to reflect
fwork : Bool → Int
their status as expressing the built-in notions of allocation and
pattern matching (Figure 7). However, since the type of Strict-Core fwork = λx : Bool . case x of True hi → . . . ; False hi → . . .
Cons is not simply the translation of the type of the FH Cons, we f : {Bool } → Int
define a top-level wrapper function Cons wrap which does have the f = λx : {Bool }. fwork hx hii
right type:
Here the worker fwork takes a definitely-evaluated argument of type
Cons wrap = λα : ∗. λx : {α}. λxs : {List α}. Cons hα, x , xsi Bool , while the wrapper f takes a lazy argument and forces it
Now, as Figure 10 shows, we translate a call of a data constructor before calling f . By inlining the f wrapper selectively, we will often
C to a call of Cwrap . (As an optimisation, we refrain from thunking be able to avoid the forcing operation altogether, by cancelling it
the definition of the wrapper and forcing its uses, which accounts with explicit thunk creation. Because every lifted (i.e. lazy) type
for the different treatment of C and x in Figure 10.) We expect that in Strict Core has an unlifted (i.e. strict) equivalent, we are able
the wrappers will be inlined into the program by an optimisation to express all of the strictness information resulting from strictness
pass, exposing the more efficient calling convention at the original analysis by a program transformation in this style. This is unlike
data constructor use site. the situation in GHC today, where we can only do this for product
The final part of the story is the translation of pattern match- types; in particular, strict arguments with sum types such as Bool
ing. This is also given in Figure 10 and is fairly straightforward have their strictness information applied in a much more ad-hoc
once you remember that the types of the bound variables must be manner.
thunked to reflect the change to the type of the data constructor We suggested in Section 2 that this notion could be used to
functions. improve the desugaring of dictionary arguments. At this point,
Finally, the translation for programs, also given in Figure 11, the approach should be clear: during desugaring of Haskell into
ties everything together by using both the data types and expression Strict Core, dictionary arguments should not be wrapped in explicit
translations. thunks, ever. This entirely avoids the overhead of evaluatedness
checking for such arguments.
4.4 The seq function
5.3 Exploiting the multiple-result calling convention
A nice feature of Strict CoreANF is that it is possible to give a
straightforward definition of the primitive seq function of Haskell: Our function types have first-class support for multiple arguments
and results, so we can express the optimisation enabled by a con-
seq : {α : ∗ → β : ∗ → {α} → {β} → β} structed product result (CPR) analysis [9] directly. For example,
= {λα : ∗. λβ : ∗. λx : {α}. λy : {β}. let : α = x hi in y hi} translating splitList from Section 2.4 into Strict Core yields the
following program:
5. Putting Strict Core to work splitList = {λxs : {List Int}. case xs hi of
In this section we concentrate on how the features of Strict Core can Cons hy : {Int}, ys : {List Int}i → (, ) hInt, List Int, y, ysi}
be of aid to an optimising compiler that uses it as an intermediate Here we assume that we have translated the FH pair type in the
language. These optimisations all exploit the additional operational standard way to the following Strict Core definition:
information available from the types-as-calling-conventions corre-
spondence in order to improve the efficiency of generated code. data (, ) α : ∗ β : ∗ = (, ) h{α}, {β}i

8 2009/5/14
n n
β valrec x : τ = λb . e in x an = e[a/b ]
n n
η valrec x : τ = λb . y b in e = let hx : τ i = hyi in e
let let x : τ = a in e = e[a/x]
let-float let x : τ 1 = (let y : σ 2 = e1 in e2 ) in e3 = let y : σ 2 = e1 in let x : τ 1 = e2 in e3
valrec-float let x : τ = (valrec y : σ = e in e2 ) in e3 = valrec y : σ = e in let x : τ = e2 in e3
valrec-join valrec x : τ = e in valrec y : σ = e in e = valrec x : τ = e, y : σ = e in e
n n
case-constructor-elim valrec x : τ = C τ , an in case x of . . . C b → e . . . = valrec x : τ = C τ , an in e[a/b ]
case-literal-elim case ` of . . . ` → e . . . = e

Figure 12: Sample equational laws for Strict CoreANF

After a worker/wrapper transformation informed by CPR analysis The translation of this program into Strict Core will introduce a
we obtain a version of the function that uses multiple results, like wholly unnecessary thunk around xs, thus
so:
valrec xs : {List Int} = {Cons hInt, y, ysi}
splitList work = λxs : {List Int}. case xs hi of
It is obviously stupid to build a thunk for something that is already
Cons hy : {Int}, ys : {List Int}i → hy, ysi a value, so we would prefer to see
splitList = {λxs : {List Int}.
let hy : {Int}, ys : {List Int}i = splitList work xs valrec xs : List Int = Cons hInt, y, ysi
in (, ) hInt, List Int, y, ysi} but now references to xs in the body of the valrec will be badly-
typed! As usual, we can solve the impedence mis-match by adding
Once again, inlining the wrapper splitList at its call sites can often
an auxiliary definition:
avoid the heap allocation of the pair ((, )).
Notice that the worker is a multi-valued function that returns valrec xs 0 : List Int = Cons hInt, y, ysi in
two results. GHC as it stands today has a notion of an “unboxed valrec xs : {List Int} = {xs 0 }
tuple” type supports multiple return values, but this extension has
never fitted neatly into the type system of the intermediate lan- Indeed, if you think of what this transformation would look like
guage. Strict Core gives a much more principled treatment of the in Strict CoreANF , it amounts to floating a valrec (for xs 0 ) out of
same concept. a thunk, a transformation that is widely useful [10]. Now, several
optimisations suggest themselves:
5.4 Redundant evaluation • We can inline xs freely at sites where it is forced, thus (xs hi),
Consider this program: which then simplifies to just xs 0 .
data Colour = R | G | B • Operationally, the thunk λ hi . xs0 behaves just like IND xs 0 ,
except that the former requires an update (Figure 7). So it would
f x = case x of be natural for the code generator to allocate an IND directly for
R → ... a nullary lambda that returns immediately.
→ . . . (case x of G → . . . ; B → . . .) . . .
• GHC’s existing runtime representation goes even further: since
In the innermost case expression, we can be certain that x has al- every heap object needs a header word to guide the garbage
ready been evaluated – and we might like to use this information collector, it costs nothing to allow an evaluated Int to be enter-
to generate better code for that inner case split, by omitting evalu- able. In effect, a heap object of type Int can also be used to
atedness checks. However, notice that it translates into Strict Core represent a value of type {Int}, an idea we call auto-lifting.
like so: That in turn means that the binding for xs generates literally no
f = {λx . case x hi of code at all – we simpy use xs 0 where xs is mentioned.
R hi → . . . One complication is that thunks cannot be auto-lifted. Consider this
→ . . . (case x hi of G hi → . . . program:
B hi → . . .) . . .}
valrec f : {Int} = {⊥} in
It is clear that to avoid redundant evaluation of x we can simply valrec g : {{Int}} = {f } in
apply common-subexpression elimination (CSE) to the program: g hi
f = {λx . let x 0 = x hi in Clearly, the program should terminate. However if we adopt-auto
case x 0 of R hi → . . . lifting for thunks then at runtime g and f will alias and hence we
→ . . . (case x 0 of G hi → . . . will cause the evaluation of ⊥! So we must restrict auto-lifting to
B hi → . . .) . . .} thunks of non-polymorphic, non-thunk types. (Another alternative
would be to restrict the kind system so that thunks of thunks and
This stands in contrast to GHC today, where an ad-hoc mechanism
instantiation of type variables with thunk types is disallowed, which
tries to discover opportunities for exactly this optimisation.
might be an acceptable tradeoff.)
5.5 Thunk elimination
5.6 Deep unboxing
There are some situations where delaying evaluation by inserting
Another interesting possibility for optimisation in Strict Core is
a thunk just does not seem worth the effort. For example, consider
the exploitation of “deep” strictness information by using n-ary
this FH source program:
thunks to remove some heap allocated values (a process known as
let xs : List Int = Cons Int y ys unboxing). What we mean by this is best understood by example:

9 2009/5/14
valrec f : {({Int}, {Int})} → Int valrec fwork : hInt, Inti → Int = λhx : Int, y : Inti. e
= λhpt : {({Int}, {Int})}i. f : Int → Int → Int = λx : Int. λy : Int. fwork hx , yi
valrec c : Bool = . . . in in f 1 2
case c of True hi → 1 At this point, no improvement has yet occurred – indeed, we will
False hi → case pt hi of (x , y) → have made the program worse by adding a layer of indirection via
(+) hx hi, y hii the wrapper! However, once the wrapper is vigourously inlined at
Typical strictness analyses will not be able to say definitively the call sites by the compiler, it will often be the case that the
that f is strict in pt (even if c is manifestly False!). However, wrapper will cancel with work done at the call site, leading to a
some strictness analysers might be able to tell us that if pt is considerable efficiency improvement:
ever evaluated then both of its components certainly are. Taking valrec fwork : hInt, Inti → Int = λhx : Int, y : Inti. e
advantage of this information in a language without explicit thunks in fwork h1, 2i
would be fiddly at best, but in our intermediate language we can
use the worker/wrapper transformation to potentially remove some This is doubly true in the case of recursive functions, because by
thunks by adjusting the definition of f like so: performing the worker/wrapper split and then inlining the wrapper
into the recursive call position, we remove the need to heap-allocate
valrec fwork : {Int, Int} → Int = a number of intermediate function closures representing partial
λpt 0 : {Int, Int}. applications in a loop.
valrec c : Bool = . . . in Although this transformation can be a big win, we have to be a
case c of True hi → 1 bit careful about where we apply it. The ability to apply arguments
False hi → let hx 0 : Int, y 0 : Inti = pt 0 hi one at a time to a curried function really makes a difference to
in (+) hx 0 , y 0 i, efficiency sometimes, because call-by-need (as opposed to call-
f : {({Int}, {Int})} → Int = by-name) semantics allows work to be shared between several
λ(pt : {({Int}, {Int})}) → invocations of the same partial application. To see how this works,
valrec pt 0 : {Int, Int} = consider this Strict Core program fragment:
{ case pt hi of (x , y) → hx hi, y hii} valrec g : Int → Int → Int
in fwork pt 0 = (λx : Int. let s = fibonacci x inλy : Int. . . .) in
let h : Int → Int = g 5
Once again, inlining the new wrapper function at the use sites
in h 10 + h 20
has the potential to cancel with pair and thunk allocation by the
callers, avoiding heap allocation and indirection. Because we share the partial application of g (by naming it h),
Note that the ability to express this translation actually de- we will only compute the application fibonacci 5 once. However,
pended on the ability of our new intermediate language to express if we were to “improve” the arity of g by turning it into a function
multi-thunks (Section 3.4) – i.e. thunks that when forced, evaluate of type hInt, Inti → Int, then it would simply be impossible to
to multiple results, without necessarily allocating anything on the express the desired sharing! Loss of sharing can easily outweigh
heap. the benefits of a more efficient calling convention.
Identifying some common cases where no significant sharing
would be lost by increasing the arity is not hard, however. In
6. Arity raising particular, unlike g, it is safe to increase the arity of f to 2, because
Finally, we move on to two optimisations that are designed to f does no work (except allocate function closures) when applied to
improve function arity – one that improves arity at a function by fewer than 2 arguments. Another interesting case where we might
examining how the function is defined, and one that realises an consider raising the arity is where the potentially-shared work done
improvement by considering how it is used. These optimisations by a partial application is, in some sense, cheap – for example, if
are critical to ameliorating the argument-at-a-time worst case for the sharable expressions between the λs just consist of a bounded
applications that occurs in the output of the naive translation from number of primitive operations. We do not attempt to present a
FH. GHC does some of these arity-related optimisations in an suitable arity analysis in this paper; our point is only that Strict
ad-hoc way already; the contribution here is to make them more Core gives a sufficiently expressive medium to express its results.
systematic and robust.
6.2 Use-site arity raising
6.1 Definition-site arity raising This is, however, not the end of the story as far as arity raising
is concerned. If we can see all the call-sites for a function, and
Consider the following Strict Core binding:
none of the call sites share partial applications of less than than
valrec f : Int → Int → Int = λx : Int. λy : Int. e in f 1 2 n arguments, then it is perfectly safe to increase the arity of that
function to n, regardless of whether or not the function does work
This code is a perfect target for one of the optimisations that Strict that is worth sharing if you apply fewer than n arguments. For
Core lets us express cleanly: definition-site arity raising. Observe example, consider function g from the previous sub-section, and
that currently callers of f are forced to apply it to its arguments one suppose the the body of its valrec was . . . (g p q) . . . (g r s) . . .;
at a time. Why couldn’t we change the function so that it takes both that is, every call to g has two arguments. Then no sharing is
of its arguments at the same time? lost by performing arity raising on its definition, but considerable
We can realise the arity improvement for f by using, once again, efficiency is gained.
a worker/wrapper transformation. The wrapper, which we give this This transformation not only applies to valrec bound functions,
the original function name, f , simply does the arity adaptation but also to uses of higher-order functional arguments. After trans-
before calling into a worker. The worker, which we call fwork , is lation of the zipWith function from Section 2.2 into Strict Core,
then responsible for the rest of the calculation of the function4 : followed by discovery of its strictness and definition-site arity prop-
erties, the worker portion of the function that remains might look
4 Since e may mention f , the two definitions may be mutually recursive. like the following:

10 2009/5/14
valrec zipWith : ha : ∗, b : ∗, c : ∗, {{a} → {b} → c}, 7. Related work
List a, List bi → List c Benton et al’s Monadic Intermediate Language (MIL) [11] is simi-
= λha : ∗, b : ∗, c : ∗, f : {{a} → {b} → c}, lar to our proposed intermediate language. The MIL included both
xs : List a, ys : List bi. n-ary lambdas and multiple returns from a function, but lacked a
case xs of Nil hi → Nil c treatment of thunks due to aiming to compile a strict language. MIL
Cons hx : {a}, xs 0 : {List a}i → also included a sophisticated type system that annotated the return
case ys of Nil hi → Nil c type of functions with potential computational effects, including di-
Cons hy : {b}, ys 0 : {List b}i → vergence. This information could be used to ensure the soundness
Cons hc, f hi x y, zipWith ha, b, c, f , xs 0 hi, ys 0 hiii. of arity-changing transformations – i.e. uncurrying is only sound if
a partial application has no computational effects.
Notice that f is only ever applied in the body to three arguments at Both MIL and the Bigloo Scheme compiler [12] (which could
a time – hi, x , and y (or rather hx i and hyi). Based on this observa- express n-ary lambdas), included versions of what we have called
tion, we could re-factor zipWith so that it applied its function argu- arity definition-site analysis. However, the MIL paper does not
ment to all these arguments (namely hx , yi) at once. The resulting seem to consider the work-duplication issues involved in the arity
wrapper would look like this (omitting a few types for clarity): raising transformation, and the Bigloo analysis was fairly simple
minded – it only coalesced manifestly adjacent lambdas, without
valrec zipWith : ha : ∗, b : ∗, c : ∗, {{a} → {b} → c}, allowing (for example) potentially shareable work to be duplicated
List a, List bi → List c as long as it was cheap. We think that both of these issues deserve a
= λha : ∗, b : ∗, c : ∗, f , xs, ysi. more thorough investigation. A simple arity definition-site analysis
0
valrec f : h{a}, {b}i → c = λhx , yi. f hi x y is used by SML/NJ [13], though the introduction of n-ary argu-
in zipWith work ha, b, c, f 0 , xs, ysi ments is done by a separate argument flattening pass later on in the
compiler rather than being made immediately manifest.
To see how this can lead to code improvement, consider a call zipWith ⟨Int, Int, Int, g, xs, ys⟩, where g is the function from Section 6.1. Then, after inlining the wrapper of zipWith, we can see locally that g is applied to all of its arguments and can therefore be arity-raised. Now the wrapper of g will cancel with the definition of f′, leaving the call we really want: zipWith_work ⟨Int, Int, Int, g_work, xs, ys⟩.

6.3 Reflections on arity-raising

Although the use-site analysis might, at first blush, seem to be more powerful than the definition-site one, the two arity-raising transformations are in fact each able to improve the arities of some functions where the other cannot. In particular, for a compiler that works module-by-module, like GHC, the use-site analysis will never be able to improve the arity of a top-level function, because some of its call sites are not statically known.

The key benefits of the new intermediate language with regard to the arity-raising transformations are as follows:

• Arity in the intermediate language is more stable. It is almost impossible for a compiler transformation to accidentally reduce the arity of a function without causing a type error, whereas accidental arity reduction is a possibility we must actively guard against in the GHC of today.

• Expressing arity in the type system allows optimisations to be applied to the arity of higher-order arguments, as we saw in Section 6.2.

• By expressing arity statically in the type information, we could potentially replace GHC’s current dynamic arity discovery [2] with purely static arity dispatch. This requires that arity-raising transformations such as these two remove enough of the argument-at-a-time worst cases that we obtain satisfactory performance with no run-time tests at all.

• If purely static arity discovery turns out to be too pessimistic in practice (a particular danger for higher-order arguments), it would still be straightforward to adapt the dynamic discovery process to this new core language, but we could avoid using it except in those cases where it might give a better result than static dispatch. Essentially, if we appear to be applying at least two groups of arguments to a function, then at that point we should generate code that dynamically checks for a better arity before applying the first group.

In MIL, function application used purely static arity information. Bigloo used a hybrid static/dynamic arity dispatch scheme, but its authors unfortunately do not appear to report on the cost (or otherwise) of operating purely with static arity information.

The intermediate language discussed here is in some ways an extension of the L2 language [14], which also explored the possibility of an optimising compiler suitable for both strict and lazy languages. We share with L2 an explicit representation of thunking and forcing operations, but take this further by additionally representing the operational notions of unboxing (through multiple function results) and arity. The L2 language shares with MIL an attempt to support impure strict languages, which we do not – though impure operations could potentially be desugared into our intermediate language using a state token or continuation-passing style to serialize execution.

GRIN [15] is another language that used an explicit representation of thunks and boxing properties. Furthermore, GRIN uses a first-order program representation in which the structure of closures is explicit – in particular, this means that unboxing of closures is expressible. Note that the “arity raising” and “generalized unboxing” transformations described by Boquist are not the same as the arity analyses and “deep unboxing” transformation we describe.

The IL language [16] represents thunks explicitly by way of continuations with a logical interpretation, and is to our knowledge the first place that auto-lifting is discussed in the literature. It seems likely that some way could be found to adapt the logic-based approach of that work to accommodate a treatment of arity and multiple-value expressions, as long as some way is adopted to distinguish between “boxed” and “unboxed” uses of the ∧ tuple type formation rule.

Hannan and Hicks previously introduced the use-site arity optimisation under the name “higher-order uncurrying” [17], as a type-directed analysis on a source language. They also separately introduced an optimisation called “higher-order arity raising” [18], which attempts to unpack tuple arguments where possible – this is a generalisation of the existing worker/wrapper transformation GHC currently performs for strict product parameters. However, their analyses consider only a strict language, and their uncurrying does not try to distinguish between cheap and expensive computation in the manner we propose above. Dargaye and Leroy [19] demonstrated a verified version of this framework that operates by coercion insertion, which is similar to our worker/wrapper approach.
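To connect this back to the arity-raising example at the start of Section 6.3, the following is a minimal sketch, in source-level Haskell rather than Strict Core, of the definition-site worker/wrapper split referred to above. The names gWork and gWrapper are invented for illustration, and the tuple argument merely stands in for a genuine multi-argument Strict Core function.

    -- Worker: receives both arguments at once (a sketch of an arity-2 function).
    gWork :: (Int, Int) -> Int
    gWork (x, y) = x * y + x

    -- Wrapper: keeps the original curried type and is small enough to inline
    -- at every call site.
    gWrapper :: Int -> Int -> Int
    gWrapper x y = gWork (x, y)

    -- At a saturated call site, inlining the wrapper turns
    --   gWrapper 3 4   into   gWork (3, 4)
    -- which is the call we really want.
    example :: Int
    example = gWrapper 3 4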
8. Conclusions and further work

In this paper we have described what we believe to be an interesting point in the design space of compiler intermediate languages. By making information about a function’s calling convention totally explicit in the intermediate language’s type system, we expose it to the optimiser – in particular, we allow optimisation of decisions about function arity. A novel concept – n-ary thunks – arose naturally from the process of making calling conventions explicit, and this in turn allows at least one novel and previously inexpressible optimisation (deep unboxing) to be expressed.

The lazy λ-calculus FH we present is similar to System FC, GHC’s current intermediate language. For a long time, a lazy language was, to us at least, the obvious intermediate language for a lazy source language such as Haskell – so it was rather surprising to discover that an appropriately chosen strict calculus seems to be in many ways better suited to the task!

However, it still remains to implement the language in GHC and gain practical experience with it. In particular, we would like to obtain quantitative evidence as to whether purely static arity dispatch leads to improved run times compared to the dynamic treatment of function arity that GHC implements at the moment. A related issue is pinning down the exact details of how a hybrid dynamic/static dispatch scheme would work, and how to implement it without causing code bloat from the extra checks. We anticipate that we can reuse existing technology from our experience with the STG machine [20] to do this.
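As a rough picture of the dynamic check that such a hybrid scheme would fall back on, the sketch below models eval/apply-style arity dispatch [2] in Haskell. The Value and Fun types and the applyGroup function are invented for this illustration and are far simpler than GHC’s actual runtime machinery.

    data Value = VInt Int | VFun Fun

    data Fun = Fun
      { funArity :: Int              -- not statically known to the caller
      , funCode  :: [Value] -> Value -- expects exactly funArity arguments
      }

    -- Apply one "group" of arguments, checking the callee's arity at run time;
    -- a hybrid scheme would emit this check only when static information
    -- cannot prove the call saturated.
    applyGroup :: Fun -> [Value] -> Value
    applyGroup f args
      | n == arity = funCode f args
      | n <  arity = VFun (Fun (arity - n) (\rest -> funCode f (args ++ rest)))
      | otherwise  = case funCode f (take arity args) of
          VFun g -> applyGroup g (drop arity args)
          _      -> error "over-applied a non-function"
      where
        n     = length args
        arity = funArity f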
Although we have presented, by way of examples, a number of compiler optimisations that are enabled or put on a firmer footing by the use of the new intermediate language, we have not provided any details about how a compiler would algorithmically decide when and how to apply them. In particular, we plan to write a paper fully elucidating the details of the two arity optimisations (Section 6.1 and Section 6.2) in a lazy language and reporting on our practical experience of their effectiveness.
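For flavour, here is a hedged source-level sketch of the use-site direction, again with invented names: the worker of a zipWith-like combinator takes its functional argument uncurried, so a caller that would otherwise pass a wrapper such as gWrapper from the earlier sketch can cancel the two wrappers and pass the underlying worker directly.

    -- Worker: the functional argument is taken uncurried, modelling a
    -- multi-argument Strict Core function.
    zipWithWork :: ((a, b) -> c) -> [a] -> [b] -> [c]
    zipWithWork f (x:xs) (y:ys) = f (x, y) : zipWithWork f xs ys
    zipWithWork _ _      _      = []

    -- Wrapper: retains the original curried, higher-order type and is inlined
    -- at call sites.
    zipWithWrapper :: (a -> b -> c) -> [a] -> [b] -> [c]
    zipWithWrapper f = zipWithWork (\(x, y) -> f x y)

    -- After inlining zipWithWrapper at a saturated call whose argument is
    -- itself a wrapper around an uncurried worker, the intermediate lambdas
    -- cancel, leaving a call of the shape  zipWithWork gWork xs ys.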
There are a number of interesting extensions to the intermediate language that would allow us to express even more optimisations. We are particularly interested in the possibility of using some features of the ΠΣ language [21] to do this in a typed manner. In particular, adding unboxed Σ types would address an asymmetry between function argument and result types in Strict Core – binders may not currently appear to the right of a function arrow. They would also allow us to express unboxed existential data types (including function closures, should we wish) and GADTs. Another ΠΣ feature – types that can depend on “tags” – would allow us to express unboxed sum types, but the implications of this feature for the garbage collector are not clear.
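To make the closure case concrete, an existentially quantified (but still boxed) closure can already be written in today’s Haskell, as sketched below. The Closure type, applyClosure and addN are invented names; the point of unboxed Σ types would be to express the same environment/code pairing without the extra box.

    {-# LANGUAGE ExistentialQuantification #-}

    -- A function closure as an existential package: some hidden environment
    -- type, a captured environment, and code expecting that environment.
    data Closure a b = forall env. Closure env (env -> a -> b)

    applyClosure :: Closure a b -> a -> b
    applyClosure (Closure env code) x = code env x

    -- Example: a closure capturing the free variable n.
    addN :: Int -> Closure Int Int
    addN n = Closure n (\env x -> env + x)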
We would like to expose the ability to use “strict” types to the compiler user, so that Haskell programs can, for example, manipulate lists of strict integers ([!Int]). Although it is easy to express such things in the Strict Core language, it is not obvious how to expose this ability in the source language in a systematic way – work on this is ongoing.
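The surface syntax [!Int] above is hypothetical, but its intent can be approximated with the strictness annotations available in today’s Haskell; StrictList and sumStrict below are invented names used purely to illustrate the idea.

    {-# LANGUAGE BangPatterns #-}

    -- A list whose elements and spine are forced as cells are built,
    -- approximating a "list of strict Ints".
    data StrictList a = Nil | Cons !a !(StrictList a)

    sumStrict :: StrictList Int -> Int
    sumStrict = go 0
      where
        go !acc Nil         = acc
        go !acc (Cons x xs) = go (acc + x) xs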
9. Acknowledgements

This work was partly supported by a PhD studentship generously provided by Microsoft Research. We would like to thank Paul Blain Levy for the thought-provoking talks and discussions he gave while visiting the University of Cambridge, which inspired this work. Thanks are also due to Duncan Coutts, Simon Marlow, Alan Mycroft and Dominic Orchard for their helpful comments and suggestions.

References

[1] A. P. Black, M. Carlsson, M. P. Jones, D. Kieburtz, and J. Nordlander. Timber: a programming language for real-time embedded systems. Technical Report CSE-02-002, 2002.
[2] S. Marlow and S. L. Peyton Jones. How to make a fast curry: push/enter vs eval/apply. In International Conference on Functional Programming, pages 4–15, September 2004.
[3] J. Launchbury. A natural semantics for lazy evaluation. In 20th ACM Symposium on Principles of Programming Languages (POPL’93), pages 144–154, January 1993.
[4] S. L. Peyton Jones and J. Launchbury. Unboxed values as first class citizens in a non-strict functional language. In Functional Programming Languages and Computer Architecture, pages 636–666. Springer, 1991.
[5] M. Sulzmann, M. Chakravarty, S. L. Peyton Jones, and K. Donnelly. System F with type equality coercions. In ACM SIGPLAN International Workshop on Types in Language Design and Implementation (TLDI’07). ACM, 2007.
[6] J. Girard. The system F of variable types, fifteen years later. Theoretical Computer Science, 45(2):159–192, 1986.
[7] S. L. Peyton Jones and A. Santos. A transformation-based optimiser for Haskell. Science of Computer Programming, 32(1-3):3–47, September 1998.
[8] A. Gill and G. Hutton. The worker/wrapper transformation. Journal of Functional Programming, 19(2):227–251, March 2009.
[9] C. Baker-Finch, K. Glynn, and S. L. Peyton Jones. Constructed product result analysis for Haskell. Journal of Functional Programming, 14(2):211–245, 2004.
[10] S. L. Peyton Jones, W. D. Partain, and A. Santos. Let-floating: moving bindings to give faster programs. In International Conference on Functional Programming, 1996.
[11] N. Benton, A. Kennedy, and G. Russell. Compiling Standard ML to Java bytecodes. In International Conference on Functional Programming, pages 129–140, New York, NY, USA, 1998. ACM.
[12] M. Serrano and P. Weis. Bigloo: a portable and optimizing compiler for strict functional languages. In International Symposium on Static Analysis, pages 366–381, London, UK, 1995. Springer-Verlag.
[13] A. Appel. Compiling with Continuations. Cambridge University Press, 1992.
[14] S. L. Peyton Jones, M. Shields, J. Launchbury, and A. Tolmach. Bridging the gulf: a common intermediate language for ML and Haskell. In Principles of Programming Languages, pages 49–61, New York, NY, USA, 1998. ACM.
[15] U. Boquist. Code Optimisation Techniques for Lazy Functional Languages. PhD thesis, Chalmers University of Technology, April 1999.
[16] B. Rudiak-Gould, A. Mycroft, and S. L. Peyton Jones. Haskell is not not ML. In European Symposium on Programming, 2006.
[17] J. Hannan and P. Hicks. Higher-order uncurrying. Higher-Order and Symbolic Computation, 13(3):179–216, 2000.
[18] J. Hannan and P. Hicks. Higher-order arity raising. In International Conference on Functional Programming, pages 27–38, New York, NY, USA, 1998. ACM.
[19] Z. Dargaye and X. Leroy. A verified framework for higher-order uncurrying optimizations. March 2009.
[20] S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. Journal of Functional Programming, 2:127–202, April 1992.
[21] T. Altenkirch and N. Oury. PiSigma: a core language for dependently typed programming. 2008.
