1ML - Core and Modules United: (F-Ing First-Class Modules)
Andreas Rossberg
Google
Abstract
ML is two languages in one: there is the core, with types and expressions, and there are modules, with signatures, structures and functors. Modules form a separate, higher-order functional language on top of the core. There are both practical and technical reasons for this stratification; yet, it creates substantial duplication in syntax and semantics, and it reduces expressiveness. For example, selecting a module cannot be made a dynamic decision. Language extensions allowing modules to be packaged up as first-class values have been proposed and implemented in different variations. However, they remedy expressiveness only to some extent, are syntactically cumbersome, and do not alleviate redundancy.
We propose a redesign of ML in which modules are truly first-class values, and core and module layer are unified into one language. In this 1ML, functions, functors, and even type constructors are one and the same construct; likewise, no distinction is made between structures, records, or tuples. Or viewed the other way round, everything is just (a mode of use of) modules. Yet, 1ML does not require dependent types, and its type structure is expressible in terms of plain System Fω, in a minor variation of our F-ing modules approach. We introduce both an explicitly typed version of 1ML, and an extension with Damas/Milner-style implicit quantification. Type inference for this language is not complete, but, we argue, not substantially worse than for Standard ML.
An alternative view is that 1ML is a user-friendly surface syntax for System Fω that allows combining term and type abstraction in a more compositional manner than the bare calculus.
Categories and Subject Descriptors: D.3.1 [Programming Languages]: Formal Definitions and Theory; D.3.3 [Programming Languages]: Language Constructs and Features (Modules); F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs (Type structure)
General Terms: Languages, Design, Theory
Keywords: ML modules, first-class modules, type systems, abstract data types, existential types, System F, elaboration
1. Introduction
The ML family of languages is defined by two splendid innovations: parametric polymorphism with Damas/Milner-style type in-
Packaged Modules
Because core-level polymorphism is first-order, this approach cannot express type sharing between type constructors, a complaint that has come up several times on the OCaml mailing list; for example, if one were to abstract over a monad:
2015/2/26
First-Class Modules
F-ing Modules
In our work on F-ing modules with Russo & Dreyer [25] we have demonstrated that ML modules can be expressed and encoded entirely in vanilla System F (or Fω, depending on the concrete core language and the desired semantics for functors). Effectively, the F-ing semantics defines a type-directed desugaring of module syntax into System F types and terms, and inversely, interprets a stylised subset of System F types as module signatures.
The core language that we assume in that paper is System F(ω) itself, leading to the seemingly paradoxical situation that the core language appears to have more expressive types than the module language. That makes sense because the translation rules manipulate the sublanguage of module types in ways that would not generalise to arbitrary System F types. In particular, the rules implicitly introduce and eliminate universal and existential quantifiers, which is key to making modules a usable means of abstraction. But the process is guided by, and only meaningful for, module syntax; likewise, the built-in subtyping relation is only complete for the specific occurrences of quantifiers in module types.
Nevertheless, the observation that modules are just sugar for certain kinds of constructs that the core language can already express (even if less concisely) raises the question: what necessitates modules to be second-class in that system?
1.4 1ML
The answer to that question is: very little! And the present paper is motivated by exploring that answer.
In essence, the F-ing modules semantics reveals that the syntactic stratification between ML core and module language is merely a rather coarse means to enforce predicativity for module types: it prevents abstract types from themselves being instantiated with binders for abstract types. But this heavy syntactic restriction can be replaced by a more surgical semantic restriction! It is enough to employ a simple universe distinction between small and large types (reminiscent of Harper & Mitchell's XML [10]), and limit the equivalent of the FORGET rule shown earlier to only allow small types for substitution, which serves to exclude problematic quantifiers.
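The small/large distinction can be pictured as a simple recursive predicate over types. The following Python sketch is only an illustration under stated assumptions: the class names (`Bool`, `AbstractType`, `Singleton`, `Record`, `Fun`) and the function `is_small` are hypothetical and not part of 1ML or its prototype, and paths are omitted. It follows the paper's description that a small type never contains the type `type` opaquely and that small function types are impure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bool:
    pass


@dataclass(frozen=True)
class AbstractType:
    """The opaque type `type` (an abstract type component)."""


@dataclass(frozen=True)
class Singleton:
    """A transparent type `(= type T)`; `of` is the known type T."""
    of: object


@dataclass(frozen=True)
class Record:
    fields: tuple  # tuple of (label, type) pairs


@dataclass(frozen=True)
class Fun:
    param: object
    result: object
    pure: bool


def is_small(t) -> bool:
    """Return True if `t` contains no opaque `type` and no pure function type."""
    if isinstance(t, Bool):
        return True
    if isinstance(t, AbstractType):
        return False
    if isinstance(t, Singleton):
        return is_small(t.of)
    if isinstance(t, Record):
        return all(is_small(field_type) for _, field_type in t.fields)
    if isinstance(t, Fun):
        # Assumption taken from the paper: small function types are impure.
        return (not t.pure) and is_small(t.param) and is_small(t.result)
    raise TypeError(f"unknown type node: {t!r}")
```

A checker built along these lines would consult `is_small` wherever an abstract type is about to be matched, rejecting large candidates.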
That would settle decidability, but what about type inference? Well, we can use the same distinction! A quick inspection of the subtyping rules in the F-ing modules semantics reveals that they almost degenerate to type equivalence when applied to small types, the main exception being width subtyping on structures. If we are willing to accept that inference is not going to be complete for records (which it already isn't in Standard ML), then a simple restriction to inferring only small types is sufficient to make type inference work almost as usual.
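Width subtyping on structures can be illustrated with a small coercion-producing check, in the spirit of a subtyping judgement that elaborates to a coercion function. This is a hedged Python sketch, not the paper's rules: types are modelled as the string `"bool"` or as a dict from labels to types, and the name `coerce` is an assumption made for the example.

```python
def coerce(sub, sup):
    """Return a function converting values of type `sub` to type `sup`.

    Raises TypeError if `sub` is not a subtype of `sup` in this toy model.
    """
    if sub == "bool" and sup == "bool":
        return lambda value: value
    if isinstance(sub, dict) and isinstance(sup, dict):
        field_coercions = {}
        for label, sup_field in sup.items():
            if label not in sub:
                raise TypeError(f"missing field {label!r}")
            # Depth subtyping: coerce each field that the supertype keeps.
            field_coercions[label] = coerce(sub[label], sup_field)
        # Width subtyping: fields not mentioned in `sup` are dropped.
        return lambda value: {
            label: c(value[label]) for label, c in field_coercions.items()
        }
    raise TypeError(f"{sub!r} is not a subtype of {sup!r}")


forget_y = coerce({"x": "bool", "y": "bool"}, {"x": "bool"})
```

Applying `forget_y` to `{"x": True, "y": False}` keeps only the `x` field. When both records have the same labels, the coercion is the identity on every field, which matches the observation that subtyping nearly collapses to equivalence apart from width.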
In this spirit, this paper presents 1ML, an ML dialect in which modules are truly first-class values. The name is both short for 1st-class module language and a pun on the fact that it unifies core and modules of ML into one language.
We see several benefits with this redesign: it produces a language that is more expressive and concise, and at the same time, more minimal and uniform. Modules become a natural way to express all forms of (first-class) polymorphism, and can be freely intermixed with computational code and data. Type inference integrates in a rather seamless manner, reducing the need for explicit annotations to large types, module or not. Every programming concept is derived from a small set of orthogonal constructs, over which general and uniform syntactic sugar can be defined.
T = {type A; f : A → ()}
U = {type A; f : (T where type A = A) → ()}
V = T where type A = U
: V) = X : U (* V ≤ U ? *)
Checking V ≤ U would match type A with type A=U, substituting U for A accordingly, and then requires checking that the types of f are in a subtyping relation, which contravariantly requires checking that (T where type A = A)[U/A] ≤ A[U/A], but that is the same as the V ≤ U we wanted to check in the first place.
In fewer words, signature matching is no longer decidable when module types can be abstracted over, which is the case if module types are simply collapsed into ordinary types. It also arises if abstract signatures are added to the language, as in OCaml, where the same example can be constructed on the module type level.
Some may consider decidability a rather theoretical concern. However, there also is the quite practical issue that the introduction of signature matching into the core language makes ML-style type inference impossible. Obviously, Milner's algorithm W [18] is far too weak to handle dependent types. Moreover, modules introduce subtyping, which breaks unification as the basic algorithmic tool for solving type constraints. And while inference algorithms for subtyping exist, they have much less satisfactory properties than our beloved Hindley/Milner sweet spot.
Worse, module types do not even form a lattice under subtyping:
f1 : {type t a; x : t int} → int
f2 : {type t a; x : int} → int
g = if condition then f1 else f2
2.
id a (x : a) = x
type pair a b = {fst : a; snd : b}
second a b (p : pair a b) = p.snd
Functional Core A major part of 1MLex consists of fairly conventional functional language constructs. On the expression level, as a representative for a base type, we have Booleans (in examples that follow, we will often assume the presence of an integer type and respective constructs as well). Then there are records, which consist of a sequence of bindings. And of course, it wouldn't be a functional language without functions.
In a first approximation, these forms are reflected on the type level as one would expect, except that for functions we allow two forms of arrows, distinguishing pure function types (⇒) from impure ones (→) (discussed later).
Like in the F-ing modules paper [25], most elimination forms in the kernel syntax only allow variables as subexpressions. However, the general expression forms are all definable as straightforward syntactic sugar, as shown in the lower half of Figure 1. For example,
It may seem surprising that we can just reify types as first-class values. But reified types (or atomic type modules) have been common in module calculi for a long time [16, 6, 24, 25]. We are merely making them available in the source language directly. For the most part, this is just a notational simplification over what first-class modules already offer: instead of having to define T = {type t = int} : {type t} and then refer to T.t, we allow injecting types into modules (i.e., values) anonymously, without wrapping them into a structure; thus T = (type int) : type, which can be referred to as just T.
Translucency The type type allows classifying types abstractly:
given a value of type type, nothing is known about what type
it is. But for modular programming it is essential that types can
selectively be specified transparently, which enables expressing the
vital concept of type sharing [12].
As a simple example, consider these type aliases:
(fun (n : int) ⇒ n + n) 3
desugars into
size : type
pair : (a : type) ⇒ (b : type) ⇒ type
Or transparently:
size : (= type int)
pair : (a : type) ⇒ (b : type) ⇒ (= type {fst : a; snd : b})
Reified Types The core feature that makes 1MLex able to express modules is the ability to embed types in a first-class manner: the expression type T reifies the type T as a value. Such an expression has type type, and thereby can be abstracted over. For example,
id = fun (a : type) ⇒ fun (x : a) ⇒ x
type size
type pair a b
which takes a type and returns a type, and effectively defines a type
constructor. Applied to a reified type it yields a reified type. Again,
the implicit projection from paths enables using this as a type:
Functors Returning to the 1ML grammar, the remaining constructs of the language are typical for ML modules, although they are perhaps a bit more general than what is usually seen. Let us explain them using an example that demonstrates that our language can readily express real modules as well. Here is the (unavoidable, it seems) functor that defines a simple map ADT:
type EQ =
{
type t;
eq : t → t → bool
Figure 1 (1MLex syntax and syntactic sugar)
Kernel syntax categories: (identifiers) X, (types) T, (declarations) D, (expressions) E, (bindings) B
Syntactic sugar:
(types)
  let B in T := {B; X = type T}.X
  T1 → T2 := (X:T1) → T2
  T1 ⇒ T2 := (X:T1) ⇒ T2
  T where (.X P = E) := T where (.X : P ⇒ (= E))
  T where (type .X P = T′) := T where (.X : P ⇒ (= type T′))
(declarations)
  sugared forms: local B in D; X P : T; X P = E; type X P; type X P = T
  where: X := (X : type)
(expressions)
  let B in E := {B; X = E}.X
  if E1 then E2 else E3 : T := let X = E1 in if X then E2 else E3 : T
  E1 E2 := let X1 = E1; X2 = E2 in X1 X2
  E T := E (type T)   (if T unambiguous)
  E : T := (fun (X:T) ⇒ X) E
  E :> T := let X = E in X :> T
(bindings)
  local B in B′ := include (let B in {B′})
  X P : T′ :> T″ = E := X = fun P ⇒ E : T′ :> T″
  type X P = T := X = fun P ⇒ type T
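The sugar rules `let B in E := {B; X=E}.X` and `E1 E2 := let X1=E1; X2=E2 in X1 X2` can be read as a small rewriting pass. The Python sketch below is only an illustration: the AST classes (`Var`, `App`, `VarApp`, `Let`, `Struct`, `Dot`), the function `desugar`, and the fresh-name scheme `_x1`, `_x2`, ... are all hypothetical, and only these two sugar forms are covered.

```python
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    """Surface application E1 E2 with arbitrary subexpressions."""
    fn: object
    arg: object


@dataclass(frozen=True)
class VarApp:
    """Kernel application X1 X2, restricted to variables."""
    fn: str
    arg: str


@dataclass(frozen=True)
class Let:
    """Surface form: let X1=E1; ...; Xn=En in E."""
    bindings: tuple
    body: object


@dataclass(frozen=True)
class Struct:
    """Kernel structure {X1=E1; ...; Xn=En}."""
    bindings: tuple


@dataclass(frozen=True)
class Dot:
    """Kernel projection E.X."""
    expr: object
    label: str


def desugar(e, fresh=None):
    """Rewrite surface applications and lets into structures and projections."""
    if fresh is None:
        fresh = (f"_x{i}" for i in itertools.count(1))
    if isinstance(e, (Var, VarApp)):
        return e
    if isinstance(e, App):
        # E1 E2 := let X1=E1; X2=E2 in X1 X2
        x1, x2 = next(fresh), next(fresh)
        return desugar(Let(((x1, e.fn), (x2, e.arg)), VarApp(x1, x2)), fresh)
    if isinstance(e, Let):
        # let B in E := {B; X=E}.X
        x = next(fresh)
        bindings = tuple((name, desugar(rhs, fresh)) for name, rhs in e.bindings)
        return Dot(Struct(bindings + ((x, desugar(e.body, fresh)),)), x)
    raise TypeError(f"unknown expression node: {e!r}")
```

For instance, `desugar(App(Var("f"), Var("y")))` yields a projection from a three-field structure whose last field applies the first two.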
M1 = Map IntEq;
M2 = Map IntEq;
m = M1.add int 7 M2.empty (* ill-typed: M1.map ≠ M2.map *)
type MAP =
{
type key;
type map a;
empty a : map a;
add a : key → a → map a → map a;
lookup a : key → map a → opt a
};
The record type EQ amounts to a module signature, since it contains an abstract type component t. It is referred to in the type of eq, which shows that record types are dependent: like for terms, earlier components are in scope for later components.
Similarly, MAP defines a signature with abstract key and map types. Note how type parameters on the left-hand side conveniently and uniformly generalise to value declarations, avoiding the need for brittle implicit scoping rules like in conventional ML: as shown in Figure 1, empty a : T means empty : (a : type) ⇒ T.
The Map function is a functor: it takes a value of type EQ, i.e., a module. From that it constructs a naive implementation of maps. X :> T is the usual sealing operator that opaquely ascribes a type (i.e., signature) to a value (a.k.a. module). The type refinement syntax T where (type .X = T′) should be familiar from ML, but here it actually is derived from a more general construct: T where (.X : U) refines T's subcomponent at path .X to type U, which can be any subtype of what's declared by T. That form subsumes module sharing as well as other forms of refinement.
type T1 = type;
type T2 = {type u};
type T3 = {type u = T2 };
type COLL c =
{
type key;
type val;
empty : c;
add : c → key → val → c;
lookup : c → key → opt val;
keys : c → list key
};
would not cause an error when inserted into the above definitions.
Recursion The 1MLex syntax we give in Figure 1 omits a couple
of constructs that one can rightfully expect from any serious ML
contender: in particular, there is no form of recursion, neither for
terms nor for types. It turns out that those are largely orthogonal to
the overall design of 1ML, so we only sketch them here.
ML-style recursive functions can be added simply by throwing
in a primitive polymorphic fixpoint operator
fix a b : (a → b) → (a → b)
plus perhaps some suitable syntactic sugar:
rec X Y (Z:T) : U = E :=
X = fun Y ⇒ fix T U (fun (X : (Z:T) → T′) ⇒ fun (Z:T) ⇒ E)
Given an appropriate fixpoint operator, this generalises to mutually recursive functions in the usual ways. Note how the need to specify the result type b (respectively, U) prevents using the operator to construct transparent recursive types, because U has no way of referring to the result of the fixpoint. Moreover, fix yields an impure function, so even an attempt to define an abstract type recursively
won't type-check, because stream wouldn't be an applicative functor, and so the term stream a on the right-hand side is not a valid type. Fortunately so, because there would be no way to translate such a definition into System F with a conventional fixpoint operator.
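As a reminder of how a fixpoint operator ties the recursive knot at the term level, here is a minimal Python sketch. Python is untyped, so the type parameters a and b of the paper's fix are not represented; the names `fix` and `factorial` are illustrative assumptions only.

```python
def fix(f):
    """Return a function g satisfying g(x) == f(g)(x) for every x."""
    def g(x):
        # The recursive reference to g is resolved lazily, at call time.
        return f(g)(x)
    return g


# A recursive definition written without naming itself:
factorial = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

The sugared form `rec X Y (Z:T) : U = E` corresponds to passing a function of the shape `lambda X: lambda Z: E` to `fix`, as in the `factorial` definition.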
Recursive (data)types have to be added separately. One approach, which has been used by Harper & Stone's type-theoretic account of Standard ML [13], is to interpret a recursive datatype like
datatype t = A | B of T
The only minor nuisance is the need to annotate the type of the
conditional, as explained earlier.
Predicativity What is the restriction we employ to maintain decidability? It is simple: during subtyping (a.k.a. signature matching) the type type can only be matched by small types, which are
those that do not themselves contain the type type opaquely; or
in other words, monomorphic types. This restriction affects annotations, parameterisation over types, and the formation of abstract
(kinds) κ ::= Ω | κ → κ
(types) τ ::= α | τ → τ | {l:τ} | ∀α:κ.τ | ∃α:κ.τ | λα:κ.τ | τ τ
(terms) e, f ::= x | λx:τ.e | e e | {l=e} | e.l | λα:κ.e | e τ | pack ⟨τ, e⟩ | unpack ⟨α, x⟩=e in e
(environs) Γ ::= · | Γ, α:κ | Γ, x:τ
Figure 2. Syntax of Fω

(abstracted) Ξ ::= ∃ᾱ.Σ
(large) Σ ::= π | bool | [= Ξ] | {l:Σ} | ∀ᾱ.Σ →ι Ξ
(small) σ ::= π | bool | [= σ] | {l:σ} | σ →I σ
(paths) π ::= α | π σ̄
(purity) ι ::= P | I
Desugarings into Fω:
(types) [= τ] := {typ : τ → {}};  τ1 →l τ2 := τ1 → {l : τ2}
(terms) [τ] := {typ = λx:τ.{}};  λl x:τ.e := λx:τ.{l = e}
Notation: P ∨ I := I ∨ P := I
Figure 3 (semantic types, desugarings, and notation)
3.
So much for leisure, now for work. The general recipe for 1MLex
is simple: take the semantics from F-ing modules [25], collapse the
levels of modules and core, and impose the predicativity restriction
needed to maintain decidability. This requires surprisingly few
changes to the whole system. Unfortunately, space does not permit
explaining all of the F-ing semantics in detail, so we encourage the
reader to refer to [25] (mostly Section 4) for background, and will
focus primarily on the differences and novelties in what follows.
3.1 Internal Language
3.2 Elaboration
Types and Declarations The main job of the elaboration rules for
types is to name all abstract type components with type variables,
collect them, and bind them hoisted to an outermost existential (or
universal, in the case of functions) quantifier. The rules are mostly
identical to [25], except that type is a free-standing construct instead of being tied to the syntax of bindings, and 1ML's where
construct requires a slightly more general rule.
Also, we drop the side condition for to be explicit in rule
T SING (corresponding to rule S- LIKE in [25]), as explained below.
[Figure: elaboration rules for 1MLex. Judgement groups: Types (rules T TYPE, T BOOL, T STR, T FUN, T PFUN, T SING, T PATH, T WHERE), Declarations (D VAR, D INCL, D SEQ, D EMPTY), Expressions (E VAR, E TRUE, E FALSE, E IF, E STR, E DOT, E FUN, E APP, E TYPE, E SEAL), Bindings (B VAR, B INCL, B SEQ, B EMPTY), Subtyping (S PATH, S BOOL, S TYPE, S FORGET, S STR, S FUN, S ABS).]
1. If Γ ⊢ T/D ⇝ Ξ, then Γ ⊢ Ξ : Ω.
2. If Γ ⊢ E/B :ι Ξ ⇝ e, then Γ ⊢ e : Ξ, and if ι = P then Ξ = Σ.
3. If Γ ⊢ Σ′ ≤ᾱ Σ ⇝ δ; f and Γ ⊢ Σ′ : Ω and Γ, ᾱ ⊢ Σ : Ω, then dom(δ) = ᾱ and Γ ⊢ δ : Γ, ᾱ, and Γ ⊢ f : Σ′ → δΣ.
The only other non-editorial changes over [25] are that type T is now handled as a first-class value, no longer tied to bindings, and that Booleans have been added as representatives of the core.
The rules collect all abstract types generated by an expression (e.g. by sealing or by functor application) into an existential package. This requires repeated unpacking and repacking of existentials created by constituent expressions. Moreover, the sequencing rule B SEQ combines two (n-ary) existentials into one.
It is an invariant of the expression elaboration judgement that ι = I if Ξ is not a concrete type Σ, i.e., abstract type generation is impure. Without this invariant, rule E FUN might form an invalid function type that is marked pure but yet has an inner existential quantifier (i.e., is generative). To maintain the invariant, both sealing (rule E SEAL) and conditionals (rule E IF) have to be deemed impure if they generate abstract types, which is enforced by the notation ι(Ξ) defined in Figure 3. In that sense, our notion of purity actually corresponds to the stronger property of valuability in the parlance of Dreyer [4], which also implies phase separation, i.e., the ability to separate static type information from dynamic computation, key to avoiding the need for dependent types.
4. Full 1ML
A language without type inference is not worth naming ML. Because that is so, Figure 5 shows the minimal extension to 1MLex necessary to recover ML-style implicit polymorphism. Syntactically, there are merely two new forms of type expression.
First, the wildcard _ stands for a type that is to be inferred from context. The crucial restriction here is that this can only be a small type. This fits nicely with the notion of a monotype in core ML, and prevents the need to infer polymorphic types in an analogous manner.
On top of this new piece of kernel syntax we allow a type annotation :_ on a function parameter or conditional to be omitted, thereby recovering the implicitly typed expression syntax familiar from ML. (At the same time we drop the 1MLex sugar interpreting an unannotated parameter as a type; we only keep that interpretation in type declarations or bindings.)
Second, there is a new type of implicit function, distinguished by a leading tick ' (a choice that will become clear in a moment). This corresponds to an ML-style polymorphic type. The parameter has to be of type type, whose being small fits nicely with the fact that ML can only abstract monotypes, and no type constructors. For obvious reasons, an implicit function has to be pure. We write the semantic type of implicit functions with an arrow →A, in order to reuse notation. It is distinct from →ι, however, and A is not an effect.
As the name would suggest, there are no explicit introduction or elimination forms for implicit functions. Instead, they are introduced and eliminated implicitly. The respective typing rules (E GEN and E INST) match common formulations of ML-style polymorphism [3]. Any pure expression can have its type generalised, which is more liberal than ML's value restriction [35] (recall that purity also implies that no abstract types are produced).
Subtyping allows the implicit elimination of implicit functions as well, via instantiation on the left, or skolemisation on the right (rules S IMPLL and S IMPLR). This closely corresponds to ML's
However, this does not break anything else, so we make that simplification anyway (if desired, explicitness could easily be revived).
3.3 Meta-Theory
Figure 5 (extension to full 1ML)
Syntax: (types) T ::= ... | _ | '(X:type) ⇒ T
Sugar:
  (expressions) if E1 then E2 else E3 := if E1 then E2 else E3 : _
  (expressions) fun X ⇒ E := fun (X : _) ⇒ E
  (types) 'X ⇒ T := '(X:type) ⇒ T
  (declarations) X 'Y : T := X : '(Y : type) ⇒ T
Semantic types: (large signatures) Σ ::= ... | ∀ᾱ.{} →A Σ
Rules: Types (T IMPL, T INFER), Expressions (E GEN, E INST), Subtyping (S IMPLL, S IMPLR)
With these few extensions, the Map functor from Section 2 can now be written in 1ML very much like in traditional ML:
type MAP =
{
type key;
type map a;
empty 'a : map a;
lookup 'a : key → map a → opt a;
add 'a : key → a → map a → map a
};
5.1 Algorithm
Figure 6 shows the essence of this algorithm, formulated via inference rules. The basic idea is to modify the declarative typing
rules such that wherever they have to guess a (small) type, we simply introduce a (free) inference variable. Furthermore, the rules are
augmented with outputting a substitution for resolved inference
variables: all judgements have the form ` J , which, roughly,
implies the respective declarative judgement , ` J , where
binds the unresolved inference variables that still appear free in
or J . Notation is simplified by abbreviations of the form
` 0 J
:=
`00 J 0 = 00
The MAP signature here uses one last bit of syntactic sugar defined in Figure 5, which is to allow implicit parameters on the left-hand side of declarations, like we already do for explicit parameters (cf. Figure 1). The tick becomes a pun on ML's type variable syntax, but without relying on brittle implicit scoping rules.
Space reasons forbid more extensive examples, but it should be clear from the rules that there is nothing preventing the use of implicit functions as first-class values, given sufficient annotations for their (large) types. For example:
.[= ] P [= ]
which can be solved just fine with = [= ] I [= ] for
any ; through contravariance, similar situations can arise with an
inference variable on the left. Because of this, it is not enough to
just consider the cases or for resolving . Instead,
when the subtyping algorithm hits or (rules IS RESL
and IS RESR, where may or may not be small) it invokes the
auxiliary Resolution judgement ` , which only resolves
so far as to match the shape of and inserts fresh inference
variables for its subcomponents. After that, subtyping tries again.
Second, an inference variable can be introduced in the scope
of abstract types (i.e., regular type variables). In general, it would
be incorrect to resolve to a type containing type variables that are
5. Type Inference
[Figure 6. Type inference rules. Judgement groups: Types (rules IT PATH, IT INFER, IT TYPE, IT SING, IT IMPL, IT PFUN), Expressions (IE VAR, IE IF, IE DOT, IE APP, IE FUN), Bindings (IB VAR, IB PVAR), Subtyping (IS REFL, IS RESL, IS RESR, IS FUN, IS IMPLL, IS IMPLR, IS ABS), Resolution (IR INFER, IR PATH, IR TYPE, IR BOOL, IR FUN), Instantiation (IN REFL, IN RES, IN IMPL).]
Implicit functions work mostly like in ML. Like with let-polymorphism, generalisation is deferred to the point where an expression is bound, in this case in rule IB PVAR.
Similarly, instantiation is deferred to rules corresponding to elimination forms (e.g. IE IF, IE DOT, IE APP, but also IT PATH). There, the auxiliary Instantiation judgement is invoked (as part of the notation ⊢! J). This not only instantiates implicit functions (possibly under existential binders), it may also resolve inference variables to create a type whose shape matches the shape that is expected by the invoking rule.
Instantiation can also happen implicitly as part of subtyping (rule IS IMPLL), which covers the case where a polymorphic value is supplied as the argument to a function expecting a monomorphic (or less polymorphic) parameter.
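Deferred generalisation at a binding can be sketched in a few lines. This Python fragment is an illustration under stated assumptions rather than the paper's rule IB PVAR: types are `"bool"`, `"int"`, an `InferVar`, or a tuple `("fun", param, result)`, and the names `InferVar`, `undet`, and `generalise` are hypothetical. It quantifies the unresolved inference variables of a pure binding's type that do not also occur in the environment.

```python
import itertools


class InferVar:
    """An inference variable; `solution` is set once it has been resolved."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(InferVar._ids)
        self.solution = None


def undet(t):
    """Collect the unresolved inference variables occurring in type `t`."""
    if isinstance(t, InferVar):
        return undet(t.solution) if t.solution is not None else {t}
    if isinstance(t, tuple) and t and t[0] == "fun":
        return undet(t[1]) | undet(t[2])
    return set()  # base types such as "bool" or "int"


def generalise(env_types, t, pure=True):
    """Return the variables to quantify when binding a value of type `t`.

    Only pure bindings are generalised; variables still shared with the
    environment must stay monomorphic.
    """
    if not pure:
        return []
    in_env = set()
    for env_type in env_types:
        in_env |= undet(env_type)
    return sorted(undet(t) - in_env, key=lambda v: v.id)
```

Binding the identity function at type `("fun", a, a)` in an empty environment would thus quantify over `a`, whereas an impure binding quantifies over nothing.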
5.2 Incompleteness
The net effect is that all local s from 0 are removed from all
-sets of inference variable remaining after executing , 0 ` J .
We omit in this notation when it is the identity.
1. If ` T /D
, then 0 , ` T /D
2. If ` E/B :
e, then 0 , ` E/B :
e.
3. If ` 0
;f and , ` 0 : and , , ` : ,
then 0 , ` 0
; f .
THEOREM 5.2 (Termination of 1ML Inference).
All 1ML type inference judgements terminate.
We have to defer the details to the Technical Appendix [23].
6.
Type Scoping Tracking of the sets is conservative: after leaving the scope of a type variable , we exclude any solution for
that would still involve , even if only appears inside a type
binder for . Consider, for example [5]:
G (x : int) = {M = {type t = int; v = x} :> {type t; v : t}; f = id id};
C = G 3;
x = C.f (C.M.v);
First-Class Modules The first to unify ML's stratified type system into one language was Harper & Mitchell's XML calculus [10]. It is a dependent type theory modeling modules as terms of Martin-Löf-style Σ and Π types, closely following MacQueen's original ideas [17]. The system enforces predicativity through the introduction of two universes U1 and U2, which correspond directly to our notion of small and large type, and both systems allow both U1 : U2 and U1 ⊆ U2. XML lacks any account of either sealing or translucency, which makes it fall short as a foundation for modern ML.
That gap was closed by Harper & Lillibridge's calculus of translucent sums [9, 16], which also was a dependently typed language of first-class modules. Its main novelty was records with both opaque and transparent type components, directly modeling ML structures. However, unlike XML, the calculus is impredicative, which renders it undecidable.
Translucent sums were later superseded by the notion of singleton types [31]; they formed the foundation of Dreyer et al.'s type theory for higher-order modules [6]. However, to avoid undecidability, this system went back to second-class modules.
One concern in dependently typed theories is phase separation: to enable compile-time checking without requiring core-level computation, such theories must be sufficiently restricted. For example, Harper et al. [11] investigate phase separation for the XML calculus. The beauty of the F-ing approach is that it enjoys phase separation by construction, since it does not use dependent types.
int .{M : {t : [= ], v : }, f : I }
with
/ (because goes out of scope the moment we bind it
with a local quantifier), and then generalises to
G : .{} A int .{M : {t : [= ], v : }, f : I }
But it's too late, the solution = , which would make x well-typed, is already precluded. When typing C, instantiating with
is not possible either, because can only come into scope again
after having applied an argument for already.
Although not well-known, this very problem is already present in good old ML, as Dreyer & Blume point out [5]: existing type inference implementations are incomplete, because combinations of functors and the value restriction (like above) do not have principal types. Interestingly, a variation of the solution suggested by Dreyer & Blume (implicitly generalising the types of functors) is implied by the 1ML typing rules: since functors are just functions, their types can already be generalised. However, generalisation happens outside the abstraction, which is more rigid than what they propose (but which is not expressible in System Fω). Consequently, 1ML can type some examples from their paper, but not all.
Purity Annotations Due to effect subtyping, a function type as
an upper bound does not determine the purity of a smaller type.
Technically, that does not affect completeness, because we defined
small types to only include impure functions: the resolution rule
IR FUN can always pick I. But arguably, that is cheating a little
by side-stepping the issue, and it prevents the natural use of pure
function types to specify core-like functions.
Again, the solution would be more polymorphism, in this case a
simple form of effect polymorphism [32]. That will be future work.
Related Work
Metatheory
[7] J. Garrigue and A. Frisch. First-class modules and composable signatures in Objective Caml 3.12. In ML, 2010.
[8] J. Garrigue and D. Rémy. Semi-explicit first-class polymorphism for ML. Information and Computation, 155(1-2), 1999.
Type Inference There has been little work that has considered
type inference for modules. Russo examined the interplay between
core-level inference and modules [28], elegantly dealing with variable scoping via unification under a mixed prefix. Dreyer & Blume
investigated how functors interfere with the value restriction [5].
At the same time, there have been ambitious extensions of ML-style type inference with higher-rank or impredicative types [8,
14, 33, 29]. Unlike those systems, 1ML never tries to infer a
polymorphic type annotation: all guessed types are monomorphic
and polymorphic parameters require annotation.
On the other hand, 1ML allows bundling types and terms together into structures. While it is necessary to explicitly annotate
terms that contain types, associated type quantifiers (both universal and existential) and their actual introduction and elimination are
implicit and effectively inferred as part of the elaboration process.
[9] R. Harper and M. Lillibridge. A type-theoretic approach to higher-order modules with sharing. In POPL, 1994.
[10] R. Harper and J. C. Mitchell. On the type structure of Standard ML. In ACM TOPLAS, volume 15(2), 1993.
[11] R. Harper, J. C. Mitchell, and E. Moggi. Higher-order modules and the phase distinction. In POPL, 1990.
[12] R. Harper and B. Pierce. Design considerations for ML-style module systems. In B. C. Pierce, editor, Advanced Topics in Types and Programming Languages, chapter 8, pages 293-346. MIT Press, 2005.
[13] R. Harper and C. Stone. A type-theoretic interpretation of Standard ML. In Proof, Language, and Interaction: Essays in Honor of Robin Milner. MIT Press, 2000.
[14] D. Le Botlan and D. Rémy. MLF: Raising ML to the power of System F. In ICFP, 2003.
[15] X. Leroy. Applicative functors and fully transparent higher-order modules. In POPL, 1995.
7. Future Work
1ML, as shown here, is but a first step. There are several possible
improvements and extensions.
Implementation We have implemented a simple prototype interpreter for 1ML (mpi-sws.org/rossberg/1ml/), but it would be
great to gather more experience with a real implementation.
Applicative Functors We would like to extend 1ML's rather basic notion of applicative functor with pure sealing à la F-ing modules (see the Technical Appendix [23]), but more importantly, make it properly abstraction-safe by tracking value identities [25].
Implicits The domain of implicit functions in 1ML is limited to
type type. Allowing richer types would be a natural extension, and
might provide functionality like Haskell-style type classes [34].
Type Inference Despite the ability to express first-class and
higher-order polymorphism, inference in 1ML is rather simple.
Perhaps it is possible to combine 1ML elaboration with some of
the more advanced approaches to inference described in literature.
More Polymorphism Replacing more of subtyping with polymorphism might lead to better inference: row polymorphism [21]
could express width subtyping, and simple effect polymorphism [32]
would allow more extensive use of pure function types.
Dependent Types Finally, 1ML goes to great lengths to push the boundaries of non-dependent typing. It's a legitimate question to ask, what for? Why not go fully dependent? Well, even then sealing necessitates some equivalent of weak sums (existential types). Incorporating them, along with the quantifier pushing of our elaboration, into a dependent type system might pose an interesting challenge.
JFP,
References
[1] H. Barendregt. Lambda calculi with types. In S. Abramsky, D. Gabbay, and T. Maibaum, editors, Handbook of Logic in Computer Science, vol. 2, chapter 2, pages 117-309. Oxford University Press, 1992.
F-ing modules.
[32] J.-P. Talpin and P. Jouvelot. Polymorphic type, region and effect inference. JFP, 2(3):245-271, 1992.
[5] D. Dreyer and M. Blume. Principal type schemes for modular programs. In ESOP, 2007.