
Lecture Notes in Computer Science 5674

Commenced Publication in 1973


Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Stefan Berghofer Tobias Nipkow
Christian Urban Makarius Wenzel (Eds.)

Theorem Proving
in Higher Order Logics

22nd International Conference, TPHOLs 2009


Munich, Germany, August 17-20, 2009
Proceedings

Volume Editors

Stefan Berghofer
Tobias Nipkow
Christian Urban
Makarius Wenzel

Technische Universität München


Institut für Informatik
Boltzmannstraße 3
85748, Garching, Germany

E-mail: {berghofe,nipkow,urbanc,wenzelm}@in.tum.de

Library of Congress Control Number: 2009931594

CR Subject Classification (1998): F.4, F.3, F.1, D.2.4, B.6.3, B.6.1, D.4.5, G.4, I.2.2

LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

ISSN 0302-9743
ISBN-10 3-642-03358-X Springer Berlin Heidelberg New York
ISBN-13 978-3-642-03358-2 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12727186 06/3180 543210
Preface

This volume constitutes the proceedings of the 22nd International Conference


on Theorem Proving in Higher Order Logics (TPHOLs 2009), which was held
during August 17-20, 2009 in Munich, Germany. TPHOLs covers all aspects
of theorem proving in higher order logics as well as related topics in theorem
proving and verification.
There were 55 papers submitted to TPHOLs 2009 in the full research cat-
egory, each of which was refereed by at least three reviewers selected by the
Program Committee. Of these submissions, 26 research papers and 1 proof pearl
were accepted for presentation at the conference and publication in this vol-
ume. In keeping with longstanding tradition, TPHOLs 2009 also offered a venue
for the presentation of emerging trends, where researchers invited discussion by
means of a brief introductory talk and then discussed their work at a poster
session. A supplementary proceedings volume was published as a 2009 technical
report of the Technische Universität München.
The organizers are grateful to David Basin, John Harrison and Wolfram
Schulte for agreeing to give invited talks. We also invited four tool develop-
ers to give tutorials about their systems. The following speakers kindly accepted
our invitation and we are grateful to them: John Harrison (HOL Light), Adam
Naumowicz (Mizar), Ulf Norell (Agda) and Carsten Schürmann (Twelf).
The TPHOLs conference traditionally changes continents each year to maxi-
mize the chances that researchers around the world can attend. TPHOLs started
in 1988 at the University of Cambridge as an informal users’ meeting for the
HOL system. Since 1993, the proceedings of TPHOLs have been published in
the Springer Lecture Notes in Computer Science series:

1993 (Canada) Vol. 780 2001 (UK) Vol. 2152


1994 (Malta) Vol. 859 2002 (USA) Vol. 2410
1995 (USA) Vol. 971 2003 (Italy) Vol. 2758
1996 (Finland) Vol. 1125 2004 (USA) Vol. 3223
1997 (USA) Vol. 1275 2005 (UK) Vol. 3603
1998 (Australia) Vol. 1479 2006 (USA) Vol. 4130
1999 (France) Vol. 1690 2007 (Germany) Vol. 4732
2000 (USA) Vol. 1869 2008 (Canada) Vol. 5170

We thank our sponsors: Microsoft Research Redmond, Galois, Verisoft XT,


Validas AG and the DFG doctorate programme Puma, for their support.
Finally, we are grateful to Andrei Voronkov. His EasyChair tool greatly eased
the task of reviewing the submissions and of generating these proceedings. He
also helped us with the finer details of EasyChair.
Next year, in 2010, TPHOLs will change its name to ITP, Interactive Theorem
Proving. This is not a change of direction but better reflects the fact that
TPHOLs is the premier forum for interactive theorem proving. ITP 2010 will be
part of the Federated Logic Conference, FLoC, in Edinburgh.

June 2009 Stefan Berghofer


Tobias Nipkow
Christian Urban
Makarius Wenzel
Organisation

Programme Chairs
Tobias Nipkow TU München, Germany
Christian Urban TU München, Germany

Programme Committee
Thorsten Altenkirch David Aspinall Jeremy Avigad
Gilles Barthe Christoph Benzmüller Peter Dybjer
Jean-Christophe Filliâtre Georges Gonthier Mike Gordon
Jim Grundy Joe Hurd Reiner Hähnle
Gerwin Klein Xavier Leroy Pete Manolios
César Muñoz Michael Norrish Sam Owre
Larry Paulson Frank Pfenning Randy Pollack
Sofiène Tahar Laurent Théry Freek Wiedijk

Local Organisation
Stefan Berghofer
Makarius Wenzel

External Reviewers
Naeem Abbasi Martin Giese Zhaohui Luo
Behzad Akbarpour Alwyn Goodloe Kenneth MacKenzie
Knut Akesson Thomas Göthel Jeff Maddalon
June Andronick Osman Hasan Lionel Mamane
Bob Atkey Daniel Hedin Conor McBride
Stefan Berghofer Hugo Herbelin James McKinna
Yves Bertot Brian Huffman Russell O’Connor
Johannes Borgstrom Clément Hurlin Steven Obua
Ana Bove Ullrich Hustadt Anne Pacalet
Cristiano Calcagno Rafal Kolanski Florian Rabe
Harsh Raju Chamarthi Alexander Krauss Bernhard Reus
Benjamin Chambers Sava Krstic Norbert Schirmer
Nils Anders Danielsson Cesar Kunz Stefan Schwoon
William Denman Stéphane Lescuyer Jaroslav Sevcik
Peter Dillinger Rebekah Leslie Thomas Sewell
Bruno Dutertre Pierre Letouzey Natarajan Shankar

Matthieu Sozeau Emina Torlak Simon Winwood


Mark-Oliver Stehr Aaron Turon Claus-Peter Wirth
Dan Synek Norbert Voelker Santiago Zanella
Murali Talupur Eelis van der Weegen
Table of Contents

Invited Papers
Let’s Get Physical: Models and Methods for Real-World Security
Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
David Basin, Srdjan Capkun, Patrick Schaller, and Benedikt Schmidt

VCC: A Practical System for Verifying Concurrent C . . . . . . . . . . . . . . . . . 23


Ernie Cohen, Markus Dahlweid, Mark Hillebrand, Dirk Leinenbach,
Michal Moskal, Thomas Santen, Wolfram Schulte, and
Stephan Tobies

Without Loss of Generality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43


John Harrison

Invited Tutorials
HOL Light: An Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
John Harrison

A Brief Overview of Mizar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67


Adam Naumowicz and Artur Kornilowicz

A Brief Overview of Agda – A Functional Language with Dependent


Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Ana Bove, Peter Dybjer, and Ulf Norell

The Twelf Proof Assistant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79


Carsten Schürmann

Regular Papers
Hints in Unification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Andrea Asperti, Wilmer Ricciotti, Claudio Sacerdoti Coen, and
Enrico Tassi

Psi-calculi in Isabelle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Jesper Bengtson and Joachim Parrow

Some Domain Theory and Denotational Semantics in Coq . . . . . . . . . . . . . 115


Nick Benton, Andrew Kennedy, and Carsten Varming

Turning Inductive into Equational Specifications . . . . . . . . . . . . . . . . . . . . . 131


Stefan Berghofer, Lukas Bulwahn, and Florian Haftmann

Formalizing the Logic-Automaton Connection . . . . . . . . . . . . . . . . . . . . . . . 147


Stefan Berghofer and Markus Reiter
Extended First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Chad E. Brown and Gert Smolka
Formalising Observer Theory for Environment-Sensitive Bisimulation . . . 180
Jeremy E. Dawson and Alwen Tiu
Formal Certification of a Resource-Aware Language Implementation . . . . 196
Javier de Dios and Ricardo Peña
A Certified Data Race Analysis for a Java-like Language . . . . . . . . . . . . . . 212
Frédéric Dabrowski and David Pichardie
Formal Analysis of Optical Waveguides in HOL . . . . . . . . . . . . . . . . . . . . . . 228
Osman Hasan, Sanaz Khan Afshar, and Sofiène Tahar
The HOL-Omega Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Peter V. Homeier
A Purely Definitional Universal Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Brian Huffman
Types, Maps and Separation Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Rafal Kolanski and Gerwin Klein
Acyclic Preferences and Existence of Sequential Nash Equilibria: A
Formal and Constructive Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Stéphane Le Roux
Formalising FinFuns – Generating Code for Functions as Data from
Isabelle/HOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Andreas Lochbihler
Packaging Mathematical Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
François Garillot, Georges Gonthier, Assia Mahboubi, and
Laurence Rideau
Practical Tactics for Separation Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Andrew McCreight
Verified LISP Implementations on ARM, x86 and PowerPC . . . . . . . . . . . 359
Magnus O. Myreen and Michael J.C. Gordon
Trace-Based Coinductive Operational Semantics for While: Big-Step
and Small-Step, Relational and Functional Styles . . . . . . . . . . . . . . . . . . . . 375
Keiko Nakata and Tarmo Uustalu
A Better x86 Memory Model: x86-TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Scott Owens, Susmit Sarkar, and Peter Sewell

Formal Verification of Exact Computations Using Newton’s Method . . . . 408


Nicolas Julien and Ioana Paşca

Construction of Büchi Automata for LTL Model Checking Verified in


Isabelle/HOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
Alexander Schimpf, Stephan Merz, and Jan-Georg Smaus

A Hoare Logic for the State Monad: Proof Pearl . . . . . . . . . . . . . . . . . . . . . 440


Wouter Swierstra

Certification of Termination Proofs Using CeTA . . . . . . . . . . . . . . . . . . . . . . . 452


René Thiemann and Christian Sternagel

A Formalisation of Smallfoot in HOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469


Thomas Tuerk

Liveness Reasoning with Isabelle/HOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485


Jinshuang Wang, Huabing Yang, and Xingyuan Zhang

Mind the Gap: A Verification Framework for Low-Level C . . . . . . . . . . . . . 500


Simon Winwood, Gerwin Klein, Thomas Sewell, June Andronick,
David Cock, and Michael Norrish

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517


Let’s Get Physical: Models and Methods for
Real-World Security Protocols

David Basin, Srdjan Capkun, Patrick Schaller, and Benedikt Schmidt

ETH Zurich, 8092 Zurich, Switzerland

Abstract. Traditional security protocols are mainly concerned with key


establishment and principal authentication and rely on predistributed
keys and properties of cryptographic operators. In contrast, new appli-
cation areas are emerging that establish and rely on properties of the
physical world. Examples include protocols for secure localization, dis-
tance bounding, and device pairing.
We present a formal model that extends inductive, trace-based ap-
proaches in two directions. First, we refine the standard Dolev-Yao model
to account for network topology, transmission delays, and node posi-
tions. This results in a distributed intruder with restricted, but more
realistic, communication capabilities. Second, we develop an abstract
message theory that formalizes protocol-independent facts about mes-
sages, which hold for all instances. When verifying protocols, we in-
stantiate the abstract message theory, modeling the properties of the
cryptographic operators under consideration. We have formalized this
model in Isabelle/HOL and used it to verify distance bounding protocols
where the concrete message theory includes exclusive-or.

1 Introduction
Situating Adversaries in the Physical World. There are now over three decades of
research on symbolic models and associated formal methods for security protocol
verification. The models developed represent messages as terms rather than bit
strings, take an idealized view of cryptography, and focus on the communication
of agents over a network controlled by an active intruder. The standard intruder
model used, the Dolev-Yao model, captures the above aspects. Noteworthy for
our work is that this model abstracts away all aspects of the physical environ-
ment, such as the location of principals and the speed of the communication
medium used. This is understandable: the Dolev-Yao model was developed for
authentication and key-exchange protocols whose correctness is independent of
the principals’ physical environment. Abstracting away these details, effectively
by identifying the network with the intruder, results in a simpler model that is
adequate for verifying such protocols.
With the emergence of wireless networks, protocols have been developed whose
security goals and assumptions differ from those in traditional wireline networks.
A prominent example is distance bounding [1,2,3,4,5], where one device must
determine an upper bound on its physical distance to another, potentially un-
trusted, device. The goal of distance bounding is neither message secrecy nor

authentication, but rather to establish a physical property. To achieve this, distance bounding protocols typically combine cryptographic guarantees, such as
message-origin authentication, with properties of the physical (communication)
layer, for example that attackers cannot relay messages between locations faster
than the speed of light. Other examples of “physical protocols” include secure
time synchronization, wormhole and neighborhood detection, secure localization,
broadcast authentication, and device pairing.
In [6], we presented the first formal model that is capable of modeling and
reasoning about a wide class of physical protocols and their properties. The key
idea is to reflect relevant aspects of the physical world in the model, namely net-
work topology, transmission delays, and node positions. In particular, all agents
are modeled as network nodes. This includes the intruder, who is no longer a
single entity but instead is distributed and therefore corresponds to a set of
nodes. Communication between nodes is subject to restrictions reflecting the
nodes’ physical environment and communication capabilities. For example, not
all nodes can communicate and communication takes time determined by the
network topology and the propagation delays of the communication technologies
used. Hence, nodes require time to share their knowledge and information cannot
travel at speeds faster than the speed of light. Possible communication histories
are formalized as traces and the resulting model is an inductively-defined, sym-
bolic, trace-based model, along the lines of Paulson’s Inductive Approach [7].
In [6], we formalized this model in Isabelle/HOL [8] and verified the security
properties of three physical protocols: an authenticated ranging protocol [9], a pro-
tocol for distance bounding using ultrasound [5], and a broadcast-authentication
protocol based on delayed key disclosure [10].

Verifying distance bounding protocols. Our starting point in this paper is a family
of distance bounding protocols proposed by Meadows [4]. The family is defined
by a protocol pattern containing a function variable F , where different instances
of F result in different protocols. We present two security properties, which
distinguish between the cases of honest and dishonest participants. For each
property, we reduce the security of a protocol defined by an instance of F to
conditions on F . Afterwards, we analyze several instances of F , either showing
that the conditions are fulfilled or presenting counterexamples to the security
properties.
This protocol family is interesting as a practically-relevant case study in apply-
ing our framework to formalize and reason about nontrivial physical protocols.
Moreover, it also illustrates how we can extend our framework (originally defined
over a free term algebra) to handle protocols involving equationally-defined op-
erators on messages and how this can be done in a general way. Altogether,
we have worked with five different protocols and two different message theories.
To support this, we have used Isabelle’s locales construct to formalize an ab-
stract message theory and a general theory of protocols. Within the locales, we
prove general, protocol-independent facts about (abstract) messages, which hold
when we subsequently instantiate the locales with our different concrete message
theories and protocols.

Contributions. First, we show that our framework for modeling physical security
protocols can be extended to handle protocols involving equationally-defined op-
erators. This results in a message theory extended with an XOR operator and
a zero element, consisting of equivalence classes of messages with respect to the
equational theory of XOR. We use normalized terms here as the representatives
of the equivalence classes. With this extension, we substantially widen the scope
of our approach. Note that this extension is actually independent of our “phys-
ical” refinement of communication and also could be used in protocol models
based on the standard Dolev-Yao intruder.
Second, we show how such extensions can be made in a generic, modular way.
Noteworthy here is that we could formulate a collection of message-independent
and protocol-independent facts that hold for a large class of intended extensions.
An example of such a fact is that the minimal message-transmission time between
two agents A and B determines a lower bound on the time difference between
A creating a fresh nonce and B learning it.
Finally, physical protocols often contain time-critical steps, which must be op-
timized to reduce computation and communication time. As a result, these steps
typically employ low-level operations like XOR, in contrast to more conventional
protocols where nanosecond time differences are unimportant. Our experience
indicates that the use of such low-level, equationally-defined operators results
in substantial additional complexity in reasoning about protocols in compari-
son to the standard Dolev-Yao model. Moreover, the complexity is also higher
because security properties are topology dependent and so are attacks. Attacks
now depend not only on what the attackers know, but also their own physical
properties, i.e., the possible constellations of the distributed intruders. Due to
this complexity, pencil-and-paper proofs quickly reach their limits. Our work
highlights the important role that Formal Methods can play in the systematic
development and analysis of physical protocols.

Organization. In Section 2, we provide background on Isabelle/HOL and the


distance bounding protocols that we analyze in this paper. In Section 3, we
present our formal model of physical protocols, which we apply in Section 4.
Finally, in Section 5, we discuss related work and draw conclusions.

2 Background
2.1 Isabelle/HOL
Isabelle [8] is a generic theorem prover with a specialization for higher-order logic
(HOL). We will avoid Isabelle-specific details in this paper as far as possible or
explain them in context, as needed.
We briefly review two aspects of Isabelle/HOL that are central to our work.
First, Isabelle supports the definition of (parameterized) inductively-defined sets.
An inductively-defined set is defined by sets of rules and denotes the least set
closed under the rules. Given an inductive definition, Isabelle generates a rule
for proof by induction.
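
As a small, self-contained illustration (a standard textbook example, not part of our formalization), the even natural numbers can be defined inductively; Isabelle then generates the rule even.induct for proofs by induction:

  (* A minimal inductively-defined set: the even naturals. *)
  inductive_set even :: "nat set" where
    zero: "0 ∈ even"
  | step: "n ∈ even ⟹ Suc (Suc n) ∈ even"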

Second, Isabelle provides a mechanism, called locales [11], that can be used
to structure generic developments, which can later be specialized. A locale can
be seen as either a general kind of proof context or, alternatively, as a kind of
parameterized module. A locale declaration contains:
– a name, so that the locale can be referenced and used,
– typed parameters, e.g., ranging over relations or functions,
– assumptions about the parameters (the module axioms), and
– functions defined using the parameters.
In the context of a locale, one can make definitions and prove theorems that
depend on the locale’s assumptions and parameters. Finally, a locale can be
interpreted by instantiating its parameters so that the assumptions are theorems.
After interpretation, not only can the assumptions be used for the instance, but
also all theorems proved and definitions made in the locale’s context.
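
For instance, the following sketch (a standard textbook-style locale, not one from our development) shows all the ingredients: a named locale with a typed parameter le, assumptions about it, a definition and theorem made in its context, and an interpretation instantiating le with ≤ on the naturals:

  locale preorder =
    fixes le :: "'a ⇒ 'a ⇒ bool"
    assumes refl: "le x x"
        and trans: "le x y ⟹ le y z ⟹ le x z"
  begin

  (* A definition using the parameter, and a theorem using the assumptions. *)
  definition sim :: "'a ⇒ 'a ⇒ bool" where
    "sim x y ⟷ le x y ∧ le y x"

  lemma sim_refl: "sim x x"
    by (simp add: sim_def refl)

  end

  (* Interpretation: the assumptions become proof obligations; afterwards
     sim and sim_refl are available for ≤ on nat as nat_pre.sim etc. *)
  interpretation nat_pre: preorder "(≤) :: nat ⇒ nat ⇒ bool"
    by unfold_locales auto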

2.2 Distance Bounding Protocols


Distance bounding protocols are two-party protocols involving a verifier who
must establish a bound on his distance to a prover. These protocols were origi-
nally introduced in [1] to prevent a man-in-the-middle attack called Mafia Fraud.
Suppose, for example, that an attacker possesses a fake automated teller ma-
chine (ATM). When a user uses his banking card to authenticate himself to
the fake ATM, the attacker simply forwards the authenticating information to
a real ATM. After successful authentication, the attacker can plunder the user’s
account. Distance bounding protocols prevent this attack by determining an up-
per bound on the distance between the ATM and the banking card. The ATM
is the verifier and checks that the card, acting as the prover, is sufficiently close
by to rule out the man-in-the-middle.
The idea behind distance bounding is simple. The verifier starts by sending
a challenge to the prover. The prover’s reply contains an authenticated message
involving the challenge, which shows that it has been received by the prover.
After receiving the reply, the verifier knows that the challenge has traveled back
and forth between him and the prover. Assuming that the signal encoding the
challenge travels with a known speed, the verifier can compute an upper bound
on the distance to the prover by multiplying the measured round-trip time of
his challenge by the signal’s velocity.
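For example (our arithmetic, not a figure from the paper): with a radio signal travelling at c ≈ 0.3 m/ns, a measured round-trip time of 20 ns gives the bound d = (20 ns · 0.3 m/ns)/2 = 3 m.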
For distance bounding to yield accurate results, the verifier’s round-trip time
measurement should correspond as closely as possible to the physical distance
between the prover and verifier. This is achieved by having the prover generate
his response as quickly as possible. Expensive cryptographic operations such as
digital signatures should therefore be avoided. A distance bounding protocol can
typically be decomposed into three phases: a setup phase, a measurement phase,
and a validation phase. Only the measurement phase is time-critical. The prover
performs computationally inexpensive operations, such as XOR, during this phase
and may use more sophisticated cryptographic algorithms, such as commitment
schemes and message-authentication codes, in the other phases.
V                                                P
  --  V, request  --------------------------->     (Setup Phase)
  --  NV  ------------------------------------>    (Measurement Phase)
  <--  F(NV, NP, P)  --------------------------    (Measurement Phase)
  <--  P, NP, NV, MAC KVP (P, NP, NV)  --------    (Validation Phase)

Fig. 1. Pattern for Distance Bounding Protocols

In [4], Meadows et al. present a suite of distance bounding protocols, following the pattern shown in Figure 1. Here, V denotes the verifier and P is the prover. Both parties initially create nonces NV and NP. V then sends a request to P, followed by a nonce NV. Upon receiving NV, P replies as quickly as possible with F(NV, NP, P), where F is instantiated with an appropriate function. Finally, P uses a key KVP shared with V to create a message-authentication code (MAC). This proves that the nonce NP originated with P and binds the reply in the measurement phase to P’s identity.
This protocol description is schematic in F. [4] provides four examples of instantiations of F(NV, NP, P) built from different combinations of concatenation, exclusive-or, and hashing, e.g. (NV ⊕ P, NP) or, even simpler, (NV, NP, P). Each instantiation uses only simple cryptographic operations, which could even be implemented in hardware to further reduce their computation time.
The security property we want to prove is: “If V has successfully finished a
protocol run with P , then V ’s conclusion about the distance to P is an upper
bound on the physical distance between the two nodes.” We will formalize this
property, along with associated provisos, in subsequent sections.

3 Formal Model
In this section, we present our model of physical protocols. To support the verifi-
cation of multiple protocols, we use locales to parameterize our model both with
respect to the concrete protocol and message theory. Figure 2 depicts the theories
we formalized in Isabelle and their dependencies. Some of these theories are con-
crete to begin with (e.g. Geometric Properties of R3 ) whereas other theories con-
sist of locales or their interpretations. For example, the Abstract Message Theory
contains a locale describing message theories, which is interpreted in our two con-
crete message theories (Free and XOR). In the theory Parametrized Communica-
tion Systems, we abstractly define the set of valid traces as a set of (parametric)
inductive rules. In formalizations of concrete protocols using either of the two
concrete message theories, we can therefore use both message-theory indepen-
dent and message-theory specific facts by importing the required theories.
[Figure 2 (dependency graph of our Isabelle theory files): the theories Agents & Physical Environment and Geometric Properties of R3 feed into the Abstract Message Theory; on top of it sit the Free Message Theory, the Parameterized Communication Systems, and the XOR Message Theory; these support the Protocol Independent Properties; at the bottom are the concrete protocol theories Authenticated Ranging, Ultrasound Distance Bounding, TESLA Broadcast Authentication, and Meadows Distance Bounding.]

Fig. 2. Dependency Graph of our Isabelle Theory Files

3.1 Agents and Environment

Agents are either honest agents or dishonest intruders. We model each kind
using the natural numbers nat. Hence there are infinitely many agents of each
kind.
datatype agent = Honest nat | Intruder nat

We refer to agents using capital letters like A and B. We also write HA and HB
for honest agents and IA and IB for intruders, when we require this distinction.
In contrast to the Dolev-Yao setting, agents’ communication abilities are subject
to the network topology and physical laws. Therefore, we cannot reduce a set of
dishonest users at different locations to a single one.

Location and Physical Distance. To support reasoning about physical proto-


cols, we associate every node A with a location loc A . We define loc : agent → R3
as an uninterpreted function constant. Protocol-specific assumptions about the
position of nodes can be added as local assumptions to the corresponding the-
orems or using Isabelle/HOL’s specification mechanism.1 We use the standard
Euclidean metric on R3 to define the physical distance between two agents A
and B as | loc A − loc B |.
Taking the straight-line distance between the locations of the agents A and B in R3 as the shortest path (taken for example by electromagnetic waves when there are no obstacles), we define the line-of-sight communication distance cdist LoS : agent × agent → R as

    cdist LoS (A, B) = | loc A − loc B | / c,

where c is the speed of light. Note that cdist LoS depends only on A’s and B’s locations and is independent of the network topology.

¹ Definition by specification allows us to assert properties of an uninterpreted function. It uses Hilbert’s ε-operator and requires a proof that a function with the required properties exists.

Transmitters, Receivers, and Communication Distance. To distinguish


communication technologies with different characteristics, we equip each agent
with an indexed set of transmitters.
datatype transmitter = Tx agent nat
The constructor Tx returns a transmitter, given an agent A and an index i,
denoted Tx iA . Receivers are formalized analogously.
datatype receiver = Rx agent nat

We model the network topology using the uninterpreted function constant


cdist Net : transmitter × receiver → R≥0 ∪ {⊥}. We use cdist Net (Tx iA , Rx jB ) =
⊥ to denote that Rx jB cannot receive transmissions from Tx iA . In contrast,
cdist Net (Tx iA , Rx jB ) = t, where t ≠ ⊥, describes that Rx jB can receive signals
(messages) emitted by Tx iA after a delay of at least t time units. This function
models the minimal signal-transmission time for the given configuration. This
time reflects environmental factors such as the communication medium used by
the given transceivers and obstacles between transmitters and receivers. Since
we assume that information cannot travel faster than the speed of light, we al-
ways require that cdist LoS (A, B) ≤ cdist Net (Tx iA , Rx jB ) using the specification
mechanism.
We use the formalization of real numbers and vectors provided in Isabelle’s
standard library for time and location. Additionally, we use the formalization of
the Cauchy-Schwarz inequality [12] to establish that cdist LoS is a pseudometric.
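
A minimal sketch of how this underspecified setup can be declared (our rendering, with real option standing in for R≥0 ∪ {⊥}, illustrative constant names, and a locale assumption in place of the specification mechanism the paper actually uses):

  theory Topology_Sketch
    imports "HOL-Analysis.Analysis"
  begin

  datatype agent = Honest nat | Intruder nat
  datatype transmitter = Tx agent nat
  datatype receiver = Rx agent nat

  (* Node positions and the speed of light, left uninterpreted here. *)
  consts loc :: "agent ⇒ real ^ 3"
  consts c :: real

  (* Line-of-sight communication distance, in time units. *)
  definition cdistLoS :: "agent ⇒ agent ⇒ real" where
    "cdistLoS A B = norm (loc A - loc B) / c"

  (* Topology as a parameter: None renders ⊥ (no link), Some t a minimal
     transmission delay t, bounded below by the line-of-sight delay. *)
  locale network_topology =
    fixes cdistNet :: "transmitter × receiver ⇒ real option"
    assumes LoS_bound:
      "cdistNet (Tx A i, Rx B j) = Some t ⟹ cdistLoS A B ≤ t"

  end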

3.2 Messages
Instead of restricting our model to a concrete message theory, we first define a
locale that specifies a collection of message operators and their properties. In
the context of this locale, we prove a number of properties independent of the
protocol and message theory. For example, cdist LoS (A, B) is a lower bound on
the time required for a nonce freshly created by A to become known by another
agent B, since the nonce must be transmitted. For the results in [6], we have
instantiated the locale with a message theory similar to Paulson’s [7], modeling
a free term algebra. In Section 3.6, we describe the instantiation with a message
theory that includes the algebraic properties of the XOR operator, which we use
in Section 4.
The theory of keys is shared by all concrete message theories and reuses
Paulson’s formalization. Keys are represented by natural numbers. The function

inv : key → key partitions the set of keys into symmetric keys, where inv k = k,
and asymmetric keys. We model key distributions as functions from agents to
keys, e.g. the theory assumes that KAB returns a shared symmetric key for a
pair of agents A and B.

Abstract Message Theory. Our Message Theory locale is parametric in


the message type ’msg and consists of the following function constants.

Nonce : agent → nat → ’msg Key : key → ’msg


Int : int → ’msg Real : real → ’msg
Hash : ’msg → ’msg Crypt : key → ’msg → ’msg
MPair : ’msg → ’msg → ’msg parts : ’msg set → ’msg set
subterms : ’msg set → ’msg set dm : agent → ’msg set → ’msg set

This formalizes that every interpretation of the Message Theory locale de-
fines the seven given message construction functions and three functions on
message sets. A Nonce is tagged with a unique identifier and the name of the
agent who created it. This ensures that independently created nonces never col-
lide. Indeed, even colluding intruders must communicate to share a nonce. The
constructor Crypt denotes signing, asymmetric, or symmetric encryption, de-
pending on the key used. We also require that functions for pairing (MPair ),
hashing (Hash), integers (Int ), and reals (Real ) are defined. We use the ab-
breviations ⟨A, B⟩ for MPair A B and {m}k for Crypt k m. Moreover, we de-
fine MAC k (m) = Hash ⟨Key k, m⟩ as the keyed MAC of the message m and
MACM k (m) = ⟨MAC k (m), m⟩ as the pair consisting of m and its MAC. Ad-
ditionally, every interpretation of Message Theory must define the functions
subterms, parts, and dm. These respectively formalize the notions of subterms,
extractable subterms, and the set of messages derivable from a set of known mes-
sages by a given agent. In the free message theory, subterms corresponds to syn-
tactic subterms, for example x ∈ subterms({Hash x}) while x ∉ parts({Hash x}).
We assume that the following properties hold for any interpretation of parts.

X ∈ H ⟹ X ∈ parts(H)
X ∈ parts(H) ⟹ ∃Y ∈ H. X ∈ parts({Y})
G ⊆ H ⟹ parts(G) ⊆ parts(H)
parts(parts(H)) = parts(H)
parts(H) ⊆ subterms(H)

These properties allow us to derive most of the lemmas about parts from Paul-
son’s formalization [7] in our abstract setting. For example,

parts(G) ∪ parts(H) = parts(G ∪ H) .

Similar properties are assumed to hold for the subterms function.


We also assume properties of the message-derivation operator dm that state
that no agent can guess another agent’s nonces or keys, or forge encryptions
or MACs. These assumptions are reasonable for message theories formalizing
idealized encryption.

Nonce B NB ∈ subterms(dm A H) ∧ A ≠ B ⟹ Nonce B NB ∈ subterms(H)
Key k ∈ parts(dm A H) ⟹ Key k ∈ parts(H)
{m}k ∈ subterms(dm A H) ⟹ {m}k ∈ subterms(H) ∨ Key k ∈ parts(H)
MAC k (m) ∈ subterms(dm A H) ⟹ MAC k (m) ∈ subterms(H) ∨ Key k ∈ parts(H)

3.3 Events and Traces

We distinguish between three types of events: an agent sending a message, re-


ceiving a message, or making a claim. We use a polymorphic data type to model
these different message types.

datatype ’msg event = Send transmitter ’msg (’msg list )


| Recv receiver ’msg | Claim agent ’msg

A trace is a list of timed events, where a timed event (t, e) ∈ real × event pairs
a time-stamp with an event.
A timed event (tS , Send Tx iA m L) denotes that the agent A has sent the
message m using his transmitter Tx iA at time tS and has associated the protocol
data L with the event. The list of messages L models local state information
and contains the messages used to construct m. The sender may require these
messages in subsequent protocol steps. Storing L with the Send event is necessary
since we support non-free message construction functions like XOR where a
function’s arguments cannot be recovered from the function’s image alone.
A send event like the above may result in multiple timed Recv -events of the
form (tR , Recv RxjB m), where the time-stamps tR and the receivers RxjB must
be consistent with the network topology. Note that the protocol data stored in
L when sending the message does not affect the events on the receiver’s side.
A Claim-event models a belief or conclusion made by a protocol participant,
formalized as a message. For example, after successfully completing a run of a
distance bounding protocol with a prover P , the verifier V concludes at time t
that d is an upper bound on the distance to P . We model this by adding the
timed event (t, Claim V ⟨P, Real d⟩) to the trace. The protocol is secure if the
conclusion holds for all traces containing this claim event.
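For illustration (made-up numbers, not from our formalization), such a trace fragment could look like

  [ (0.0, Send (Tx (Honest 0) 0) (Nonce (Honest 0) 0) [])
  , (1.3, Recv (Rx (Honest 1) 0) (Nonce (Honest 0) 0))
  , (4.2, Claim (Honest 0) ⟨Agent (Honest 1), Real 2.1⟩) ]

where the honest agent Honest 0 acts as verifier and Honest 1 as prover.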
Note that the time-stamps used in traces and the rules use the notion of
absolute time. However, agents’ clocks may deviate arbitrarily from absolute
time. We must therefore translate the absolute time-stamps to model the local
views of agents. We describe this translation in Section 3.4.
Nil:
    [ ] ∈ Tr

Net:
    tr ∈ Tr    tR ≥ maxtime(tr)    (tS , Send Tx iA m L) ∈ tr
    cdist Net (Tx iA , Rx jB ) = tAB    tAB ≠ ⊥    tR ≥ tS + tAB
    ⟹ tr.(tR , Recv Rx jB m) ∈ Tr

Fake:
    tr ∈ Tr    t ≥ maxtime(tr)    m ∈ dm IA (knows IA (tr))
    ⟹ tr.(t, Send Tx kIA m [ ]) ∈ Tr

Proto:
    tr ∈ Tr    t ≥ maxtime(tr)    step ∈ proto
    (act, m) ∈ step(view (HA , tr), HA , ctime(HA , t))
    m ∈ dm HA (knows HA (tr))
    ⟹ tr.(t, translateEv (HA , act, m)) ∈ Tr

Fig. 3. Rules for Tr

Knowledge and Used Messages. Each agent A initially possesses some


knowledge, denoted initKnows A , which depends on the protocol executed. We
use locales to underspecify the initial knowledge. We define a locale InitKnows
that only includes the constant initKnows : agent → ’msg set. Different key
distributions are specified by locales extending InitKnows with additional as-
sumptions. For example, the locale InitKnows Shared assumes that any two
agents A and B share a secret key Key KAB . In a system run with trace
tr, A’s knowledge consists of all messages he received together with his initial
knowledge.

knows A (tr) = {m | ∃ k t. (t, Recv Rx kA m) ∈ tr} ∪ initKnows A

Each agent can derive all messages in the set dm A (knowsA (tr)) by applying the
derivation operator to the set of known messages. We use the subterms function
to define the set of messages used in a trace tr.

used(tr) = {n | ∃ A k t m L.(t, Send Tx kA m L) ∈ tr ∧ n ∈ subterms({m})}

A nonce is fresh for a trace tr if it is not in used (tr). Note that since a nonce is
not fresh if its hash has been sent, we cannot use parts instead of subterms in
the above definition.

3.4 Network, Intruder, and Protocols

We now describe the rules used to inductively define the set of traces Tr for a sys-
tem parameterized by a protocol proto, an initial knowledge function initKnows,
and the parameters from the abstract message theory. The base case, modeled
by the Nil rule in Figure 3, states that the empty trace is a valid trace for
all protocols. The other rules describe how valid traces can be extended. The
rules model the network behavior, the possible actions of the intruders, and the
actions taken by honest agents following the protocol steps.

Network Rule. The Net-rule models message transmission from transmitters


to receivers, constrained by the network topology as given by cdist Net . A Send-
event from a transmitter may induce a Recv-event at a receiver only if the
receiver can receive messages from the transmitter as specified by cdist Net . The
time delay between these events is bounded below by the communication distance
between the transmitter and the receiver.
If there is a Send-event in the trace tr and the Net-rule’s premises are fulfilled,
a corresponding Recv -event is appended (denoted by xs.x) to the trace. The
restrictions on connectivity and transmission delay are ensured by tAB ≠ ⊥ and
tR ≥ tS + tAB . Here, tAB is the communication distance between the receiver
and transmitter, tS is the sending time, and tR is the receiving time.
Note that one Send -event can result in multiple Recv-events at the same
receiver at different times. This is because cdist Net models the minimal com-
munication distance and messages may also arrive later, for example due to the
reflection of the signal carrying the message. Moreover, a Send -event can result
in multiple Recv-events at different receivers, modeling for example broadcast
communication. Finally, note that transmission failures and jamming by an in-
truder, resulting in message loss, are captured by not applying the Net-rule for
a given Send-event and receiver, even if all premises are fulfilled.
The time-stamps associated with Send-events and Recv -events denote the
starting times of message transmission and reception. Thus, our network rule
captures the latency of links, but not the message-transmission time, which also
depends on the message’s size and the transmission speed of the transmitter and
the receiver. Some implementation-specific attacks, such as those described in
[13,5], are therefore not captured in our model.
The premise t ≥ maxtime(tr), included in every rule (except Nil), ensures
that time-stamps increase monotonically within each trace. Here t denotes the
time-stamp associated with the new event and maxtime(tr) denotes the latest
time-stamp in the trace tr. This premise guarantees that the partial order on
events induced by their time-stamps (note that events can happen simultane-
ously) is consistent with the order of events in the list representing the trace.

Intruder Rule. The Fake-rule in Figure 3 describes the intruders’ behavior. An


intruder can always send any message m derivable from his knowledge. Intruders
do not need any protocol state since they behave arbitrarily.
Since knowledge is distributed, we use explicit Send-events and Recv -events
to model the exchange of information between colluding intruders. With an ap-
propriate cdist Net function, it is possible to model an environment where the
intruders are connected by high-speed links, allowing them to carry out worm-
hole attacks. Restrictions on the degree of cooperation between intruders can be
modeled as predicates on traces. Internal and external attackers are both cap-
tured since they differ only in their initial knowledge (or associated transceivers).

Protocols. In contrast to intruders who can send arbitrary derivable messages,


honest agents follow the protocol. A protocol is defined by a set of step functions.

Each step function takes the local view and time of an agent as input and returns
all possible actions consistent with the protocol specification.
There are two types of possible actions, which model an agent either sending
a message with a given transmitter id and storing the associated protocol data
or making a claim.

datatype ’msg action = SendA nat (’msg list ) | ClaimA

Note that message reception has already been modeled by the Net-rule.
An action associated with an agent and a message can be translated into the
corresponding trace event using the translateEv function.

translateEv (A, SendA k L, m) = Send Tx kA m L


translateEv (A, ClaimA , m) = Claim A m

A protocol step is therefore of type agent × trace × real → (action × msg) set.
Since our protocol rule Proto (described below) is parameterized by the proto-
col, we define a locale Protocol that defines a constant proto of type step set
and inductively define Tr in the context of this locale.
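Schematically, this locale is tiny (our rendering, reusing the event and action datatypes given above):

  type_synonym 'm step =
    "agent × (real × 'm event) list × real ⇒ ('m action × 'm) set"

  locale Protocol =
    fixes proto :: "'m step set"

All rules of Figure 3, and every lemma proved from them, then live in the context of this locale together with the Message Theory and InitKnows locales.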
Since the actions of an agent A only depend on his own previous actions and
observations, we define A’s view of a trace tr as the projection of tr on those
events involving A. For this purpose, we introduce the function occursAt, which
maps events to associated agents, e.g. occursAt(Send Tx iA m L) = A.

view (A, tr) = [(ctime(A, t), ev) |(t, ev) ∈ tr ∧ occursAt(ev) = A]

Since the time-stamps of trace events refer to absolute time, the view function
accounts for the offset of A’s clock by translating times using the ctime function.
Given an agent and an absolute time-stamp, the uninterpreted function ctime :
agent × real → real returns the corresponding time-stamp for the agent’s clock.
Using the above definitions, we define the Proto-rule in Figure 3. For a given
protocol, specified as a set of the step functions, the Proto rule describes all
possible actions of honest agents, given their local views of a valid trace tr at
a given time t. If all premises are met, the Proto-rule appends the translated
event to the trace. Note that agents’ behavior, modeled by the function step, is
based only on the local clocks of the agents, i.e., agents cannot access the global
time. Moreover, the restriction that all messages must be in dm HA (knowsHA (tr))
ensures that agents only send messages derivable from their knowledge.

3.5 Protocol-Independent Results


The set of all possible traces T r is inductively defined by the rules Nil, Net,
Fake, and Proto in the context of the Message Theory, InitKnows, and
Protocol locales. To verify a concrete protocol, we instantiate these locales
thereby defining the concrete set of traces for the given protocol, initial knowl-
edge, and message theory. Additional requirements are specified by defining new
locales that extend Protocol and InitKnows.

Our first lemma specifies a lower bound on the time between when an agent
first uses a nonce and another agent later uses the same nonce. The lemma holds
whenever the initial knowledge of all agents does not contain any nonces.
Lemma 3.1. Let A be an arbitrary (honest or dishonest) agent, N an arbitrary
nonce, and (tSA , Send Tx iA mA LA ) the first event in a trace tr where N ∈
subterms {mA }. If tr contains an event (t, Send Tx jB mB LB ) or (t, Recv Rx jB
mB ) where A = B and N ∈ subterms {mB }, then t − tSA ≥ cdist LoS (A, B).
Our next lemma holds whenever agents’ keys are not parts of protocol messages
and concerns when MACs can be created. Note that we need the notion of
extractable subterms here since protocols use keys in MACs, but never send
them in extractable positions.
Lemma 3.2. Let A and B be honest agents and C a different possibly dishon-
est agent. Furthermore let (tSC , Send Tx iC mC LC ) be an event in the trace
tr where MAC KAB (m) ∈ subterms {mC } for some message m and a shared
secret key KAB . Then, for E either equal to A or B, there is a send event
(tSE , Send Tx jE mE LE ) ∈ tr where MAC KAB (m) ∈ subterms {mE } and
tSC − tSE ≥ cdist LoS (E, C).
Note that the lemmas are similar to the axioms presented in [4]. The proofs of
these lemmas can be found in our Isabelle/HOL formalization [14].

3.6 XOR Message Theory


In this section, we present a message theory including XOR, which instantiates
the message-theory locale introduced in Section 3.2.

The Free Message Type. We first define the free term algebra of messages.
Messages are built from agent names, integers, reals, nonces, keys, hashes, pairs,
encryption, exclusive-or, and zero.

datatype fmsg = AGENT agent | INT int | REAL real
  | NONCE agent nat | KEY key | HASH fmsg
  | MPAIR fmsg fmsg | CRYPT key fmsg
  | fmsg ⊕̄ fmsg | ZERO

To faithfully model ⊕̄, we require the following set of equations E:

(x ⊕̄ y) ⊕̄ z ≈ x ⊕̄ (y ⊕̄ z)   (A)        x ⊕̄ y ≈ y ⊕̄ x   (C)
x ⊕̄ ZERO ≈ x                 (U)        x ⊕̄ x ≈ ZERO    (N)

We define the corresponding equivalence relation =E as the reflexive, symmetric,


transitive, and congruent closure of E. We also define the reduction relation →E
as the reflexive, transitive, and congruent closure of E, where the cancellation
rules (U) and (N) are directed from left to right and (A) and (C) can be used in
both directions. Note that x →E y implies x =E y, for all x and y.
Agent:  reduced (AGENT a)        Int:  reduced (INT i)        Real:  reduced (REAL r)

Nonce:  reduced (NONCE a na)     Hash:  reduced h ⟹ reduced (HASH h)

MPair:  reduced a ∧ reduced b ⟹ reduced (MPAIR a b)

Crypt:  reduced m ⟹ reduced (CRYPT k m)

Xor:    reduced a ∧ reduced b ∧ standard a ∧ a < first b ∧ b ≠ ZERO ⟹ reduced (a ⊕̄ b)

Fig. 4. Rules for reduced

Reduced Messages and the Reduction Function. We define the predicate


standard on fmsg that returns true for all messages whose outermost constructor is neither ⊕̄ nor ZERO. We also define the projection function first, where first x equals a when x = a ⊕̄ b for some a and b, and equals x otherwise. We use both functions to define reduced messages. We show below that every equivalence class with respect to =E contains exactly one reduced message, which is used as the class’s canonical representative. To handle commutativity, we define a linear order on fmsg using the underlying orders on nat, int, and agent. A message is reduced if ⊕̄-chains are right-associated and ordered, and all cancellation rules have been applied. This is captured by the inductive definition in Figure 4.
To obtain a decision procedure for x =E y, we define a reduction function ↓ on fmsg that reduces a message, that is, x↓ is reduced and (x↓) =E x. We begin with the definition of an auxiliary function ⊕? : fmsg → fmsg → fmsg:

a ⊕? b = if b = ZERO then a else a ⊕̄ b
We now define the main part of the reduction: the function ⊕↓ : fmsg →
fmsg → fmsg presented in Figure 5 combines two reduced messages a and b,
yielding (a ⊕̄ b)↓. Note that the order of the equations is relevant: given overlap-
ping patterns, the first applicable equation is used. The algorithm is similar to
a merge-sort on lists. The first two cases are straightforward and correspond to
the application of the (U) rule. The other cases are justified by combinations of
all four rules.
The definition of (·)↓ : fmsg → fmsg is straightforward:

(HASH m)↓    = HASH (m↓)
(MPAIR a b)↓ = MPAIR (a↓) (b↓)
(CRYPT k m)↓ = CRYPT k (m↓)
(a ⊕̄ b)↓     = (a↓) ⊕↓ (b↓)
x↓           = x    (otherwise)
x ⊕↓ ZERO = x                                                      (1)
ZERO ⊕↓ x = x                                                      (2)
(a1 ⊕̄ a2) ⊕↓ (b1 ⊕̄ b2)
          = if a1 = b1 then a2 ⊕↓ b2                               (3)
            else if a1 < b1 then a1 ⊕? (a2 ⊕↓ (b1 ⊕̄ b2))           (4)
            else b1 ⊕? ((a1 ⊕̄ a2) ⊕↓ b2)                           (5)
(a1 ⊕̄ a2) ⊕↓ b
          = if a1 = b then a2                                      (6)
            else if a1 < b then a1 ⊕? (a2 ⊕↓ b)                    (7)
            else b ⊕̄ (a1 ⊕̄ a2)                                     (8)
a ⊕↓ (b1 ⊕̄ b2) = (b1 ⊕̄ b2) ⊕↓ a                                    (9)
a ⊕↓ b    = if a = b then ZERO                                     (10)
            else if a < b then a ⊕̄ b else b ⊕̄ a                    (11)

Fig. 5. Definition of ⊕↓
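
The cancel-and-order structure of ⊕↓ is easiest to see on a simplified model (our sketch, not the paper’s definition): if a reduced ⊕̄-chain is represented as a strictly increasing list of atoms, with [] playing the role of ZERO, then ⊕↓ is an ordinary merge in which equal heads cancel, mirroring rule (N):

  theory Xor_Merge_Sketch
    imports Main
  begin

  (* Merge two strictly sorted chains; equal elements cancel pairwise. *)
  fun xmerge :: "nat list ⇒ nat list ⇒ nat list" where
    "xmerge xs [] = xs"
  | "xmerge [] ys = ys"
  | "xmerge (x # xs) (y # ys) =
       (if x = y then xmerge xs ys
        else if x < y then x # xmerge xs (y # ys)
        else y # xmerge (x # xs) ys)"

  (* Example: (1 ⊕ 2 ⊕ 3) ⊕ (2 ⊕ 4) normalizes to 1 ⊕ 3 ⊕ 4. *)
  value "xmerge [1, 2, 3] [2, 4]"   (* yields [1, 3, 4] *)

  end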

We have proved the following facts about reduction: (1) if reduced x then
(x↓) = x, (2) reduced (x↓), and (3) x →E (x↓). Using these facts we establish:
Lemma 3.3. For all messages x and y, x =E y iff (if and only if ) (x↓) = (y↓).
Furthermore, if reduced x and reduced y, then x =E y iff x = y.

The Message Type, Parts, and dm. Given the above lemma, we use the
function ↓ and the predicate reduced to characterize =E . Isabelle’s typedef mech-
anism allows us to define the quotient type msg with {m | reduced m} as the
representing set. This defines a new type msg with a bijection between the rep-
resenting set in fmsg and msg given by the function Abs msg : fmsg → msg
and its inverse Rep msg : msg → fmsg. Note that =E on fmsg corresponds to
object-logic equality on msg. This is reflected in the following lemma.
Lemma 3.4. For all messages x and y, x = y iff Rep msg(x) =E Rep msg(y).
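In outline, the construction looks as follows (a sketch assuming the fmsg datatype and the inductive predicate reduced from Figure 4; the collective rule name reduced.intros is Isabelle’s default naming):

  (* msg as the subtype of reduced fmsg terms; the non-emptiness obligation
     of typedef is discharged with any reduced witness, e.g. an AGENT term. *)
  typedef msg = "{m :: fmsg. reduced m}"
    by (auto intro: reduced.intros)

This automatically introduces the functions Rep msg and Abs msg together with the bijection lemmas used below.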
We define functions on msg by using the corresponding definitions on fmsg and
the embedding and projection functions. That is, we lift the message constructors
to msg using the ↓ function. For example:

Nonce a n = Abs msg (NONCE a n)
MPair a b = Abs msg (MPAIR (Rep msg a) (Rep msg b))
Hash m    = Abs msg (HASH (Rep msg m))
Xor a b   = Abs msg ((Rep msg a ⊕̄ Rep msg b)↓)
Zero      = Abs msg ZERO

In the following, we write 0 for Zero and x ⊕ y for Xor x y. We define a function
fparts on fmsg that returns all extractable subterms of a given message, e.g.
m ∈ fparts({CRYPT k m}), but m ∉ fparts({HASH m}).
inj:    m ∈ M ⟹ m ∈ dm A (M)                 zero:   0 ∈ dm A (M)

hash:   m ∈ dm A (M) ⟹ Hash m ∈ dm A (M)

fst:    ⟨m, n⟩ ∈ dm A (M) ⟹ m ∈ dm A (M)     snd:    ⟨m, n⟩ ∈ dm A (M) ⟹ n ∈ dm A (M)

pair:   m ∈ dm A (M) ∧ n ∈ dm A (M) ⟹ ⟨m, n⟩ ∈ dm A (M)

xor:    m ∈ dm A (M) ∧ n ∈ dm A (M) ⟹ m ⊕ n ∈ dm A (M)

enc:    m ∈ dm A (M) ∧ Key k ∈ dm A (M) ⟹ {m}k ∈ dm A (M)

dec:    {m}k ∈ dm A (M) ∧ (Key k)⁻¹ ∈ dm A (M) ⟹ m ∈ dm A (M)

nonce:  Nonce A n ∈ dm A (M)                  agent:  Agent a ∈ dm A (M)

int:    Int n ∈ dm A (M)                      real:   Real n ∈ dm A (M)

Fig. 6. Rules for dm A (M)

The function parts on msg that is used to instantiate the function of the same name in the message-theory locale is then defined as

parts(H) = {Abs msg(m) | m ∈ fparts {Rep msg(x) | x ∈ H}} .

This defines the parts of a message m in the equivalence class represented by


m↓ as the fparts of m↓. For example parts({X ⊕ X}) = {0}. The function
subterms is defined similarly, but returns all subterms and not just the ex-
tractable ones. We give the rules for the inductively-defined message-derivation
operator dm : agent → msg set → msg set in Figure 6. The rules specify message
decryption, projection on pairs, pairing, encryption, signing, hashing, XORing,
and the generation of integers, reals, agent names, and nonces. For example, the
Dec-rule states that if an agent A can derive the ciphertext {m}k and the de-
cryption key (Key k)−1 , then he can also derive the plaintext m. When Key k
is used as a signing key, A uses the verification key (Key k)−1 to verify the
signature. The Xor rule uses the constructor Xor , which ensures that the re-
sulting message is reduced. We can now interpret the locale Message Theory
by proving that the defined operators have the required properties.

4 Protocol Correctness

In this section, we present highlights from our verification of instances of the


protocol pattern in Figure 1. Complete details are provided in [14]. We first
formalize the protocol pattern and its desired security properties. Afterwards,
we reduce the security of pattern instances to properties of the rapid-response

function used. Finally, we consider several concrete rapid-response functions pro-


posed in [4] and analyze the security of the corresponding instantiations.

4.1 Protocol Rules


We conduct our security analysis using our XOR message theory. We first define the concrete initial knowledge as initKnows A = ⋃B {Key KAB }, which we
interpret as an instance of the InitKnows Shared locale. Next, we instan-
tiate the Protocol locale by defining the protocol pattern in Figure 1 as
proto = {mdb1 , mdb2 , mdb3 , mdb4 }. Each step function mdbi (A, tr, t) yields the
possible actions of the agent A executing the protocol step i with his view of the
trace tr at the local time t.
Our distance bounding protocol definition uses the uninterpreted rapid-response
function F : msg × msg × msg → msg and consists of the following steps.2
Start: An honest verifier V can start a protocol run by sending a fresh nonce
using his radio transmitter r at any local time t.
Nonce V NV ∉ used(tr)
⟹ (SendA r [ ], Nonce V NV ) ∈ mdb1 (V, tr, t)
Rapid-Response: If a prover P receives a nonce NV , he may continue the pro-
tocol by replying with the message F (NV , NP , P ) and storing the protocol data
[NV , Nonce P NP ]. This information must be stored since it is needed in the au-
thentication step and cannot, in general, be reconstructed from F (NV , NP , P ).

(tRP , Recv Rx rP NV ) ∈ tr    Nonce P NP ∉ used(tr)
⟹ (SendA r [NV , Nonce P NP ], F (NV , Nonce P NP , Agent P )) ∈ mdb2 (P, tr, t)
Authentication: After a prover P has answered a verifier’s challenge with a
rapid-response, he authenticates the response with the corresponding MAC.

(t_P^R , Recv Rx_P^r NV ) ∈ tr
(t_P^S , Send Tx_P^r F (NV , Nonce P NP , Agent P ) [NV , Nonce P NP ]) ∈ tr
⟹  (SendA r [ ], MACM KV P (NV , Nonce P NP , Agent P )) ∈ mdb3 (P, tr, t)
Claim: Suppose the verifier receives a rapid-response in the measurement phase at time t_1^R and the corresponding MAC in the validation phase, both involving the nonce that he initially sent at time t_1^S . The verifier therefore concludes that (t_1^R − t_1^S ) ∗ c/2 is an upper bound on the distance to the prover P , where c denotes the speed of light.


(t_1^S , Send Tx_V^r (Nonce V NV ) [ ]) ∈ tr
(t_1^R , Recv Rx_V^r F (Nonce V NV , NP , Agent P )) ∈ tr
(t_2^R , Recv Rx_V^r MACM KV P (Nonce V NV , NP , Agent P )) ∈ tr
⟹  (ClaimA , (Agent P, Real (t_1^R − t_1^S ) ∗ c/2)) ∈ mdb4 (V, tr, t)

2
We have formalized each step in Isabelle/HOL using set comprehension, but present
the steps here as rules for readability. For each rule r, the set we define by compre-
hension is equivalent to the set defined inductively by the rule r.
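To illustrate this schematically (our paraphrase, not the literal Isabelle text), the Start rule above corresponds to the comprehension

mdb1 (V, tr, t) = { (SendA r [ ], Nonce V NV ) | NV . Nonce V NV ∉ used(tr) } .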

The set of traces Tr is inductively defined by the rules Nil, Fake, Net, and
Proto. Note that the same set of traces can be inductively defined by the Nil,
Fake, and Net rules along with rules describing the individual protocol steps.
See [6] for more details on these different representations.

4.2 Security Properties


In this section, we present properties of distance bounding protocols instantiating
the protocol pattern from Figure 1. First, we note that we only consider honest
verifiers. Since successful protocol execution leads to claims on the verifier’s
side, it makes no sense to consider dishonest verifiers. However, we distinguish
between honest and dishonest provers. For honest provers, we require that the
claimed distance after a successful protocol execution denoted by a Claim event
is an upper bound on the physical distance between the prover and the verifier.

Definition 4.1. A distance bounding protocol is secure for honest provers (hp-secure) iff whenever Claim V ⟨P, Real d⟩ ∈ tr, then d ≥ | loc V − loc P |.

In the case of a dishonest prover, it is impossible to distinguish between different intruders who exchange their keys. Hence, our weaker property must accommodate attacks where one intruder pretends to be another intruder. We therefore require that the claimed distance is an upper bound on the distance between the verifier and some intruder.

Definition 4.2. A distance bounding protocol is secure for dishonest provers (dp-secure) iff whenever Claim V ⟨P, Real d⟩ ∈ tr for an intruder P , then there is an intruder P′ such that d ≥ | loc V − loc P′ |.

4.3 Security Proofs Based on Properties of F


In order to prove security properties of an instance of the protocol pattern, we
show how security properties of the protocol can be reduced to properties of F .
To prove the security of a concrete instantiation, we then only need to prove that it has the required properties.

Definition 4.3. In the following, let X, Y , and Z be messages, m an atomic message or a MAC, A, B, and C agents, and NA , NB , and NC nonce tags. We define the following properties for a function F : msg × msg × msg → msg:

(P0) (a) If X ∈ H, then F (X, Nonce A NA , Agent A) ∈ dm A H.
     (b) If m ∈ parts({F (X, Y, Z)}), then m ∈ parts({X, Y, Z}).
     (c) F (X, Nonce A NA , Agent A) ≠ Nonce C NC .
     (d) F (X, Nonce A NA , Agent A) ≠ MACM KBC (Y, Nonce C NC , Agent C).
(P1) Nonce A NA ∈ subterms(F (Nonce A NA , X, Agent B))
(P2) Nonce B NB ∈ subterms(F (X, Nonce B NB , Agent B))
(P3) Agent B ∈ subterms(F (Nonce A NA , X, Agent B))

[Message sequence charts not reproduced. Fig. 7 (participants V and I): messages NV , ⟨NI , I⟩, and MACM KV I (NV , NV ⊕ NI , I). Fig. 8 (participants V , P , and I): messages NV , ⟨NV , NP ⊕ P ⟩, and MACM KV I (NV , NP ⊕ P ⊕ I, I).]

Fig. 7. Jumping the Gun Attack    Fig. 8. Impersonation Attack

The property (P0) specifies well-formedness conditions on F : (a) F can be computed from the challenge, (b) F introduces no atomic messages or MACs that are not among its arguments, and (c)–(d) F cannot be confused with the other protocol messages. Properties (P1)–(P3) state that arguments in certain positions are always subterms of F ’s result, as long as the remaining arguments have the required types. Using these properties, we prove the following lemmas that give sufficient conditions on F for hp-security and dp-security.
Theorem 4.4. For every function F with properties (P0)–(P2), the resulting
instance of the protocol pattern is an hp-secure distance bounding protocol.
In the proof of Theorem 4.4, (P1) is used to ensure that the nonce sent as a
challenge by the verifier must be involved in the computation of the prover’s
response. Analogously, (P2) ensures that the response can only be computed if
the fresh nonce contributed by the prover is known.
For the case of a dishonest prover, we additionally require (P3) to prevent a
dishonest prover from taking credit for a response sent by an honest one.
Theorem 4.5. For every function F with properties (P0)–(P3), the resulting
instance of the protocol pattern is a dp-secure distance bounding protocol.
We have proved that ⟨NV ⊕ P, NP ⟩ and ⟨NV , NP , P ⟩ have properties (P0)–(P3) and therefore the corresponding protocol instances are both hp-secure and dp-secure.
The function F1 (NV , NP , P ) = ⟨NV ⊕ NP , P ⟩ lacks (P1) and is not dp-secure. To see that (P1) fails, consider F1 (NV , NV , P ) = ⟨0, P ⟩, which does not contain NV as a subterm. Remember that we have defined the subterms of a message t as the subterms of t↓. A dishonest prover I can use this to execute the “jumping the gun” attack given in Figure 7. The attack uses the equivalence F1 (NV , NV ⊕ NI , I) = ⟨NV ⊕ (NV ⊕ NI ), I⟩ = ⟨NI , I⟩.
In contrast, the function F2 (NV , NP , P ) = ⟨NV , NP ⊕ P ⟩ lacks property (P3) and is therefore hp-secure but not dp-secure. To see that (P3) fails, consider F2 (NV , NI ⊕ Agent P, Agent P ) = ⟨NV , NI ⟩, which does not contain Agent P as a subterm. This leads to the impersonation attack depicted in Figure 8, violating the dp-security property. This attack uses the equivalence F2 (NV , NP , P ) = ⟨NV , NP ⊕ P ⟩ = F2 (NV , NP ⊕ I ⊕ P, I).

Overall, proving the properties (P0)–(P3) for a given function and applying
Theorems 4.4 and 4.5 is much simpler than the corresponding direct proofs.
However, finding the correct properties and proving these theorems for the XOR
message theory turned out to be considerably harder than proofs for comparable
theorems about a fixed protocol in the free message theory. This additional
complexity mainly stems from the untyped protocol formalization necessary to
realistically model the XOR operator.

5 Related Work and Conclusions


Our model of physical security protocols extends the Dolev-Yao model with dense
time, network topology, and node location. Each of these properties has been
handled individually in other approaches. For example, discrete and dense time
have been used to reason about security protocols involving timestamps or timing
issues like timeouts and retransmissions [15,16]. Models encompassing network
topology have been studied in the context of secure routing protocols for ad
hoc networks [17,18]. Node location and relative distance have been considered
in [4,5]. In [6], we compare our work with these approaches in more detail.
While these approaches address individual physical properties, to the best of
our knowledge our model is the first that provides a general foundation for
reasoning about all three of these properties and their interplay.
The protocol pattern we study comes from Meadows et al. [4]. The authors
give a condition on instances of F (called “simply stable”) under which the re-
sulting protocol is correct in the case of honest provers. They also investigate the
two function instances F1 and F2 that we presented above. They give the attack
on F1 . However, as they do not consider the possibility of dishonest provers in
their proof, they classify F2 as secure. Their correctness proofs are based on a
specialized authentication logic, tailored for distance-bounding protocols, that
is presented axiomatically. While they do not provide a semantics, we note that
their axioms can be suitably interpreted and derived within our setting.
From the specification side, our work builds on several strands of research. The
first is the modeling of security protocols as inductively-defined sets of traces.
Our work is not only inspired by Paulson’s inductive method [7]; we were also able to reuse some of his theories, in particular his key theory and much of his free message algebra.
The second strand is research on formalizing equational theories in theorem
provers. Courant and Monin [19] formalize a message algebra including XOR in
Coq, which they use to verify security APIs. They introduce an uninterpreted
normalization function with axiomatically-defined properties, which in turn is
used to define their equivalence relation. Since they do not use a quotient type to
represent equivalence classes, they must account for different representations of
equivalent messages. Paulson’s work on defining functions on equivalence classes
[20] uses a quotient type construction in Isabelle/HOL that is similar to ours,
but represents equivalence classes as sets instead of canonical elements.
The final strand concerns developing reusable results by proving theorems
about families of inductively-defined sets. In our work, we give generic
formalizations of sets of messages and traces, including lemmas, which hold


for different instances. The key idea is to use one parameterized protocol rule,
instead of a collection of individual rules, for the protocol steps. The inductive
definition is then packaged in a locale, where the locale parameter is the rule pa-
rameter, and locale assumptions formalize constraints on the protocol steps (such
as well-formedness). Note that this idea is related to earlier work on structuring
(meta)theory [21,22,23] using parameterized inductively-defined sets, where the
theorems themselves directly formalize the families of sets. Overall, our work
constitutes a substantial case study in using locales to structure reusable theo-
ries about protocols. Another case study, in the domain of linear arithmetic, is
that of [24].
In conclusion, our model has enabled us to formalize protocols, security prop-
erties, and environmental assumptions that are not amenable to formal analysis
using other existing approaches. As future work, we plan to extend our model
to capture additional properties of wireless security protocols. We also intend to
refine our model to capture message sizes and transmission rate, rapid bit ex-
change, and online guessing attacks, which would allow us to analyze protocols
such as those presented in [1].

References

1. Brands, S., Chaum, D.: Distance-bounding protocols. In: Helleseth, T. (ed.)


EUROCRYPT 1993. LNCS, vol. 765, pp. 344–359. Springer, Heidelberg (1994)
2. Capkun, S., Buttyan, L., Hubaux, J.P.: SECTOR: secure tracking of node encoun-
ters in multi-hop wireless networks. In: SASN 2003: Proceedings of the 1st ACM
Workshop on Security of Ad Hoc and Sensor Networks, pp. 21–32. ACM Press,
New York (2003)
3. Hancke, G.P., Kuhn, M.G.: An RFID distance bounding protocol. In:
SECURECOMM 2005: Proceedings of the 1st International Conference on Secu-
rity and Privacy for Emerging Areas in Communications Networks, Washington,
DC, USA, pp. 67–73. IEEE Computer Society, Los Alamitos (2005)
4. Meadows, C., Poovendran, R., Pavlovic, D., Chang, L., Syverson, P.: Distance
bounding protocols: Authentication logic analysis and collusion attacks. In: Secure
Localization and Time Synchronization for Wireless Sensor and Ad Hoc Networks,
pp. 279–298. Springer, Heidelberg (2006)
5. Sastry, N., Shankar, U., Wagner, D.: Secure verification of location claims. In: WiSe
2003: Proceedings of the 2003 ACM workshop on Wireless security, pp. 1–10. ACM
Press, New York (2003)
6. Schaller, P., Schmidt, B., Basin, D., Capkun, S.: Modeling and verifying physi-
cal properties of security protocols for wireless networks. In: CSF-22: 22nd IEEE
Computer Security Foundations Symposium (to appear, 2009)
7. Paulson, L.C.: The inductive approach to verifying cryptographic protocols. Jour-
nal of Computer Security 6, 85–128 (1998)
8. Nipkow, T., Paulson, L., Wenzel, M.: Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002)
9. Capkun, S., Hubaux, J.P.: Secure positioning of wireless devices with application
to sensor networks. In: INFOCOM, pp. 1917–1928. IEEE, Los Alamitos (2005)

10. Perrig, A., Tygar, J.D.: Secure Broadcast Communication in Wired and Wireless
Networks. Kluwer Academic Publishers, Norwell (2002)
11. Ballarin, C.: Interpretation of locales in Isabelle: Theories and proof contexts. In:
Borwein, J.M., Farmer, W.M. (eds.) MKM 2006. LNCS (LNAI), vol. 4108, pp.
31–43. Springer, Heidelberg (2006)
12. Porter, B.: Cauchy’s mean theorem and the Cauchy-Schwarz inequality. The Archive
of Formal Proofs, Formal proof development (March 2006)
13. Clulow, J., Hancke, G.P., Kuhn, M.G., Moore, T.: So near and yet so far: Distance-
bounding attacks in wireless networks. In: Buttyán, L., Gligor, V.D., Westhoff, D.
(eds.) ESAS 2006. LNCS, vol. 4357, pp. 83–97. Springer, Heidelberg (2006)
14. Schmidt, B., Schaller, P.: Isabelle Theory Files: Modeling and Verifying Physical
Properties of Security Protocols for Wireless Networks,
http://people.inf.ethz.ch/benschmi/ProtoVeriPhy/
15. Delzanno, G., Ganty, P.: Automatic Verification of Time Sensitive Cryptographic
Protocols. In: Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp.
342–356. Springer, Heidelberg (2004)
16. Evans, N., Schneider, S.: Analysing Time Dependent Security Properties in CSP
Using PVS. In: Cuppens, F., Deswarte, Y., Gollmann, D., Waidner, M. (eds.)
ESORICS 2000. LNCS, vol. 1895, pp. 222–237. Springer, Heidelberg (2000)
17. Acs, G., Buttyan, L., Vajda, I.: Provably Secure On-Demand Source Routing in
Mobile Ad Hoc Networks. IEEE Transactions on Mobile Computing 5(11), 1533–
1546 (2006)
18. Yang, S., Baras, J.S.: Modeling vulnerabilities of ad hoc routing protocols. In:
SASN 2003: Proceedings of the 1st ACM Workshop on Security of Ad Hoc and
Sensor Networks, pp. 12–20. ACM, New York (2003)
19. Courant, J., Monin, J.: Defending the bank with a proof assistant. In: Proceedings
of the 6th International Workshop on Issues in the Theory of Security (WITS
2006), pp. 87–98 (2006)
20. Paulson, L.: Defining functions on equivalence classes. ACM Transactions on Com-
putational Logic 7(4), 658–675 (2006)
21. Basin, D., Constable, R.: Metalogical frameworks. In: Huet, G., Plotkin, G. (eds.)
Logical Environments, pp. 1–29. Cambridge University Press, Cambridge (1993);
Also available as Technical Report MPI-I-92-205
22. Basin, D., Matthews, S.: Logical frameworks. In: Gabbay, D., Guenthner, F. (eds.)
Handbook of Philosophical Logic, 2nd edn., vol. 9, pp. 89–164. Kluwer Academic
Publishers, Dordrecht (2002)
23. Basin, D., Matthews, S.: Structuring metatheory on inductive definitions. Infor-
mation and Computation 162(1–2) (October/November 2000)
24. Nipkow, T.: Reflecting quantifier elimination for linear arithmetic. Formal Logical
Methods for System Security and Correctness, 245 (2008)
VCC: A Practical System
for Verifying Concurrent C

Ernie Cohen1 , Markus Dahlweid2 , Mark Hillebrand3 , Dirk Leinenbach3 ,


Michal Moskal2 , Thomas Santen2 , Wolfram Schulte4 , and Stephan Tobies2
1
Microsoft Corporation, Redmond, WA, USA
[email protected]
2
European Microsoft Innovation Center, Aachen, Germany
{markus.dahlweid,michal.moskal,thomas.santen,
stephan.tobies}@microsoft.com
3
German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
{mah,dirk.leinenbach}@dfki.de
4
Microsoft Research, Redmond, WA, USA
[email protected]

Abstract. VCC is an industrial-strength verification environment for


low-level concurrent system code written in C. VCC takes a program
(annotated with function contracts, state assertions, and type invariants)
and attempts to prove the correctness of these annotations. It includes
tools for monitoring proof attempts and constructing partial counterex-
ample executions for failed proofs. This paper motivates VCC, describes
our verification methodology, describes the architecture of VCC, and
reports on our experience using VCC to verify the Microsoft Hyper-V
hypervisor.1

1 Introduction

The mission of the Hypervisor Verification Project (part of Verisoft XT [1]) is to


develop an industrially viable tool-supported process for the sound verification
of functional correctness properties of commercial, off-the-shelf, system software,
and to use this process to verify the Microsoft Hyper-V hypervisor. In this paper,
we describe the proof methodology and tools developed in pursuit of this mission.
Our methodology and tool design have been driven by the following challenges:

Reasoning Engine. In an industrial process, developers and testers must drive


the verification process (even if more specialized verification engineers architect
global aspects of the verification, such as invariants on types). Thus, verifica-
tion should be primarily driven by assertions stated at the level of code itself,
rather than by guidance provided to interactive theorem provers. The need for
mostly automatic reasoning led us to generate verification conditions that could
1
Work partially funded by the German Federal Ministry of Education and Research
(BMBF) in the framework of the Verisoft XT project under grant 01 IS 07 008.


be discharged automatically by an SMT (first-order satisfiability modulo theo-


ries) solver. The determination to stick to first-order methods means that the
only form of induction available is computational induction, which required de-
veloping methodological workarounds for inductive data structures.
Moreover, to allow users to understand failed verification attempts, we try whenever possible to reflect information from the underlying reasoning engine back to the level of the program. For example, counterexamples generated by failed proofs in the SMT solver are reflected to the user as (partial) counterexample traces by the VCC Model Viewer.

Weak Typing. Almost all critical system software today is written in C (or
C++). C has only a weak, easily circumvented type system and explicit memory
(de)allocation, so memory safety has to be explicitly verified. Moreover, address
arithmetic enables many nasty programming tricks that are absent from typesafe
code.
Still, most code in a well-written C system adheres to a much stricter type
discipline. The VCC memory model [2] leverages this by maintaining in ghost
memory a typestate that tracks where the “valid” typed memory objects are.
On each memory reference and pointer dereference, there is an implicit assertion
that the resulting object is in the typestate. System invariants guarantee that valid
objects do not overlap in any state, so valid objects behave like objects in a
modern (typesafe) OO system. Well-behaved programs incur little additional
annotation overhead, but nasty code (e.g., inside of the memory allocator) may
require explicit manipulation of the typestate2 .
While C is flexible enough to be used in a very low-level way, we still want
program annotations to take advantage of the meaningful structure provided by
well-written code. Because C structs are commonly used to group semantically
related data, we use them by default like objects in OO verification methodolo-
gies (e.g., as the container of invariants). Users can introduce additional (ghost)
levels of structure to reflect additional semantic structure.

Concurrency. Most modern system software is concurrent. Indeed, the architec-


turally visible caching of modern processors means that even uniprocessor op-
erating systems are effectively concurrent programs. Moreover, real system code
makes heavy use of lock-free synchronization. However, typical modular and
thread-local approaches to verifying concurrent programs (e.g., [3]) are based on
locks or monitors, and forbid direct concurrent access to memory.
As in some other concurrency methodologies (e.g., [4]), we use an ownership
discipline that allows a thread to perform sequential writes only to data that it
owns, and sequential reads only to data that it owns or can prove is not changing.
But in addition, we allow concurrent access to data that is marked as volatile in
the typestate (using operations guaranteed by the compiler to be atomic on the
2
Our slogan is “It’s no harder to functionally verify a typesafe program written in
an unsafe language than one written in a safe language.” Thus, verification actually
makes unsafe languages more attractive.

given platform), leveraging the observation that a correct concurrent program


typically can only race on volatile data (to prevent an optimizing compiler from
changing the program behavior). Volatile updates are required to preserve in-
variants but are otherwise unconstrained, and such updates do not have to be
reported in the framing of a function specification.

Cross-Object Invariants. A challenge in modular verification is how to make


sure that updates don’t break invariants that are out of scope. This is usually ac-
complished by restricting invariants to data within the object, or mild relaxations
based on the ownership hierarchy. However, sophisticated implementations often
require coordination between unrelated objects.
Instead, we allow invariants to mention arbitrary parts of the state. To keep
modular verification sound, VCC checks that no object invariant can be broken
by invariant-preserving changes to other objects. This admissibility check is done
based on the type declarations alone.

Simulation. A typical way to prove properties of a concurrent data type is to


show that it simulates some simpler type. In existing methodologies, simula-
tion is typically expressed as a theorem about a program, e.g., by introducing
an existentially quantified variable representing the simulated state. This is ac-
ceptable when verifying an abstract program (expressed, say, with a transition
relation), but is awkward when updates are scattered throughout the codebase,
and it violates our principle of keeping the annotations tightly integrated with
the code.
Instead, we prove concurrent simulation in the code itself, by representing
the abstract target with ghost state. The coupling invariant is expressed as an
ordinary (single state) invariant linking the concrete and ghost state, and the
specification of the simulated system is expressed with a two-state invariant on
the ghost state. These invariants imply a (forward) simulation, with updates to
the ghost state providing the needed existential witnesses.

Claims. Concurrent programs implicitly deal with chunks of knowledge about


the state. For example, a program attempting to acquire a spin lock must “know”
that the spin lock hasn’t been destroyed. But such knowledge is ephemeral – it
could be broken by any write that is not thread local – so passing knowledge to
a function in the form of a precondition is too weak. Instead, we package the
knowledge as the invariant of a ghost object; these knowledge-containing objects
are called claims. Because claims are first-class objects, they can be passed in
and out of functions and stored in data structures. They form a critical part of
our verification methodology.

C and Assembly Code. System software requires interaction between C and


assembly code. This involves subtleties such as the semantics of calls between C
and assembly (in each direction), and consideration of which hardware resources
(e.g., general purpose registers) can be silently manipulated by compiled C code.
Assembly verification in VCC is discussed in [5].

Weak Memory Models. Concurrent program reasoning methods usually tacitly


assume sequentially consistent memory. However, system software has to run
on modern processors which, in a concurrent setting, do not provide an effi-
cient implementation of sequentially consistent memory (primarily because of
architecturally visible store buffering). Additional proof obligations are needed
to guarantee that sequentially consistent reasoning is sound. We have developed
a suitable set of conditions for x64 memory, but VCC does not yet enforce the
corresponding verification conditions.

Content. Section 2 gives an overview of Hyper-V. Section 3 introduces the


VCC methodology. Section 4 looks at the main components of VCC’s tool suite.
Section 5 reflects on the past year’s experience of using VCC for verifying Hyper-V. Section 6 concludes with related work.

2 The Microsoft Hypervisor


The development of our verification environment is driven by the verification of
the Microsoft Hyper-V hypervisor, which is an ongoing collaborative research
project between the European Microsoft Innovation Center, Microsoft Research,
the German Research Center for Artificial Intelligence, and Saarland University
in the Verisoft XT project. The hypervisor is a relatively thin layer of software
(100KLOC of C, 5KLOC of assembly) that runs directly on x64 hardware. The
hypervisor turns a single real multi-processor x64 machine (with AMD SVM
[6] or Intel VMX [7] virtualization extensions) into a number of virtual multi-
processor x64 machines. (These virtual machines include additional machine
instructions to create and manage other virtual machines.)
The hypervisor was not written with formal verification in mind. Verification
requires substantial annotations to the code, but these annotations are struc-
tured so that they can be easily removed by macro preprocessing, so the anno-
tated code can still be compiled by the standard C compiler. Our goal is that
the annotations will eventually be integrated into the codebase and maintained
by the software developers, evolving along with the code.
The hypervisor code consists of about 20 hierarchical layers, with essentially
no up-calls except to pure functions. The functions and data of each layer are
separated into public and private parts. These visibility properties are ensured
statically using compiler and preprocessor hacks, but the soundness of the veri-
fication does not depend on these properties. These layers are divided into two
strata. The lower layers form the kernel stratum which is a small multi-processor
operating system, complete with hardware abstraction layer, kernel, memory
manager, and scheduler (but no device drivers). The virtualization stratum runs
in each thread an “application” that simulates an x64 machine without the vir-
tualization features, but with some additional machine instructions, and running
under an additional level of memory address translation (so that each machine
can see 0-based, contiguous physical memory).
For the most part, a virtual machine is simulated by simply running the
real hardware; the extra level of virtual address translation is accomplished by

using shadow page tables (SPTs). The SPTs, along with the hardware transla-
tion lookaside buffers (TLBs) (which asynchronously gather and cache virtual
to physical address translations), implement a virtual TLB. This simulation is
subtle for two reasons. First, the hardware TLB is architecturally visible, be-
cause (1) translations are not automatically flushed in response to edits to page
tables stored in memory, and (2) translations are gathered asynchronously and
nonatomically (requiring multiple reads and writes to traverse the page tables),
creating races with system code that operates on the page tables. Even the se-
mantics of TLBs are subtle, and the hypervisor verification required constructing
the first accurate formal models of the x86/x64 TLBs. Second, the TLB simula-
tion is the most important factor in system performance; simple SPT algorithms,
even with substantial optimization, can introduce virtualization overheads of
50% or more for some workloads. The hypervisor therefore uses a very large and
complex SPT algorithm, with dozens of tricky optimizations, many of which
leverage the freedoms allowed by the weak TLB semantics.

3 VCC Methodology

VCC extends C with annotations giving function pre- and post-conditions, asser-
tions, type invariants, and ghost code. Many of these annotations are similar to
those found in ESC/Java [8], Spec# [9], or Havoc [10]. With contracts in place,
VCC performs a static modular analysis, in which each function is verified in iso-
lation, using only the contracts of functions that it calls and invariants of types
used in its code. But unlike the aforementioned systems, VCC is geared towards
sound verification of functional properties of low-level concurrent C code.
We show VCC’s use by specifying hypervisor partitions, the data structure that keeps the state needed to execute a guest operating system. Listing 1 shows a much simplified but annotated definition of the data structure. (The actual struct has 98 fields.)
typedef enum { Undefined, Initialized, Active, Terminating } LifeState;

typedef struct _Partition {


bool signaled;
LifeState lifeState;
invariant(lifeState == Initialized || lifeState == Active ||
lifeState == Terminating)
invariant(signaled ==> lifeState == Active)
} Partition;

void part_send_signal(Partition *part)


requires(part->lifeState == Active)
ensures(part->signaled)
maintains(wrapped(part))
writes(part)
{
unwrap(part);
part->signaled = 1;
wrap(part);
}

Listing 1. Sequential partition



Function Contracts. Every function can have a specification, consisting of four


kinds of clauses. Preconditions, introduced by requires clauses, declare under
which condition the function is allowed to be called. Postconditions, introduced
by ensures clauses, declare under which condition the function is allowed to
return. A maintains clause combines a precondition followed by a postcondition
with the same predicate. Frame conditions, introduced by writes clauses, limit
the part of the program’s state that the function is allowed to modify.
So part_send_signal of Listing 1 is allowed to be called if the actual param-
eter’s lifeState is Active. When the function terminates it guarantees (1) that
the formal parameter’s signaled bit has been set and (2) that it modified at
most the passed partition object. We will discuss the notion of a wrapped object,
which is mentioned in the maintains clause, in Sect. 3.1.
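For example, the clause maintains(wrapped(part)) in Listing 1 abbreviates the pair

requires(wrapped(part)) ensures(wrapped(part))

so the caller must establish that part is wrapped before the call and may assume this again after it returns.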

Type Invariants. Type definitions can have type invariants, which are one- or
two-state predicates on data. Other specifications can refer to the invariant of
object o as inv(o) (or inv2(o)). VCC implicitly uses invariants at various
locations, as will be explained in the following subsections.
The invariant of the Partition struct of Listing 1 says that lifeState must
be one of the valid ones defined for a partition, and that if the signaled bit is
set, lifeState is Active.

Ghosts. A crucial concept in the VCC specification language is the division into
operational code and ghost code. Ghost code is seen only by the static veri-
fier, not the regular compiler. Ghost code comes in various forms: Ghost type
definitions introduce types, which can be either regular C types or special types for
verification purposes like maps and claims (see Sect. 3.2). Ghost fields of ar-
bitrary type can be introduced as specially marked fields in operational types.
These fields do not interfere with the size and ordering of the operational fields.
Likewise, static or automatic ghost variables of arbitrary type are supported.
Like ghost fields, they are marked special and do not interfere with operational
variables. Ghost parameters of arbitrary type can pass additional ghost state
information in and out of the called function. Ghost state updates perform op-
erations on only the ghost memory state. Any flow of data from the ghost state
to the operational state of the software is forbidden.
One application of ghost code is maintaining shadow copies of implementation
data of the operational software. Shadow copies usually introduce abstractions,
e.g., representing a list as a set. They are also introduced to allow for atomic
update of the shadow, even if the underlying data structure cannot be updated
atomically. The atomic updates are required to enforce protocols on the overall
system using two-state invariants.
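As a minimal sketch of such a shadow copy in the annotation style of the listings below (the Counter type is our own illustration and not taken from the Hyper-V code base), a value stored in two machine words can be shadowed by a single ghost field whose coupling is stated as a type invariant:

typedef struct _Counter {
  unsigned lo, hi;        // operational: a 64-bit value split into two words
  spec(uint64_t shadow;)  // ghost: shadow copy of the full 64-bit value
  invariant(shadow == ((((uint64_t) hi) << 32) | lo))
} Counter;

Every operational update of lo or hi is then accompanied by a ghost update of shadow, which can be performed in a single atomic step.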

3.1 Verifying Sequential Programs

Ownership and Invariants. To deal with high-level aliasing, VCC implements


a Spec#-style [9] ownership model: The heap is organized into a forest of tree

[State-transition diagram not reproduced: it shows how an object moves between the mutable, wrapped, and nested states (and, in the concurrent case, their claimed counterparts) via the operations wrap, unwrap, claim, and unclaim, together with the values of closed(o), owner(o), and ref_cnt(o) in each state.]

Fig. 1. Object states, transitions, and access permissions

structures. The edges of the trees indicate ownership, that is, an aggregate / sub-
object relation. The roots of trees in the ownership forest are objects representing
threads of execution. The set of objects directly or transitively owned by an
object is called the ownership domain of that object.
We couple ownership and type invariants. Intuitively, a type invariant can
depend only on state in its ownership domain. We later relax this notion. Of
course ownership relationships change over time and type invariants cannot al-
ways hold. We thus track the status for each object o in meta-state: owner(o)
denotes the owner of an object, owns(o) specifies the set of objects owned by o
(the methodology ensures that owner() and owns() stay in sync), closed(o)
guarantees that o’s invariant holds.
Figure 1 shows the possible meta-states of an object (the Concurrent part of this figure will be explained in Sect. 3.2):

– mutable(o) holds if o is not closed and is owned by the current thread


(henceforth called me). Allocated objects are always mutable and fresh (i.e.,
not previously present in the owns-set of the current thread).
– wrapped(o) holds if o is closed and owned by me. Non-volatile fields of
wrapped objects cannot change. (wrapped0(o) abbreviates wrapped(o) and
ref_cnt(o)==0).
– nested(o) holds if o is closed and owned by an object.

Ghost operations like wrap, unwrap, etc. update the state as depicted in Fig. 1; note that unwrapping an object moves its owned objects from nested to wrapped, and wrapping the object moves them back.
The function part_send_signal() from Listing 1 respects this meta-state
protocol. The function precondition requires part to be wrapped, i.e., part’s in-
variant holds. The function body first unwraps part, which suspends its

typedef struct vcc(dynamic_owns) _PartitionDB {


Partition *partitions[MAXPART];
invariant(forall(unsigned i; i < MAXPART;
partitions[i] != NULL ==> set_in(partitions[i], owns(this))))
} PartitionDB;

void db_send_signal(PartitionDB *db, unsigned idx)


requires(idx < MAXPART)
maintains(wrapped(db))
ensures((db->partitions[idx] != NULL) && (db->partitions[idx]->lifeState == Active)
==> db->partitions[idx]->signaled)
writes(db)
{
unwrap(db);
if ((db->partitions[idx] != NULL) && (db->partitions[idx]->lifeState == Active)) {
part_send_signal(db->partitions[idx]);
}
wrap(db);
}

Listing 2. Sequential partition database

invariant; next, its fields are written to. To establish the postcondition, part
is wrapped again; at this point, VCC checks that all invariants of part hold.
The write clauses work accordingly: write access to the root of an owner-
ship domain enables writing to the entire ownership domain. In our example,
writes(part) gives the part_send_signal function write access to all fields
of part (and the objects part owns), and tells the caller, that state updates are
confined to the ownership domain of part. Additionally, one can always write
to objects that are fresh.

Conditional Ownership. The actual hypervisor implementation does not use


partition pointers but abstract partition identifiers to refer to partitions. This is
because partitions can be created and destroyed anytime during the operation
of the hypervisor, which might lead to dangling pointers. Listing 2 simulates
the hypervisor solution: before any operation on a partition can take place,
the pointer to the partition is retrieved from a partition database using the
partition identifier. The PartitionDB type contains an array partitions of
MAXPART entries of pointers to Partition. The index in partitions serves as
the partition identifier. The partition database invariant states that all (non-null)
elements of partitions are owned by the partition database.
The function db_send_signal() in Listing 2 attempts to send a signal to the partition with index idx of the partition database db. It uses the function part_send_signal() from Listing 1, so we need to ensure that the precondi-
tions of that function are met: part->lifeState is Active follows from the
condition of the if-statement; wrapped(part) holds since part is contained in
the owns-set of db; unwrapping db transitions the partition from nested into the
wrapped state. It also makes the partition writable as it has not previously been
owned by the current thread and thus is also considered fresh from the point of
view of the current thread.

#define ISSET(n, v) (((v) & (1ULL << (n))) != 0)


typedef Partition *PPartition;

typedef struct vcc(claimable) _PartitionDB {


volatile uint64_t allSignaled;
volatile PPartition partitions[MAXPART];
invariant(forall(unsigned i; i < MAXPART;
unchanged(partitions[i]) ||
old(partitions[i]) == NULL || !closed(old(partitions[i]))))
invariant(forall(unsigned i; i < MAXPART;
unchanged(ISSET(i, allSignaled)) ||
inv2(partitions[i])))
} PartitionDB;
Listing 3. Concurrent partition database

3.2 Verifying Concurrent Programs


We now proceed with a concurrent version of the partition database structure
from the previous example (cf. Listing 3). The array of partitions is declared as
volatile to mark the intent of allowing arbitrary threads to add and remove
partitions without unwrapping the partition database. The partitions are also
no longer owned by the database; instead, we imagine that the thread currently
executing a partition owns it.3 The first two-state invariant of the database
prevents removal of closed partitions. The meaning of the invariant is: for any
two consecutive states of the machine either partitions[i] stays the same
(unchanged(x) is defined as old(x)==(x)), or it was NULL in the first state, or
the object pointed to by partitions[i] in the first state is open. VCC enforces
this two-state invariant on every write to the database. Thus, if one has a closed
partition at index i, one can rely on it staying there.
In the concurrent database, the individual signaled fields from the sequential
version have been collected into a bit mask in the database. This allows taking
an atomic snapshot of partitions currently being signaled. On the other hand,
the details of how these bits can change logically belong with the individual
partitions. This is stated by the second database invariant, saying that whenever
the i-th bit of allSignaled is changed, the invariant of the i-th partition shall
be preserved.
Listing 4 shows the updated version of the partition. The lifeState field
remains the same. Since it is not marked volatile, the partition must be un-
wrapped before changing its life state. Because we want to keep the signature of
the part_send_signal() function, the partition now needs to hold a pointer
to the database and its index. An invariant enforces that the current partition
is indeed stored at that index. This makes the invariant of the partition depend
on fields of the database; so without further precaution, we would need to check
the invariants of partitions when updating the database – but this would make
reasoning about invariants non-modular. Instead, VCC requires that invariants
are admissible, as described below.
3
In reality, if one takes a reference to a partition from the database, the database
needs to provide some guarantees that the partition will stick around long enough.
This is achieved using rundowns, but for brevity we skip it here.

typedef struct vcc(claimable) _Partition {


LifeState lifeState;
invariant(lifeState == Initialized || lifeState == Active ||
lifeState == Terminating)

struct _PartitionDB *db;


unsigned idx;
invariant(idx < MAXPART && db->partitions[idx] == this)

spec(volatile bool signaled;)


invariant(signaled <==> ISSET(idx, db->allSignaled))
invariant(signaled ==> lifeState == Active)

spec(claim_t db_claim;)
invariant(keeps(db_claim) && claims_obj(db_claim, db))
} Partition;
Listing 4. Admissibility, volatile fields, shadow fields

Admissibility. A state transition is legal iff, for every object o that is closed in the
transition’s prestate or poststate, if any field of o is updated (including the “field”
indicating closedness) the two-state invariant of o is preserved. An invariant of
an object o is admissible iff it is satisfied by every legal state transition. Stated
differently, an invariant is admissible if it is preserved by every transition that
preserves invariants of all modified objects. Note that admissibility depends only
on type definitions (not function specs or code), and is monotonic (i.e., if an
invariant has been proved admissible, the addition of further types or invariants
cannot make it inadmissible). VCC checks that all invariants are admissible.
Thus, when checking that a state update doesn’t break any invariant, VCC has
to check only the invariants of the updated objects.
Some forms of invariants are trivially admissible. In particular, an invari-
ant in object o that mentions only fields of o is admissible. This applies to
idx < MAXPART. For db->partitions[idx]==this, let us assume that db->
partitions[idx] changes across a transition (other components of that ex-
pression could not change). We know db->partitions[idx] was this in the
prestate. Assume for a moment that we know db was closed in both the prestate and the poststate. Then db->partitions[idx] was unchanged, it was NULL in the prestate (but this != NULL), or this was open in the poststate: all three cases are contradictory. Thus, if we knew that db stays closed, the invariant would be admissible.
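As an illustration of the trivially admissible case (the Range type is hypothetical, introduced only for this example):

typedef struct _Range {
  unsigned lo;
  unsigned hi;
  invariant(lo <= hi)  // mentions only fields of this object, hence admissible
} Range;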

Claims. The required knowledge is provided by the claim object, owned by the
partition and stored in the ghost field db_claim. A claim, as it is used here, can
be thought of as a handle that keeps its claimed object from opening. If an object
o has a type which is marked with vcc(claimable) the field ref_cnt(o) tracks
the number of outstanding claims that claim o. An object cannot be unwrapped
if this count is positive, and a claim can only be created when the object is closed.
Thus, when a claim to an object exists, the object is known to be closed.4
4
Claims can actually be implemented using admissible two-state invariants. We
decided to build them into the annotation language for convenience.

void part_send_signal(Partition *part spec(claim_t c))


requires(wrapped(c) && claims_obj(c, part))
{
PartitionDB *db = part->db;
uint64_t idx = part->idx;

if (part->lifeState != Active) return;

bv_lemma(forall(int i, j; uint64_t v; 0 <= i && i < 64 && 0 <= j && j < 64 ==>
i != j ==> (ISSET(j, v) <==> ISSET(j, v | (1ULL << i)))));

atomic(part, db, c) {
speconly(part->signaled = true;)
InterlockedBitSet(&db->allSignaled, idx);
}
}
Listing 5. Atomic operation

More generally, a claim is created by giving an invariant and a set of claimed


(claimable) objects on which the claim depends. At the point at which the claim
is created, the claimed objects must all be closed and the invariant must hold.
Moreover, the claim invariant, conjoined with the invariants of the claimed ob-
jects, must imply that the claim invariant cannot be falsified without opening
one of the claimed objects.
Pointers to claims are often passed as ghost arguments to functions (most of-
ten with the precondition that the claim is wrapped). In this capacity, the claim
serves as a stronger kind of precondition. Whereas an ordinary precondition can
only usefully constrain state local to the thread, a claim can constrain volatile
(shared) state. Moreover, unlike a precondition, the claim invariant is guaran-
teed to hold until the claim is destroyed. In a function specification, the macro
always(o, P ) means that the function maintains wrapped(o) and that o points
to a valid claim, and that the invariant of the claim o implies the predicate P .
Thus, this contract guarantees that P holds throughout the function call (both
to the function and to its callers).

Atomic Blocks. Listing 5 shows how objects can be concurrently updated. The
signaling function now only needs a claim to the partition, passed as a ghost
parameter, and does not need to list the partition in its writes clause. In fact,
the writes clause of the signaling function is empty, reflecting the fact that from
the caller perspective, the actions could have been performed by another thread,
without the current thread calling any function. A thread can read its own non-
volatile state; it can also read non-volatile fields of closed objects, in particular
objects for which it holds claims. On the other hand, the volatile fields can
only be read and written inside of atomic blocks. Such a block identifies the
objects that will be read or written, as well as claims that are needed to establish
closedness of those objects. It can contain at most one physical state update or
read, which is assumed to be performed atomically by the underlying hardware.
In our example, we set the idx-th bit of allSignaled field, using a dedicated
CPU instruction (it also returns the old value, but we ignore it). On top of that,
the atomic block can perform any number of updates of the ghost state. Both

#define Write(state) ((state)&0x80000000)


#define Readers(state) ((state)&0x7FFFFFFF)

typedef struct vcc(claimable) vcc(volatile_owns) _LOCK {


volatile long state;
spec(obj_t protected_obj;)
spec(volatile bool initialized, writing;)
spec(volatile claim_t self_claim;)

invariant(old(initialized) ==> initialized && unchanged(self_claim))


invariant(initialized ==>
is_claimable(protected_obj) && is_non_primitive_ptr(protected_obj) &&
set_in(self_claim,owns(this)) && claims_obj(self_claim, this) &&
protected_obj != self_claim)
invariant(initialized && !writing ==>
set_in(protected_obj,owns(this)) &&
ref_cnt(protected_obj) == (unsigned) Readers(state) &&
ref_cnt(self_claim) == (unsigned)(Readers(state) + (Write(state)!=0)))
invariant(initialized && old(Write(state)) ==>
Readers(state) <= old(Readers(state)) && (!Write(state) ==> old(writing)))
invariant(initialized && writing ==>
Readers(state) == 0 && Write(state) && ref_cnt(self_claim) == 0)
} LOCK;
Listing 6. Annotated lock data structure

physical and ghost updates can only be performed on objects listed in the header
of the atomic block. The resulting state transition is checked for legality, i.e., we
check the two-state invariants of updated objects across the atomic block. The
beginning of the atomic block is the only place where we simulate actions of
other threads; technically this is done by forgetting everything we used to know
about volatile state. The only other possible state updates are performed on
mutable (and thus open) objects and thus are automatically legal.

3.3 Verification of Concurrency Primitives

In VCC, concurrency primitives (other than atomic operations) are verified (or
just specified), rather than being built in. As an example we present the acqui-
sition of a reader-writer lock in exclusive (i.e., writing) mode.5 In this example,
claims are used to capture not only closedness of objects but also properties of
their fields.
The data structure LOCK (cf. Listing 6) contains a single volatile implemen-
tation variable called state. Its most significant bit holds the write flag that is
set when a client requests exclusive access. The remaining bits hold the number
of readers. Both values can be updated atomically using interlocked operations.
Acquiring a lock in exclusive mode proceeds in two phases. First, we spin on
setting the write bit of the lock atomically. After the write bit has been set, no
new shared locks may be taken. Second, we spin until the number of readers
reaches zero. This protocol is formalized using lock ghost fields and invariants.
The lock contains four ghost variables: a pointer protected_obj identify-
ing the object protected by the lock, a flag initialized that is set after
5
For details and full annotated source code see [11].

[Diagram not reproduced: the lock owns its self claim and, during shared access, the protected object; lock access claims claim the lock, and read access claims claim both the protected object and the lock.]

Fig. 2. Ownership and claims structure (shared and exclusive access)

[Table not reproduced: it relates the lock’s phases to the values of Write(state), writing, Readers(state), and ref_cnt(self_claim), as described in the text below.]

Fig. 3. Relation of lock implementation and ghost variables

initialization, a flag writing that is one when exclusive access has been granted
(and no reader holds the lock), and a claim self_claim. The use of self_claim
is twofold. First, we tie its reference count to the implementation variables of
the lock. This allows restricting changes of these variables by maintaining claims
on self_claim. Second, it is used to claim lock properties, serving as a proxy
between the claimant and the lock. For this purpose it claims the lock and is
owned by it. It thus becomes writable and claimable in atomic operations on the
lock without requiring it or the lock to be listed in function writes clauses.
Figures 2 and 3 contain a graphical representation of the lock invariants.
Figure 2 shows the setup of ownership and claims. The lock access claim is cre-
ated after initialization. It ensures that the lock remains initialized and allocated,
and clients use it (or a derived claim) when calling lock functions. During non-
exclusive access each reader holds a read access claim on the protected object
and the lock, and the lock owns the protected object, as indicated in gray. Dur-
ing exclusive access the protected object is owned by the client and there may be
no readers. Figure 3 depicts the dynamic relation between implementation and
ghost variables. As long as the write bit is zero, shared locks may be acquired
and released, as indicated by the number of readers. The write bit is set when
the acquisition of an exclusive lock starts. In this phase the number of readers
must decrease. When it reaches zero, exclusive lock acquisition can complete by
activating the writing flag. For each reader and each request for write access
(which is at most one) there is a reference on self_claim.
Listing 7 shows the annotated code for acquisition of an exclusive lock. The
macro claimp wrapped around the parameter lock_access_claim means that
lock_access_claim is a ghost pointer to a wrapped claim; the always clause
says that this claim is wrapped, is not destroyed by the function, and that its
invariant implies that the lock is closed and initialized (and hence, will remain

void AcquireExclusive(LOCK *lock claimp(lock_access_claim))


always(lock_access_claim, closed(lock) && lock->initialized)
ensures(wrapped0(lock->protected_obj) && is_fresh(lock->protected_obj))
{
bool done;
spec(claim_t write_bit_claim;)

bv_lemma(forall(long i; Write(i|0x80000000) &&


Readers(i|0x80000000) == Readers(i)));

do
atomic (lock, lock_access_claim) {
done = !Write(InterlockedOr(&lock->state, 0x80000000));
speconly(if (done) {
write_bit_claim = claim(lock->self_claim, lock->initialized &&
stays_unchanged(lock->self_claim) && Write(lock->state));
})
}
while (!done);
do
invariant(wrapped0(write_bit_claim))
atomic (lock, write_bit_claim) {
done = Readers(lock->state)==0;
speconly(if (done) {
giveup_closed_owner(lock->protected_obj, lock);
unclaim(write_bit_claim, lock->self_claim);
lock->writing = 1;
})
}
while (!done);
}

Listing 7. Acquisition of an exclusive lock

so during the function call). After the function returns it guarantees that the
protected object is unreferenced, wrapped, and fresh (and thus, writable).
In the first loop of the implementation we spin until the write bit can be atomically set (via the InterlockedOr intrinsic), i.e., until, in an atomic block, the write bit has been seen as zero and then set to one. In the terminating loop case
we create a temporary claim write_bit_claim, which references the self claim
and states that the lock stays initialized, that the self claim stays, and that the
write bit of the lock has been set. VCC checks that the claimed property holds
initially and is stable against interference. The former is true by the passed-in
lock access claim and the state seen and updated in the atomic operation; the
latter is true because as long as there remains a reference to the self claim,
the writing flag cannot be activated and the write bit cannot be reset. Also,
the atomic update satisfies the lock invariant.
The second loop waits for the readers to disappear. If the number of readers
has been seen as zero, we remove the protected object from the ownership of
the lock, discard the temporary claim, and set the writing flag. All of this
can be justified by the claimed property and the lock’s invariant. Setting the
writing flag is allowed because the write bit is known to be set. Furthermore, the
writing flag is known to be zero in the pre-state of the atomic operation because
the reference count of the self claim, which is referenced by write_bit_claim,
cannot be zero. This justifies the remaining operations.
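For orientation, a client of this lock might look roughly as follows (a schematic sketch, not taken from the paper: ReleaseExclusive is a hypothetical counterpart of AcquireExclusive, and the ghost-argument call syntax mirrors the parameter declarations of the listings):

void UpdateProtected(LOCK *lock claimp(lock_access_claim))
  always(lock_access_claim, closed(lock) && lock->initialized)
{
  AcquireExclusive(lock spec(lock_access_claim));
  // lock->protected_obj is now wrapped, unreferenced, and fresh, so the
  // current thread may open it, update its fields, and close it again.
  unwrap(lock->protected_obj);
  /* ... modify the protected data ... */
  wrap(lock->protected_obj);
  ReleaseExclusive(lock spec(lock_access_claim)); // hypothetical release
}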

4 VCC Tool Suite

VCC reuses the Spec# tool chain [9], which has allowed us to develop a comprehensive C verifier with limited effort. In addition, we developed auxiliary tools to support the process of verification engineering in a real-world effort.

4.1 The Static Verifier

We base our verification methodology on inline annotations in the, otherwise


unaltered, source code of the implementation. The C preprocessor is used to
eliminate these annotations for normal C compilation. For verification, the out-
put of the preprocessor (with annotations still intact) is fed to the VCC compiler.

CCI. The VCC compiler is built using Microsoft Research’s Common Compiler Infrastructure (CCI) libraries [12]. VCC reads annotated C and turns the input into CCI’s internal representation to perform name resolution, type checking, and error checking, as any normal C compiler would do.

Source Transformations and Plugins. Next, the fully resolved input program un-
dergoes multiple source-to-source transformations. These transformations first
simplify the source, and then add proof obligations stemming from the method-
ology. The last transformation generates the Boogie source.
VCC provides a plugin interface, where users can insert and remove transformations, including the final translation. Currently two plugins have been implemented: one to generate contracts for assembly functions from their C counterparts, and one to build a new methodology based on separation logic [13].

Boogie. Once the source code has been analyzed and found to be valid, it is
translated into a Boogie program that encodes the input program according to
our formalization of C. Boogie [14] is an intermediate language that is used by a
number of software verification tools including Spec# and HAVOC. Boogie adds
minimal imperative control flow, procedural and functional abstractions, and
types on top of first-order predicate logic. The translation from annotated C to
Boogie encodes both static information about the input program, like types and
their invariants, and dynamic information like the control flow of the program
and the corresponding state updates. Additionally, a fixed axiomatization of C
memory, object ownership, type state, and arithmetic operations (the prelude)
is added. The resulting program is fed to the Boogie program verifier, which
translates it into a sequence of verification conditions. Usually, these are then
passed to an automated theorem prover to be proved or refuted. Alternatively,
they can be discharged interactively. The HOL-Boogie tool [15] provides support
for this approach based on the Isabelle interactive theorem prover.

Z3. Our use of Boogie targets Z3 [16], a state-of-the art first order theorem
prover that supports satisfiability modulo theories (SMT). VCC makes heavy use

of Z3’s fast decision procedures for linear arithmetic and uses the slower fixed-
length bit vector arithmetic only when explicitly invoked by VCC’s bv_lemma()
mechanism (see Listing 5 for an example). These lemmas are typically used when
reasoning about overflowing arithmetic or bitwise operations.

4.2 Static Debugging


In the ideal case, the work flow described above is all there is to running VCC:
an annotated program is translated via Boogie into a sequence of verification
conditions that are successfully proved by Z3. Unfortunately, this ideal situa-
tion is encountered only seldom during the process of verification engineering,
where most time is spent debugging failed verification attempts. Due to the un-
decidability of the underlying problem, these failures can either be caused by a
genuine error in either the code or the annotations, or by the inability of the
SMT solver to prove or refute a verification condition within available resources
like computer memory, time, or the verification engineer’s patience.

VCC Model Viewer. In case of a refutation, Z3 constructs a counterexample that
VCC and Boogie can tie back to a location in the original source code. However,
this is not straightforward, since these counterexamples contain many artifacts of
the underlying axiomatization and so are not well-suited for direct inspection.
The VCC Model Viewer translates the counterexample into a representation
that allows inspecting the sequence of program states that led to the failure,
including the value of local and global variables and the heap state.

Z3 Inspector. A different kind of verification failure occurs when the prover
takes an excessive amount of time to come up with a proof or refutation for
a verification condition. To counter this problem, we provide the Z3 Inspector,
a tool that allows one to monitor the progress of Z3 tied to the annotations in
the source code. This makes it possible to pinpoint those verification conditions that take
excessively long to be processed. There can be two causes for this: either the
verification condition is valid and the prover requires a long time to find the
proof, or the verification condition is invalid and the search for a counterexample
takes a very long time. In the latter case, the Z3 Inspector helps identify the
problematic assertion quickly.

Z3 Axiom Profiler. In the former case, a closer inspection of the quantifier in-
stantiation pattern can help to determine inefficiencies in the underlying axiom-
atization of C or the program annotations. This is facilitated by the Z3 Axiom
Profiler, which allows one to analyze the quantifier instantiation patterns to
detect, e.g., triggering cycles.

Visual Studio. All of this functionality is directly accessible from within the Vi-
sual Studio IDE, including the ability to verify individual functions. We have found
that this combination of tools enables the verification engineer to efficiently de-
velop and debug the annotations required to prove correctness of the scrutinized
codebase.

5 VCC Experience

The methodology presented in this paper was implemented in VCC in late 2008.
Since this methodology differs significantly from earlier approaches, the annota-
tion of the hypervisor codebase had to start from scratch. As of June 2009, four-
teen verification engineers are working on annotating the codebase and verifying
functions. Since November 2008 approx. 13 500 lines of annotations have been
added to the hypervisor codebase. About 350 functions have been successfully
verified resulting in an average of two verified functions per day. Additionally,
invariants for most public and private data types (consisting of about 150 struc-
tures or groups) have been specified and proved admissible. This means that
currently about 20% of the hypervisor codebase has been successfully verified
using our methodology.
A major milestone in the verification effort is having the specifications of
all public functions from all layers so that the verification of the different layers
requires no interaction between the verification engineers, since all required information
has been captured in the contracts and invariants. This milestone has been
reached or will be reached soon for seven modules. For three modules, more than
50% of the functions have already been successfully verified.
We have found that having acceptable turnaround times for verify-and-fix
cycles is crucial to maintain productivity of the verification engineers. Currently
VCC verifies most functions in 0.5 to 500 seconds with an average of about 25
seconds. The longest running function needs ca. 2 000 seconds to be verified.
The all-time high was around 50 000 seconds for a successful proof attempt.
In general, failing proof attempts tend to take longer than successfully verifying a
function. A dedicated test suite has been created to constantly monitor verifica-
tion performance. Performance has improved by one to two orders of magnitude.
Many changes have contributed to these improvements, ranging from changes in
our methodology, the encoding of type state, our approach to invariant checking,
the support of out parameters, to updates in the underlying tools Boogie and
Z3. With these changes, we have, for example, reduced the verification time for
the 50 000s function down to under 1 000s.
Still, in many cases the verification performance is unacceptable. Empirically,
we have found that verification times of over a minute start having an impact
on the productivity of the verification engineer, and that functions that require
one hour or longer are essentially intractable. We are currently working on many
levels to alleviate these problems: improvements in the methodology, grid-style
distribution of verification condition checking, parallelization of proof search for a
single verification condition, and other improvements of SMT solver technology.

6 Related Work

Methodology. The approach of Owicki and Gries [17] requires annotations to be
stable with respect to every atomic action of the other threads, i.e., that the
assertions are interference free. This dependency on the other threads makes the

Owicki-Gries method non-modular and the number of interference tests grows
quadratically with the number of atomic actions. Ashcroft [18] proposed to just
use a single big state invariant to verify concurrent programs. This gets rid of
the interference check, and makes verification thread-modular.
Jones developed the more modular rely/guarantee method [19] which ab-
stracts the possible interleavings with other threads to rely on and guarantee
assertions. Now, it suffices to check that each thread respects these assertions
locally and that the rely and guarantee assertions of all threads fit together.
Still, their approach (and also the original approach of Owicki and Gries) does
not support data modularity: there is no hiding mechanism, so a single bit change
requires the guarantees of all threads to be checked.
Flanagan et al. [3] describe a rely/guarantee based prover for multi-threaded
Java programs. They present the verification of three synchronization primitives
but do not report on larger verification examples. The approach is thread mod-
ular (as it is based on rely/guarantee) but not function modular (they simulate
function calls by inlining).
In contrast to rely/guarantee, concurrent separation logic exploits that large
portions of the program state may be operated on in mutual exclusion [20, 21].
Thus, like in our approach, interference is restricted to critical regions and verifi-
cation can be completely sequential elsewhere. Originally, concurrent separation
logic was restricted to exclusive access (and atomic update) of shared resources.
Later, Bornat proposed a fractional ownership scheme to also allow for shared read-
only state [22]. Recently, Vafeiadis and Parkinson [23] have worked on com-
bining the ideas of concurrent separation logic with rely/guarantee reasoning.
Our ownership model, with uniform treatment of objects and threads, is very
similar to the one employed in Concurrent Spec# [4]. Consequently, the visi-
ble specifications of locks, being the basis of Concurrent Spec# methodology,
is essentially the same. We however do not treat locks as primitives, and al-
low for verifying implementation of various concurrency primitives. The Spec#
ideas have also permeated into recent work by Leino and Mueller [24]. They
use dynamic frames and fractional permissions for verifying fine grained locking.
History invariants [25] are two-state object invariants, requiring an admissibility
check similar to ours. These invariants are, however, restricted to be transitive
and are used only in a sequential context.

Systems Verification. Klein [26] provides a comprehensive overview of the history
and current state of the art in operating systems verification, which is supple-
mented by a recent special issue of the Journal of Automated Reasoning on op-
erating system verification [27]. The VFiasco [28] project, followed by the Robin
projects attempted the verification of a micro kernel, based on a translation of
C++ code into its corresponding semantics in the theorem prover PVS. While
these projects have been successful in providing a semantic model for C++,
no significant portions of the kernel implementation have been verified. Recent
related projects in this area include the project L4.verified [29] (verification of
an industrial microkernel), the FLINT project [30] (verification of an assembly
kernel), and the Verisoft project [31] (the predecessor project of Verisoft XT

focusing on the pervasive verification of hardware-software systems). All three
projects are based on interactive theorem proving (with Isabelle or Coq). Our
hypervisor verification attempt is significantly more ambitious, both with re-
spect to size (ca. 100KLOC of C) and complexity (industrial code for a modern
multiprocessor architecture with a weak memory model).

Acknowledgments. Thanks to everyone in the project: Artem Alekhin, Eyad
Alkassar, Mike Barnett, Nikolaj Bjørner, Sebastian Bogan, Sascha Böhme, Matko
Botinčan, Vladimir Boyarinov, Ulan Degenbaev, Lieven Desmet, Sebastian Fil-
linger, Tom In der Rieden, Bruno Langenstein, K. Rustan M. Leino, Wolf-
gang Manousek, Stefan Maus, Leonardo de Moura, Andreas Nonnengart, Steven
Obua, Wolfgang Paul, Hristo Pentchev, Elena Petrova, Norbert Schirmer, Sabine
Schmaltz, Peter-Michael Seidel, Andrey Shadrin, Alexandra Tsyban, Sergey
Tverdyshev, Herman Venter, and Burkhart Wolff.

References
1. Verisoft XT: The Verisoft XT project (2007), http://www.verisoftxt.de
2. Cohen, E., Moskal, M., Schulte, W., Tobies, S.: A precise yet efficient memory
model for C. In: SSV 2009. ENTCS. Elsevier Science B.V., Amsterdam (2009)
3. Flanagan, C., Freund, S.N., Qadeer, S.: Thread-modular verification for shared-
memory programs. In: Le Métayer, D. (ed.) ESOP 2002. LNCS, vol. 2305, pp.
262–277. Springer, Heidelberg (2002)
4. Jacobs, B., Piessens, F., Leino, K.R.M., Schulte, W.: Safe concurrency for aggregate
objects with invariants. In: Aichernig, B.K., Beckert, B. (eds.) SEFM 2005, pp.
137–147. IEEE, Los Alamitos (2005)
5. Maus, S., Moskal, M., Schulte, W.: Vx86: x86 assembler simulated in C powered
by automated theorem proving. In: Meseguer, J., Roşu, G. (eds.) AMAST 2008.
LNCS, vol. 5140, pp. 284–298. Springer, Heidelberg (2008)
6. Advanced Micro Devices (AMD), Inc.: AMD64 Architecture Programmer’s Man-
ual: Vol. 1-3 (2006)
7. Intel Corporation: Intel 64 and IA-32 Architectures Software Developer’s Manual:
Vol. 1-3b (2006)
8. Flanagan, C., Leino, K.R.M., Lillibridge, M., Nelson, G., Saxe, J.B., Stata, R.:
Extended static checking for Java. SIGPLAN Notices 37(5), 234–245 (2002)
9. Barnett, M., Leino, K.R.M., Schulte, W.: The Spec# programming system: An
overview. In: Barthe, G., Burdy, L., Huisman, M., Lanet, J.-L., Muntean, T. (eds.)
CASSIS 2004. LNCS, vol. 3362, pp. 49–69. Springer, Heidelberg (2005)
10. Microsoft Research: The HAVOC property checker,
http://research.microsoft.com/projects/havoc
11. Hillebrand, M.A., Leinenbach, D.C.: Formal verification of a reader-writer lock
implementation in C. In: SSV 2009. ENTCS, Elsevier Science B.V., Amsterdam
(2009); Source code, http://www.verisoftxt.de/PublicationPage.html
12. Microsoft Research: Common compiler infrastructure,
http://ccimetadata.codeplex.com/
13. Botinčan, M., Parkinson, M., Schulte, W.: Separation logic verification of C pro-
grams with an SMT solver. In: SSV 2009. ENTCS. Elsevier Science B.V., Amster-
dam (2009)
42 E. Cohen et al.

14. Barnett, M., Chang, B.Y.E., Deline, R., Jacobs, B., Leino, K.R.M.: Boogie: A mod-
ular reusable verifier for object-oriented programs. In: de Boer, F.S., Bonsangue,
M.M., Graf, S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 364–387.
Springer, Heidelberg (2006)
15. Böhme, S., Moskal, M., Schulte, W., Wolff, B.: HOL-Boogie: An interactive prover-
backend for the Verifying C Compiler. Journal of Automated Reasoning (to ap-
pear, 2009)
16. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008)
17. Owicki, S., Gries, D.: Verifying properties of parallel programs: An axiomatic ap-
proach. Communications of the ACM 19(5), 279–285 (1976)
18. Ashcroft, E.A.: Proving assertions about parallel programs. Journal of Computer
and System Sciences 10(1), 110–135 (1975)
19. Jones, C.B.: Tentative steps toward a development method for interfering pro-
grams. ACM Transactions on Programming Languages and Systems 5(4), 596–619
(1983)
20. O’Hearn, P.W.: Resources, concurrency, and local reasoning. Theoretical Computer
Science 375(1-3), 271–307 (2007)
21. Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In:
LICS 2002, pp. 55–74. IEEE, Los Alamitos (2002)
22. Bornat, R., Calcagno, C., O’Hearn, P.W., Parkinson, M.J.: Permission accounting
in separation logic. In: Palsberg, J., Abadi, M. (eds.) POPL 2005, pp. 259–270.
ACM, New York (2005)
23. Vafeiadis, V., Parkinson, M.J.: A marriage of rely/guarantee and separation logic.
In: Caires, L., Vasconcelos, V.T. (eds.) CONCUR 2007. LNCS, vol. 4703, pp. 256–
271. Springer, Heidelberg (2007)
24. Leino, K.R.M., Müller, P.: A basis for verifying multi-threaded programs. In:
Castagna, G. (ed.) ESOP 2009. LNCS, vol. 5502, pp. 378–393. Springer, Heidelberg
(2009)
25. Leino, K.R.M., Schulte, W.: Using history invariants to verify observers. In: De
Nicola, R. (ed.) ESOP 2007. LNCS, vol. 4421, pp. 80–94. Springer, Heidelberg
(2007)
26. Klein, G.: Operating system verification – An overview. Sādhanā: Academy Pro-
ceedings in Engineering Sciences 34(1), 27–69 (2009)
27. Journal of Automated Reasoning: Operating System Verification 42(2–4) (2009)
28. Hohmuth, M., Tews, H.: The VFiasco approach for a verified operating system. In:
2nd ECOOP Workshop in Programming Languages and Operating Systems (2005)
29. Heiser, G., Elphinstone, K., Kuz, I., Klein, G., Petters, S.M.: Towards trustworthy
computing systems: Taking microkernels to the next level. SIGOPS Oper. Syst.
Rev. 41(4), 3–11 (2007)
30. Ni, Z., Yu, D., Shao, Z.: Using XCAP to certify realistic systems code: Machine
context management. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007. LNCS,
vol. 4732, pp. 189–206. Springer, Heidelberg (2007)
31. Alkassar, E., Hillebrand, M.A., Leinenbach, D.C., Schirmer, N.W., Starostin, A.,
Tsyban, A.: Balancing the load: Leveraging a semantics stack for systems verifica-
tion. Journal of Automated Reasoning 42(2–4), 389–454 (2009)
Without Loss of Generality

John Harrison

Intel Corporation, JF1-13
2111 NE 25th Avenue, Hillsboro OR 97124, USA
[email protected]

Abstract. One sometimes reads in a mathematical proof that a certain assump-
tion can be made ‘without loss of generality’ (WLOG). In other words, it is
claimed that considering what first appears only a special case does neverthe-
less suffice to prove the general result. Typically the intuitive justification for this
is that one can exploit symmetry in the problem. We examine how to formalize
such ‘WLOG’ arguments in a mechanical theorem prover. Geometric reasoning
is particularly rich in examples and we pay special attention to this area.

1 Introduction
Mathematical proofs sometimes state that a certain assumption can be made ‘without
loss of generality’, often abbreviated to ‘WLOG’. The phrase suggests that although mak-
ing the assumption at first sight only proves the theorem in a more restricted case, this
does nevertheless justify the theorem in full generality. What is the intuitive justification
for this sort of reasoning? Occasionally the phrase covers situations where we neglect
special cases that are obviously trivial for other reasons. But more usually it suggests
the exploitation of symmetry in the problem. For example, consider Schur’s inequality,
which asserts that for any nonnegative real numbers a, b and c and integer k ≥ 0 one
has $0 \le a^k(a-b)(a-c) + b^k(b-a)(b-c) + c^k(c-a)(c-b)$. A typical proof might
begin:
Without loss of generality, let a ≤ b ≤ c.
If asked to spell this out in more detail, we might say something like:
Since ≤ is a total order, the three numbers must be ordered somehow, i.e. we
must have (at least) one of a ≤ b ≤ c, a ≤ c ≤ b, b ≤ a ≤ c, b ≤ c ≤ a,
c ≤ a ≤ b or c ≤ b ≤ a. But the theorem is completely symmetric between
a, b and c, so each of these cases is just a version of the other with a change of
variables, and we may as well just consider one of them.
Suppose that we are interested in formalizing mathematics in a mechanical theorem
prover. Generally speaking, for an experienced formalizer it’s rather routine to take an
existing proof and construct a formal counterpart, even though it may require a great
deal of work to get things just right and to get the proof assistant to check all the
details. But with such ‘without loss of generality’ constructs, it’s not immediately ob-
vious what the formal counterpart should be. We can plausibly suggest two possible
formalizations:


– The phrase may be an informal shorthand saying ‘we should really do 6 very similar
proofs here, but if we do one, all the others are exactly analogous and can be left to
the reader’.
– The phrase may be asserting that ‘by a general logical principle, the apparently
more general case and the special WLOG case are in fact equivalent (or at least the
special case implies the general one)’.
The former point of view can be quite natural in a computer proof assistant. If we
have a proof script covering one of the 6 cases, we might simply perform a 6-way
case-split and for each case use a duplicate of the initial script, changing the names of
variables systematically in an editor. Indeed, if we have a programmable proof assistant,
it would be more elegant to write a general parametrized proof script that we could use
for all 6 cases with different parameters. This sort of programming is exactly the kind
of thing that LCF-style systems [3] like HOL [2] are designed to make easy via their
‘metalanguage’ ML, and sometimes its convenience makes it irresistible. However, this
approach is open to criticism on at least three grounds:
– Ugly/clumsy
– Inefficient
– Not faithful to the informal proof.
Indeed, it seems unnatural, even with the improvement of using a parametrized script, to
perform essentially the same proof 6 different times, and if each proof takes a while to
run, it could waste computer resources. And it is arguably not what the phrase ‘without
loss of generality’ is meant to conjure up. If the book had intended that interpretation, it
would probably have said something like ‘the other cases are similar and are left to the
reader’. So let us turn to how we might formalize and use a general logical principle.

2 A HOL Light Proof of Schur’s Inequality


In fact, in HOL Light there is already a standard theorem with an analogous principle
for a property of two real numbers:
REAL_WLOG_LE =
|- (∀x y. P x y ⇔ P y x) ∧
(∀x y. x <= y ⇒ P x y)
⇒ (∀x y. P x y)
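In passing, this theorem itself has a short mechanical proof; a plausible sketch
(the script in the actual HOL Light sources may differ) uses the standard
totality lemma REAL_LE_TOTAL, |- ∀x y. x <= y ∨ y <= x:

(* Sketch only: symmetry plus totality of <= yields the unrestricted claim. *)
let REAL_WLOG_LE = prove
 (`(!x y. P x y <=> P y x) /\ (!x y. x <= y ==> P x y)
   ==> (!x y. P x y)`,
  MESON_TAC[REAL_LE_TOTAL]);;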

This asserts that for any property P of two real numbers, if the property is symmetric
between those two numbers (∀x y. P x y ⇔ P y x) and assuming x ≤ y the property
holds (∀x y. x ≤ y ⇒ P x y), then we can conclude that it holds for all real numbers
(∀x y. P x y). In order to tackle the Schur inequality we will prove a version for
three variables. Our chosen formulation is quite analogous, but using a more minimal
formulation of symmetry between all three variables:
REAL_WLOG_3_LE =
|- (∀x y z. P x y z ⇒ P y x z ∧ P x z y) ∧
(∀x y z. x <= y ∧ y <= z ⇒ P x y z)
⇒ (∀x y z. P x y z)

The proof is relatively straightforward following the informal intuition: we observe
that one of the six possible ordering sequences must occur, and in each case we can
deduce the general case from the more limited one and symmetry. The following is the
tactic script to prove REAL_WLOG_3_LE:

REPEAT STRIP_TAC THEN (STRIP_ASSUME_TAC o REAL_ARITH)
‘x <= y ∧ y <= z ∨ x <= z ∧ z <= y ∨ y <= x ∧ x <= z ∨
y <= z ∧ z <= x ∨ z <= x ∧ x <= y ∨ z <= y ∧ y <= x‘ THEN
ASM_MESON_TAC[]

Now let us see how to use this to prove Schur’s inequality in HOL Light, which we
formulate as follows:

|- ∀k a b c. &0 <= a ∧ &0 <= b ∧ &0 <= c
⇒ &0 <= a pow k * (a - b) * (a - c) +
b pow k * (b - a) * (b - c) +
c pow k * (c - a) * (c - b)

The first step in the proof is to strip off the additional variable k (which will not
play a role in the symmetry argument), use backwards chaining with the WLOG the-
orem REAL_WLOG_3_LE, and then break the resulting goal into two subgoals, one
corresponding to the symmetry and the other to the special case.

GEN_TAC THEN MATCH_MP_TAC REAL_WLOG_3_LE THEN CONJ_TAC

The first subgoal, corresponding to symmetry of the problem, is the following:

‘∀a b c. (&0 <= a ∧ &0 <= b ∧ &0 <= c
⇒ &0 <= a pow k * (a - b) * (a - c) +
b pow k * (b - a) * (b - c) +
c pow k * (c - a) * (c - b))
⇒ (&0 <= b ∧ &0 <= a ∧ &0 <= c
⇒ &0 <= b pow k * (b - a) * (b - c) +
a pow k * (a - b) * (a - c) +
c pow k * (c - b) * (c - a)) ∧
(&0 <= a ∧ &0 <= c ∧ &0 <= b
⇒ &0 <= a pow k * (a - c) * (a - b) +
c pow k * (c - a) * (c - b) +
b pow k * (b - a) * (b - c))‘

Although this looks rather large, the proof simply exploits the fact that addition and
multiplication are associative and commutative via routine logical reasoning, so we can
solve it by:

MESON_TAC[REAL_ADD_AC; REAL_MUL_AC]

We have now succeeded in reducing the original goal to the special case:

‘∀a b c. a <= b ∧ b <= c
⇒ &0 <= a ∧ &0 <= b ∧ &0 <= c
⇒ &0 <= a pow k * (a - b) * (a - c) +
b pow k * (b - a) * (b - c) +
c pow k * (c - a) * (c - b)‘

and so we can claim that the foregoing proof steps correspond almost exactly to the
informal WLOG principle. We now rewrite the expression into a more convenient form:

REPEAT STRIP_TAC THEN ONCE_REWRITE_TAC[REAL_ARITH
‘a pow k * (a - b) * (a - c) +
b pow k * (b - a) * (b - c) +
c pow k * (c - a) * (c - b) =
(c - b) * (c pow k * (c - a) - b pow k * (b - a)) +
a pow k * (c - a) * (b - a)‘]

The form of this expression is now congenial, so we can simply proceed by repeat-
edly chaining through various monotonicity theorems and then use linear arithmetic
reasoning to finish the proof:

REPEAT(FIRST(map MATCH_MP_TAC
[REAL_LE_ADD; REAL_LE_MUL; REAL_LE_MUL2]) THEN
ASM_SIMP_TAC[REAL_POW_LE2; REAL_POW_LE; REAL_SUB_LE] THEN
REPEAT CONJ_TAC) THEN
ASM_REAL_ARITH_TAC

We have therefore succeeded in deploying WLOG reasoning in a natural way and
following a standard textbook proof quite closely. However, a remaining weak spot is
the proof of the required symmetry for the particular problem. In this case, we were
just able to use standard first-order automation (MESON_TAC) to deduce this symmetry
from the associativity and commutativity of the two main operations involved (real ad-
dition and multiplication). However, we can well imagine that in more complicated sit-
uations, this kind of crude method might be tedious or impractical. We will investigate
how to approach this more systematically using reasoning from a somewhat different
domain.

3 WLOG Reasoning in Geometry

Geometry is particularly rich in WLOG principles, perhaps reflecting the fundamen-
tal importance in geometry of property-preserving transformations. The modern view
of geometry has been heavily influenced by Klein’s “Erlanger Programm” [7], which
emphasizes the role of transformations and invariance under classes of transformations,
while modern physical theories usually regard conservation laws as manifestations of

invariance properties: the conservation of angular momentum arises from invariance un-
der rotations, while conservation of energy arises from invariance under shifts in time,
and so on [8].
One of the most important ways in which such invariances are used in proofs is to
make a convenient choice of coordinate system. In our formulation of Euclidean space
in HOL Light [6], geometric concepts are all defined in analytic terms using vectors,
which in turn are expressed with respect to a standard coordinate basis. For example,
the angle formed by three points is defined in terms of the angle between two vectors:

|- angle(a,b,c) = vector_angle (a - b) (c - b)

which is defined in terms of norms and dot products using the inverse cosine function
acs (degenerating to π/2 if either vector is zero):

|- vector_angle x y =
if x = vec 0 ∨ y = vec 0 then pi / &2
else acs((x dot y) / (norm x * norm y))

where norms are defined in terms of dot products:

|- ∀x. norm x = sqrt(x dot x)

and finally dot products in R^N are defined in terms of the N components in the
usual way as $x \cdot y = \sum_{i=1}^{N} x_i y_i$, or in HOL Light:

|- x dot y = sum(1..dimindex(:N)) (λi. x$i * y$i)

This means that whenever we state geometric theorems, most of the concepts ul-
timately rest on a particular choice of coordinate system and standard basis vectors.
When we are performing high-level reasoning, we can often reason about geometric
concepts directly using lemmas established earlier without ever dropping down to the
ultimate representation with respect to the standard basis. But when we do need to
reason algebraically in terms of coordinates, we often find that a different choice of
coordinate system would make the reasoning much more tractable.
The simplest example is probably choosing the origin of the coordinate system. If a
proposition ∀x. P [x] is invariant under spatial translation, i.e. changing x to any a + x,
then it suffices to prove the special case P [0], or in other words, to assume without loss
of generality that x is the origin. The reasoning is essentially trivial: if we have P [0]
and also ∀a x. P [x] ⇒ P [a + x], then we can deduce P [x + 0] and so P [x]. In HOL
Light we can state this as the following general theorem, asserting that if P is invariant
under translation and we have the special case P [0], then we can conclude ∀x. P [x]:

WLOG_ORIGIN =
|- (∀a x. P(a + x) ⇔ P x) ∧ P(vec 0) ⇒ (∀x. P x)
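The lemma itself is mechanically trivial; a plausible one-line derivation (the
actual script in the sources may differ) lets MESON combine the hypotheses with
the standard vector lemma VECTOR_ADD_RID, |- ∀x. x + vec 0 = x:

(* Sketch only: instantiate the invariance at a := x, x := vec 0. *)
let WLOG_ORIGIN = prove
 (`(!a x:real^N. P(a + x) <=> P x) /\ P(vec 0) ==> (!x. P x)`,
  MESON_TAC[VECTOR_ADD_RID]);;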

Thus, when confronted with a goal, we can simply rearrange the universally quanti-
fied variables so that the one we want to take as the origin is at the outside, then apply

this theorem, giving us the special case P [0] together with the invariance of the goal
under translation. For example, suppose we want to prove that the angles of a triangle
that is not completely degenerate all add up to π radians (180 degrees):

‘∀A B C. ˜(A = B ∧ B = C ∧ A = C)
⇒ angle(B,A,C) + angle(A,B,C) + angle(B,C,A) = pi‘

If we apply our theorem by MATCH_MP_TAC WLOG_ORIGIN and split the result-
ing goal into two conjuncts, we get one subgoal corresponding to the special case when
A is the origin:

‘∀B C.
˜(vec 0 = B ∧ B = C ∧ vec 0 = C)
⇒ angle(B,vec 0,C) + angle(vec 0,B,C) + angle(B,C,vec 0) =
pi‘

and another goal for the invariance of the property under translation by a:

‘∀a A. (∀B C.
˜(a + A = B ∧ B = C ∧ a + A = C)
⇒ angle(B,a + A,C) +
angle(a + A,B,C) + angle(B,C,a + A) = pi) ⇔
(∀B C.
˜(A = B ∧ B = C ∧ A = C)
⇒ angle(B,A,C) + angle(A,B,C) + angle(B,C,A) = pi)‘

We will not dwell more on the detailed proof of the theorem in the special case where
A is the origin, but will instead focus on the invariance proof. In contrast to the case of
Schur’s inequality, this is somewhat less easy and can’t obviously be deferred to basic
first-order automation. So how do we prove it?
At first sight, things don’t look right: it seems that we ought to have translated not
just A but all the variables A, B and C together. However, note that for any given a the
translation mapping x → a + x is surjective: for any y there is an x such that a + x = y
(namely x = y − a). That means that we can replace universal quantifiers over vec-
tors, and even existential ones too, by translated versions. This general principle can be
embodied in the following HOL theorem, easily proven automatically by MESON_TAC:

QUANTIFY_SURJECTION_THM =
|- ∀f:A->B.
(∀y. ∃x. f x = y)
⇒ (∀P. (∀x. P x) ⇔ (∀x. P (f x))) ∧
(∀P. (∃x. P x) ⇔ (∃x. P (f x)))
We can apply it with a bit of instantiation and higher-order rewriting to all the uni-
versally quantified variables on the left-hand side of the equivalence in the goal and
obtain:

‘∀a A.
(∀B C.
˜(a + A = a + B ∧ a + B = a + C ∧ a + A = a + C)
⇒ angle(a + B,a + A,a + C) +
angle(a + A,a + B,a + C) +
angle(a + B,a + C,a + A) = pi) ⇔
(∀B C.
˜(A = B ∧ B = C ∧ A = C)
⇒ angle(B,A,C) + angle(A,B,C) + angle(B,C,A) = pi)‘

Now things are becoming better. First of all, it is clear that a + A = a + B ⇔
A = B etc. just by general properties of vector addition. As for angles, recall that
angle(x,y,z) is defined as the vector angle between the two differences x − y and
z − y. Because it is defined in terms of such differences, it again follows from basic
properties of vector addition that, for example, (a + B) − (a + A) = A − B, and so we
can deduce the invariance property that we seek.
This is all very well, but the process is quite laborious. We have to carefully apply
translation to all the quantified variables just once so that we don’t get into an infinite
loop, and then we have to appeal to suitable basic invariance theorems for pretty much
all the concepts that appear in our theorems. Even in this case, doing so is not entirely
trivial, and for more involved theorems it can be worse, as Hales [5] notes:

[. . . ] formal proofs by symmetry are much harder than anticipated. It was nec-
essary to give a total of nearly a hundred lemmas, showing that the symmetries
preserve all of the relevant structures, all the way back to the foundations.

Indeed, this process seems unpleasant enough that we should consider automating it,
and for geometric invariants this is just what we have done.

4 Tactics Using Invariance under Translation


Our WLOG tactic for choosing the origin is based on a list of theorems asserting in-
variance under translation for various geometric concepts, stored in a reference variable
invariant_under_translation. The vision is that each time a new geomet-
ric concept (angle, collinear, etc.) is defined, one proves a corresponding invariance
theorem and adds it to this list, so that thereafter the invariance will be exploitable au-
tomatically by the WLOG tactic. For example, the entry corresponding to angle is

|- ∀a b c d. angle(a + b,a + c,a + d) = angle(b,c,d)
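Proving and registering such an entry is intended to be routine; the following
sketch illustrates the idiom (the theorem name is invented, and the exact
registration code in the sources may differ):

(* Sketch: prove invariance of angle under translation once, then make it
   available to the WLOG machinery.  The rewrite works because angle is
   defined via differences of its arguments. *)
let ANGLE_TRANSLATION = prove
 (`!a b c d. angle(a + b,a + c,a + d) = angle(b,c,d)`,
  REWRITE_TAC[angle; VECTOR_ARITH `(a + x) - (a + y):real^N = x - y`]);;

invariant_under_translation :=
  ANGLE_TRANSLATION :: !invariant_under_translation;;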

While we usually aim to prove that numerical functions of vectors (e.g. distances
or angles) or predicates on vectors (e.g. collinearity) are completely invariant under
translation, for operations returning more vectors, we normally want to prove that the
translation can be ‘pulled outside’, e.g.

|- ∀a x y. midpoint(a + x,a + y) = a + midpoint (x,y)



Then a translated formula can be systematically mapped into its untranslated form
by applying these transformations in a bottom-up fashion, pulling the translation up
through vector-producing functions like midpoint and then systematically eliminat-
ing them when they reach the level of predicates or numerical functions of vectors.
Our setup is somewhat more ambitious in that it applies not only to properties of
vectors but also to properties of sets of vectors, many of which are also invariant under
translation. For example, recall that a set is convex if whenever it contains the points x
and y it also contains each intermediate point between x and y, i.e. each ux + vy where
0 ≤ u, v and u + v = 1:

|- ∀s. convex s ⇔
(∀x y u v.
x IN s ∧ y IN s ∧ &0 <= u ∧ &0 <= v ∧ u + v = &1
⇒ u % x + v % y IN s)

This is invariant under translation in the following sense:

|- ∀a s. convex (IMAGE (λx. a + x) s) ⇔ convex s

as are many other geometric or topological predicates (bounded, closed, compact, path-
connected, . . . ) and numerical functions on sets such as measure (area, volume etc.
depending on dimension):

|- ∀a s. measure (IMAGE (λx. a + x) s) = measure s

As with points, for functions that return other sets of vectors, our theorems state
rather that the ‘image under translation’ operation can be pulled up through the function,
e.g.

|- ∀a s. convex hull IMAGE (λx. a + x) s =
IMAGE (λx. a + x) (convex hull s)

We include in the list other theorems of the same type for the basic set operations, so
that they can be handled as well, e.g.

|- ∀a s t. IMAGE (λy. a + y) s UNION IMAGE (λy. a + y) t =
IMAGE (λy. a + y) (s UNION t)

|- ∀a s t. IMAGE (λy. a + y) s SUBSET IMAGE (λy. a + y) t ⇔
s SUBSET t

Our conversion (GEOM_ORIGIN_CONV) and the corresponding tactic defined
in terms of it (GEOM_ORIGIN_TAC) work by automating the process sketched in the
special case in the previous section. First, they apply the basic reduction so that we
need to prove equivalence when one nominated variable is translated. Then the other
quantifiers are modified to apply similar translation to the other variables, even if quan-
tification is nested in a complicated way. We use an enhanced version of the theorem

QUANTIFY_SURJECTION_THM which applies a similarly systematic modification to
quantifiers over sets of vectors and set comprehensions such as {x | angle(a, b, x) =
π/3}:

|- ∀f. (∀y. ∃x. f x = y)
⇒ (∀P. (∀x. P x) ⇔ (∀x. P (f x))) ∧
(∀P. (∃x. P x) ⇔ (∃x. P (f x))) ∧
(∀Q. (∀s. Q s) ⇔ (∀s. Q (IMAGE f s))) ∧
(∀Q. (∃s. Q s) ⇔ (∃s. Q (IMAGE f s))) ∧
(∀P. {x | P x} = IMAGE f {x | P (f x)})

With this done, it remains only to rewrite with the invariance theorems taken from
the list invariant_under_translation in a bottom-up sweep. If the intended
result uses only these properties in a suitable fashion, then this should automatically
reduce the invariance goal to triviality. The user does not even see it, but is presented
instead with the special case. (If the process of rewriting does not solve the invariance
goal, then that is returned as an additional subgoal so that the user can either help the
proof along manually or perhaps observe that a concept is used for which no invariance
theorem has yet been stored.) For example, if we set out to prove the formula for the
volume of a ball:
‘∀z:realˆ3 r. &0 <= r
⇒ measure(cball(z,r)) = &4 / &3 * pi * r pow 3‘

a simple application of GEOM_ORIGIN_TAC ‘z:realˆ3‘ reduces it to the special
case when the ball is centered at the origin:

‘∀r. &0 <= r
⇒ measure(cball(vec 0,r)) = &4 / &3 * pi * r pow 3‘

Here is an example with a more complicated quantifier structure and a mix of sets
and points. We want to prove that for any point a and nonempty closed set s there is a
closest point of s to a. (A set is closed if it contains all its limit points, i.e. all points that
can be approached arbitrarily closely by a member of the set.) We set up the goal:

g ‘∀s a:realˆN.
closed s ∧ ˜(s = {})
⇒ ∃x. x IN s ∧
(∀y. y IN s ⇒ dist(a,x) <= dist(a,y))‘;;

and with a single application of our tactic, we can suppose the point in question is the
origin:

# e(GEOM_ORIGIN_TAC ‘a:realˆN‘);;
val it : goalstack = 1 subgoal (1 total)

‘∀s. closed s ∧ ˜(s = {})
⇒ (∃x. x IN s ∧
(∀y. y IN s ⇒ dist(vec 0,x) <= dist(vec 0,y)))‘

5 Tactics Using Invariance under Orthogonal Transformations


This is just one of several analogous tactics that we have defined. Many other tactics
also exploit the invariance of many properties under orthogonal transformations. These
are essentially maps f : R^N → R^N that are linear and preserve dot products:

|- ∀f. orthogonal_transformation f ⇔
linear f ∧ (∀v w. f v dot f w = v dot w)

where linearity of a function f : R^M → R^N is defined as

|- ∀f. linear f ⇔
(∀x y. f(x + y) = f x + f y) ∧
(∀c x. f(c % x) = c % f x)
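In particular, an orthogonal transformation automatically preserves norms, since
by the definitions above $\|f(x)\|^2 = f(x) \cdot f(x) = x \cdot x = \|x\|^2$.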

Orthogonal transformations can be characterized in various other ways. For exam-
ple, a linear map f is an orthogonal transformation iff its corresponding matrix is an
orthogonal matrix:

|- orthogonal_transformation f ⇔
linear f ∧ orthogonal_matrix(matrix f)

where an N × N matrix Q is orthogonal if its transpose is also its inverse, i.e.
$Q^T Q = Q Q^T = 1$:

|- orthogonal_matrix(Q) ⇔
transp(Q) ** Q = mat 1 ∧ Q ** transp(Q) = mat 1

It is easy to prove that the determinant of an orthogonal matrix is either 1 or −1,
and this gives a classification of orthogonal transformations into ‘rotations’, where the
matrix has determinant 1 and ‘rotoinversions’ where the matrix has determinant −1.
Intuitively, rotations do indeed correspond to rotation about the origin in n-dimensional
space, while rotoinversions involve additional reflections. For example, in two dimen-
sions, each rotation matrix is of the form

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

where θ is the (anticlockwise) angle of rotation. Invariance under orthogonal transfor-
mation is used in several tactics that allow us to transform a particular nonzero vector
into another more convenient one of the same magnitude. The following theorem guar-
antees us that given any two vectors a and b in R^N of the same magnitude, there exists
an orthogonal transformation that maps one into the other:

|- ∀a b:realˆN.
norm(a) = norm(b)
⇒ ∃f. orthogonal_transformation f ∧ f a = b
Without Loss of Generality 53

If we furthermore want f : R^N → R^N to be a rotation, then almost the same theorem is
true, except that we need the dimension to be at least 2. (An orthogonal transformation
taking a vector into its negation in R^1 must have a matrix with determinant −1.)

|- ∀a b:realˆN.
2 <= dimindex(:N) ∧ norm(a) = norm(b)
⇒ ∃f. orthogonal_transformation f ∧
det(matrix f) = &1 ∧
f a = b

Just as a reference variable invariant_under_translation is used to store
theorems asserting the invariance of various concepts under translation, we use a sec-
ond reference variable invariant_under_linear to store analogous theorems
for invariance under linear transformations. These in general apply to slightly different
classes of linear transformations, almost all of which are more general than orthogonal
transformations. For each concept we try to use the most general natural class of linear
mappings. Some theorems apply to all linear maps, e.g. the one for convex hulls:

|- ∀f s. linear f
⇒ convex hull IMAGE f s = IMAGE f (convex hull s)

Some apply to all injective linear maps, e.g. those for closedness of a set:

|- ∀f s. linear f ∧ (∀x y. f x = f y ⇒ x = y)
⇒ (closed (IMAGE f s) ⇔ closed s)

Some apply to all bijective (injective and surjective) linear maps, e.g. those for
openness of a set:

|- ∀f s. linear f ∧
(∀x y. f x = f y ⇒ x = y) ∧ (∀y. ∃x. f x = y)
⇒ (open (IMAGE f s) ⇔ open s)

Some apply to all norm-preserving linear maps, e.g. those for angles:

|- ∀f a b c. linear f ∧ (∀x. norm(f x) = norm x)
⇒ angle(f a,f b,f c) = angle(a,b,c)

Note that a norm-preserving linear map is also injective, so this property also suffices
for all those requiring injectivity. For a function f : R^N → R^N this property is precisely
equivalent to being an orthogonal transformation:

|- ∀f:realˆN->realˆN.
orthogonal_transformation f ⇔
linear f ∧ (∀v. norm(f v) = norm v)

However, it is important for some other related applications (an example is below)
that we make theorems applicable to maps where the dimensions of the domain and
codomain spaces are not necessarily the same.
Finally, the most restrictive requirement applies to just one theorem, the one for the
vector cross product. This has a kind of chirality, so may have its sign changed by a
general orthogonal transformation. Its invariance theorem requires a rotation of type
R^3 → R^3:

|- ∀f x y. linear f ∧
(∀x. norm(f x) = norm x) ∧ det(matrix f) = &1
⇒ (f x) cross (f y) = f(x cross y)

We actually store the theorem in a slightly peculiar form, which makes it easier to
apply uniformly in a framework where we can assume a transformation is a rotation
except in dimension 1:

|- ∀f x y. linear f ∧ (∀x. norm(f x) = norm x) ∧
(2 <= dimindex(:3) ⇒ det(matrix f) = &1)
⇒ (f x) cross (f y) = f(x cross y)

We can implement various tactics that exploit our invariance theorems to make vari-
ous simplifying transformations without loss of generality:

– GEOM_BASIS_MULTIPLE_TAC chooses an orthogonal transformation or rota-
tion to transform a vector into a nonnegative multiple of a chosen basis vector.
– GEOM_HORIZONTAL_PLANE_TAC chooses a combination of a translation and
orthogonal transformation to transform a plane p in R^3 into a ‘horizontal’ one
{(x, y, z) | z = 0}.
– PAD2D3D_TAC transforms a problem in R^3 where all points have zero third
coordinate into a corresponding problem in R^2.

The first two work in much the same way as the earlier tactic for choosing the ori-
gin. We apply the general theorem, modify all the other quantified variables and then
rewrite with invariance theorems. We can profitably think of the basic processes in
such cases as instances of general HOL theorems, though this is not actually how they
are implemented. For example, we might say that if for each x we can find a ‘trans-
form’ (e.g. translation, or orthogonal transformation) f such that f (x) is ‘nice’ (e.g. is
zero, or a multiple of some basis vector), and can also deduce for any ‘transform’ that
P (f (x)) ⇔ P (x), then proving P (x) for all x is equivalent to proving it for ‘nice’ x.
(The theorem that follows is automatically proved by MESON.)

|- ∀P. (∀x. ∃f. transform(f) ∧ nice(f x)) ∧
(∀f x. transform(f) ⇒ (P(f x) ⇔ P x))
⇒ ((∀x. P x) ⇔ (∀x. nice(x) ⇒ P(x)))

However, in some more general situations we don’t exactly want to show that P (f (x))
⇔ P (x), but rather that P (f (x)) ⇔ P′(x) for some related but not identical property

P′, for example if we want to transfer a property to a different type. For this reason, it
is actually more convenient to observe that we can choose a ‘transform’ from a ‘nice’
value rather than to it, i.e. rely on the following:

|- ∀P P’. (∀x. ∃f y. transform(f) ∧ nice(y) ∧ f y = x) ∧
(∀f x. transform(f) ∧ nice x ⇒ (P(f x) ⇔ P’ x))
⇒ ((∀x. P x) ⇔ (∀y. nice(y) ⇒ P’(y)))

The advantage of this is that in our approach based on rewriting by applying in-
variance theorems, the new property P′ can emerge naturally from the rewriting of
P (f (x)), instead of requiring extra code for its computation. Even in cases where the
generality is not needed, we typically use this structure, i.e. choose our mapping from a
‘nice’ value.

6 An Extended Example
Let us see a variety of our tactics at work on a problem that was, in fact, the original
motivation for most of the work described here.

‘∀u1:realˆ3 u2 p a b.
˜(u1 = u2) ∧
plane p ∧
{u1,u2} SUBSET p ∧
dist(u1,u2) <= a + b ∧
abs(a - b) < dist(u1,u2) ∧
&0 <= a ∧
&0 <= b
⇒ (∃d1 d2. {d1,d2} SUBSET p ∧
&1 / &2 % (d1 + d2) IN affine hull {u1, u2} ∧
dist(d1,u1) = a ∧
dist(d1,u2) = b ∧
dist(d2,u1) = a ∧
dist(d2,u2) = b)‘

The first step is to assume without loss of generality that the plane p is {(x, y, z) |
z = 0}, i.e. the set of points whose third coordinate is zero, following which we man-
ually massage the goal so that the quantifiers over u1 , u2 , d1 and d2 carry explicit
restrictions:

# e(GEOM_HORIZONTAL_PLANE_TAC ‘p:realˆ3->bool‘ THEN
ONCE_REWRITE_TAC[TAUT
‘a ∧ b ∧ c ∧ d ⇒ e ⇔ c ∧ a ∧ b ∧ d ⇒ e‘] THEN
REWRITE_TAC[INSERT_SUBSET; EMPTY_SUBSET] THEN
REWRITE_TAC[IMP_CONJ; RIGHT_FORALL_IMP_THM] THEN
REWRITE_TAC[GSYM CONJ_ASSOC; RIGHT_EXISTS_AND_THM] THEN
REWRITE_TAC[IN_ELIM_THM]);;

which produces the result:

‘∀u1. u1$3 = &0
⇒ (∀u2. u2$3 = &0
⇒ ˜(u1 = u2)
⇒ plane {z | z$3 = &0}
⇒ (∀a b.
dist(u1,u2) <= a + b
⇒ abs(a - b) < dist(u1,u2)
⇒ &0 <= a
⇒ &0 <= b
⇒ (∃d1. d1$3 = &0 ∧
(∃d2. d2$3 = &0 ∧
&1 / &2 % (d1 + d2) IN
affine hull {u1, u2} ∧
dist(d1,u1) = a ∧
dist(d1,u2) = b ∧
dist(d2,u1) = a ∧
dist(d2,u2) = b))))‘

Now we apply another WLOG tactic to reduce the problem from R^3 to R^2, and again
make a few superficial rearrangements:

# e(PAD2D3D_TAC THEN
SIMP_TAC[RIGHT_IMP_FORALL_THM; IMP_IMP; GSYM CONJ_ASSOC]);;

resulting in:

‘∀u1 u2 a b.
˜(u1 = u2) ∧
plane {z | z$3 = &0} ∧
dist(u1,u2) <= a + b ∧
abs(a - b) < dist(u1,u2) ∧
&0 <= a ∧
&0 <= b
⇒ (∃d1 d2.
&1 / &2 % (d1 + d2) IN affine hull {u1, u2} ∧
dist(d1,u1) = a ∧
dist(d1,u2) = b ∧
dist(d2,u1) = a ∧
dist(d2,u2) = b)‘

Although HOL Light does not by default show the types, all the vector variables
are now in R^2 instead of R^3 (except for the bound variable z in the residual planarity
hypothesis, which is no longer useful anyway). Having collapsed the problem from 3
dimensions to 2 in this way, we finally choose u1 as the origin:

# e(GEOM_ORIGIN_TAC ‘u1:realˆ2‘);;
val it : goalstack = 1 subgoal (1 total)

‘∀u2 a b.
˜(vec 0 = u2) ∧
plane {z | z$3 = &0} ∧
dist(vec 0,u2) <= a + b ∧
abs(a - b) < dist(vec 0,u2) ∧
&0 <= a ∧
&0 <= b
⇒ (∃d1 d2.
&1 / &2 % (d1 + d2) IN affine hull {vec 0, u2} ∧
dist(d1,vec 0) = a ∧
dist(d1,u2) = b ∧
dist(d2,vec 0) = a ∧
dist(d2,u2) = b)‘

and now u2 as a multiple of the first standard basis vector:

# e(GEOM_BASIS_MULTIPLE_TAC 1 ‘u2:realˆ2‘);;
val it : goalstack = 1 subgoal (1 total)

‘∀u2. &0 <= u2
⇒ (∀a b.
˜(vec 0 = u2 % basis 1) ∧
plane {z | z$3 = &0} ∧
dist(vec 0,u2 % basis 1) <= a + b ∧
abs(a - b) < dist(vec 0,u2 % basis 1) ∧
&0 <= a ∧
&0 <= b
⇒ (∃d1 d2.
&1 / &2 % (d1 + d2) IN
affine hull {vec 0, u2 % basis 1} ∧
dist(d1,vec 0) = a ∧
dist(d1,u2 % basis 1) = b ∧
dist(d2,vec 0) = a ∧
dist(d2,u2 % basis 1) = b))‘

We have thus reduced the original problem to a nicely oriented situation where the
points we consider live in 2-dimensional space and are of the form (0, 0) and (u2, 0).
The final coordinate geometry is now relatively straightforward.

7 Future Work
Our battery of tactics so far is already a great help in proving geometric theorems. There
are several possible avenues for improvement and further development.

One is to make use of still broader classes of transformations when handling theo-
rems about correspondingly narrower classes of concepts. For example, some geometric
properties, e.g. those involving collinearity and incidence but not distances and angles,
are invariant under still broader classes of transformations, such as shearing, and this
can be of use in choosing an even more convenient coordinate system — see for exam-
ple the proof of Pappus’s theorem given by Chou [1]. Other classes of theorems behave
nicely under scaling, so we can freely turn some point (0, a) ≠ (0, 0) into just (0, 1) and
so eliminate another variable. Indeed, for still more restricted propositions, e.g. those
involving only topological properties, we can consider continuous maps that may not
be linear.
It would also be potentially interesting to extend the process to additional ‘higher-
order’ properties. To some extent, we already do this with our support for sets of vectors,
but we could take it much further, e.g. considering properties of sequences and series
and their limits. A nice example where we would like to exploit a higher-order invari-
ance arises in proving that every polygon has a triangulation. The proof given in [4]
says: ‘Pick the coordinate axis so that no two vertices have the same y coordinate’. It
should not be difficult to extend the methods here to prove invariance of notions like
‘triangulation of’, and we could then pick a suitable orthogonal transformation to force
the required property (there are only finitely many vertices but uncountably many angles
of rotation to choose).
Another interesting idea would be to reformulate the process in a more ‘metalogical’
or ‘reflective’ fashion, by formalizing the class of problems for which our transforma-
tions suffice once and for all, instead of rewriting with the current selection of theorems
and then either succeeding or failing. From a practical point of view, we think our
current approach is usually better. It is actually appealing not to delimit the class of
permissible geometric properties, but have that class expand automatically as new in-
variance theorems are added. Moreover, to use the reflective approach we would need to
map into some formal syntax, which needs similar transformations anyway. However,
there may be some situations where it would be easier to prove general properties in a
metatheoretic fashion. For example, a first-order assertion over vectors with M vector
variables, even if the pattern of quantification is involved, can be reduced to spaces of
dimension ≤ M [9]. It should be feasible to handle important special cases (e.g. purely
universal formulas) within our existing framework, but exploiting the full result might
be a good use for metatheory.

Acknowledgements
The author is grateful to Truong Nguyen, whose stimulating questions on the Flyspeck
project mailing list were the inspiration for most of this work.

References
1. Chou, S.-C.: Proving elementary geometry theorems using Wu’s algorithm. In: Bledsoe, W.W.,
Loveland, D.W. (eds.) Automated Theorem Proving: After 25 Years. Contemporary Mathe-
matics, vol. 29, pp. 243–286. American Mathematical Society, Providence (1984)

2. Gordon, M.J.C., Melham, T.F.: Introduction to HOL: a theorem proving environment for
higher order logic. Cambridge University Press, Cambridge (1993)
3. Gordon, M., Wadsworth, C.P., Milner, R.: Edinburgh LCF. LNCS, vol. 78. Springer,
Heidelberg (1979)
4. Hales, T.C.: Easy pieces in geometry (2007),
http://www.math.pitt.edu/˜thales/papers/
5. Hales, T.C.: The Jordan curve theorem, formally and informally. The American Mathematical
Monthly 114, 882–894 (2007)
6. Harrison, J.: A HOL theory of Euclidean space. In: Hurd, J., Melham, T. (eds.) TPHOLs 2005.
LNCS, vol. 3603, pp. 114–129. Springer, Heidelberg (2005)
7. Klein, F.: Vergleichende Betrachtungen über neuere geometrische Forschungen. Mathematische
Annalen 43, 63–100 (1893); Based on the speech given on admission to the faculty of the
University of Erlangen in 1872. English translation “A comparative review of recent researches
in geometry” in Bulletin of the New York Mathematical Society 2, 460–497 (1892-1893)
8. Noether, E.: Invariante Variationsprobleme. Nachrichten von der Königlichen Gesellschaft der
Wissenschaften zu Göttingen: Mathematisch-physikalische Klasse, 235–257 (1918); English
translation “Invariant variation problems” by M.A. Tavel in ‘Transport Theory and Statistical
Physics’, 1, 183–207 (1971)
9. Solovay, R.M., Arthan, R., Harrison, J.: Some new results on decidability for elementary alge-
bra and geometry. arXiv preprint 0904.3482 (2009); submitted to Annals of Pure and Applied
Logic, http://arxiv.org/PS_cache/arxiv/pdf/0904/0904.3482v1.pdf
HOL Light: An Overview

John Harrison

Intel Corporation, JF1-13
2111 NE 25th Avenue
Hillsboro OR 97124
[email protected]

Abstract. HOL Light is an interactive proof assistant for classical
higher-order logic, intended as a clean and simplified version of Mike
Gordon’s original HOL system. Theorem provers in this family use a
version of ML as both the implementation and interaction language; in
HOL Light’s case this is Objective CAML (OCaml). Thanks to its ad-
herence to the so-called ‘LCF approach’, the system can be extended
with new inference rules without compromising soundness. While retain-
ing this reliability and programmability from earlier HOL systems, HOL
Light is distinguished by its clean and simple design and extremely small
logical kernel. Despite this, it provides powerful proof tools and has been
applied to some non-trivial tasks in the formalization of mathematics
and industrial formal verification.

1 LCF, HOL and HOL Light


Both HOL Light and its implementation language OCaml can trace their origins
back to Edinburgh LCF, developed by Milner and his research assistants in the
1970s [6]. The LCF approach to theorem proving involves two key ideas:

– All proofs are ultimately performed in terms of a small set of primitive
inferences, so provided this small logical ‘kernel’ is correct the results should
be reliable.
– The entire system is embedded inside a powerful functional programming
language, which can be used to program new inference rules. The type dis-
cipline of the programming language is used to ensure that these ultimately
reduce to the primitives.

The original Edinburgh LCF was a theorem prover for Scott’s Logic of Com-
putable Functions [16], hence the name LCF. But as emphasized by Gordon [4],
the basic LCF approach is applicable to any logic, and now there are descen-
dents implementing a variety of higher order logics, set theories and constructive
type theories. In particular, members of the HOL family [5] implement a ver-
sion of classical higher order logic, hence the name HOL. They take the LCF
approach a step further in that all theory developments are pursued ‘definition-
ally’. New mathematical structures, such as the real numbers, may be defined
only by exhibiting a model for them in the existing theories (say as Dedekind


cuts of rationals). New constants may only be introduced by definitional ex-
tension (roughly speaking, merely being a shorthand for an expression in the
existing theory). This fits naturally with the LCF style, since it ensures that all
extensions, whether of the deductive system or the mathematical theories, are
consistent by construction.

2 HOL Light’s Logical Foundations

HOL Light’s logic is simple type theory [1,2] with polymorphic type variables.
The terms of the logic are those of simply typed lambda calculus, with formulas
being terms of boolean type, rather than a separate category. Every term has a
single well-defined type, but each constant with polymorphic type gives rise to
an infinite family of constant terms. There are just two primitive types: bool
(boolean) and ind (individuals), and given any two types σ and τ one can form
the function type σ → τ.¹
For the core HOL logic, there is essentially only one predefined logical con-
stant, equality (=) with polymorphic type α → α → bool. However, to state one
of the mathematical axioms we also include another constant ε : (α → bool) →
α, explained further below. For equations, we use the conventional concrete syn-
tax s = t, but this is just surface syntax for the λ-calculus term ((=)s)t, where
juxtaposition represents function application. For equations between boolean
terms we often use s ⇔ t, but this again is just surface syntax.
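This can be observed concretely in an ML session; for example, using the standard
term operations mk_eq and dest_comb (an illustrative interaction):

# dest_comb (mk_eq(`x:bool`,`y:bool`));;
val it : term * term = (`(=) x`, `y`)

confirming that x = y is literally the application of (=) x to y.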
The HOL Light deductive system governs the deducibility of one-sided se-
quents Γ ⊢ p where p is a term of boolean type and Γ is a set (possibly empty)
of terms of boolean type. There are ten primitive rules of inference, rather similar
to those for the internal logic of a topos [14].
   --------- REFL
    ⊢ t = t

   Γ ⊢ s = t     Δ ⊢ t = u
   ------------------------- TRANS
        Γ ∪ Δ ⊢ s = u

   Γ ⊢ s = t     Δ ⊢ u = v
   ------------------------- MK_COMB
     Γ ∪ Δ ⊢ s(u) = t(v)

         Γ ⊢ s = t
   ------------------------- ABS
    Γ ⊢ (λx. s) = (λx. t)

   ----------------- BETA
    ⊢ (λx. t)x = t

   ---------- ASSUME
    {p} ⊢ p

   Γ ⊢ p ⇔ q     Δ ⊢ p
   --------------------- EQ_MP
        Γ ∪ Δ ⊢ q

   Γ ⊢ p     Δ ⊢ q
   ----------------------------------- DEDUCT_ANTISYM_RULE
    (Γ − {q}) ∪ (Δ − {p}) ⊢ p ⇔ q

    Γ[x1, . . . , xn] ⊢ p[x1, . . . , xn]
   ----------------------------------------- INST
    Γ[t1, . . . , tn] ⊢ p[t1, . . . , tn]

    Γ[α1, . . . , αn] ⊢ p[α1, . . . , αn]
   ----------------------------------------- INST_TYPE
    Γ[γ1, . . . , γn] ⊢ p[γ1, . . . , γn]

¹ In Church’s original notation, also used by Andrews, these are written o, ι and τσ
respectively. Of course the particular concrete syntax has no logical significance.

In MK_COMB it is necessary for the types to agree so that the composite terms
are well-typed, and in ABS it is required that the variable x not be free in any of
the assumptions Γ , while our notation for term and type instantiation assumes
capture-avoiding substitution. All the usual logical constants are defined in terms
of equality. The conventional syntax ∀x. P [x] for quantifiers is surface syntax for
(∀)(λx. P [x]), and we also use this ‘binder’ notation for the ε operator.

  ⊤  =def  (λp. p) = (λp. p)
  ∧  =def  λp. λq. (λf. f p q) = (λf. f ⊤ ⊤)
 =⇒  =def  λp. λq. p ∧ q ⇔ p
  ∀  =def  λP. P = λx. ⊤
  ∃  =def  λP. ∀q. (∀x. P (x) =⇒ q) =⇒ q
  ∨  =def  λp. λq. ∀r. (p =⇒ r) =⇒ (q =⇒ r) =⇒ r
  ⊥  =def  ∀p. p
  ¬  =def  λp. p =⇒ ⊥
  ∃! =def  λP. ∃P ∧ ∀x. ∀y. P x ∧ P y =⇒ x = y

These definitions allow us to derive all the usual (intuitionistic) natural de-
duction rules for the connectives in terms of the primitive rules above. All of the
core ‘logic’ is derived in this way. But then we add three mathematical axioms:

– The axiom of extensionality, in the form of an eta-conversion axiom ETA_AX:
  ⊢ (λx. t x) = t. We could have considered this as part of the core logic rather
  than a mathematical axiom; this is largely a question of taste.
– The axiom of choice SELECT_AX, asserting that the Hilbert operator ε is a
  choice operator: ⊢ P x =⇒ P ((ε)P ). It is only from this axiom that we can
  deduce that the HOL logic is classical [3].
– The axiom of infinity INFINITY_AX, which implies that the type ind is
  infinite.

In addition, HOL Light includes two principles of definition, which allow one
to extend the set of constants and the set of types in a way guaranteed to
preserve consistency. The rule of constant definition allows one to introduce
a new constant c and an axiom ⊢ c = t, subject to some conditions on free
variables and polymorphic types in t, and provided no previous definition for
c has been introduced. All the definitions of the logical connectives above are
introduced in this way. Note that this is ‘object-level’ definition: the constant
and its defining axiom exist in the object logic. Nevertheless, the definitional
principles are designed so that they always give a conservative (in particular
consistency-preserving) extension of the logic.
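As a concrete illustration of this principle, the following session fragment shows
the primitive definitional rule in action; the equation is essentially the one with
which HOL Light’s own sources introduce the constant T for truth (in a running
session where T already exists, a fresh constant name would be needed):

  (* Introduce a new constant together with its defining theorem,
     via the primitive rule of constant definition. *)
  let T_DEF = new_basic_definition
    ‘T = ((\p:bool. p) = (\p:bool. p))‘;;

The function checks the side conditions mentioned above and returns the defining
theorem as an ordinary theorem of the system.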

3 The HOL Light Implementation


Like other LCF provers, HOL Light is in essence simply a large ML program that
defines data structures to represent logical entities, together with a suite of func-
tions to manipulate them in a way guaranteeing soundness. The most important
data structures belong to one of the datatypes hol_type, term and thm, which
represent types, terms (including formulas) and theorems respectively. The user
can write arbitrary programs to manipulate these objects, and it is by creating
new objects of type thm that one proves theorems. HOL’s notion of an ‘inference
rule’ is simply a function with return type thm.
In order to guarantee logical soundness, however, all these types are encapsu-
lated as abstract types. In particular, the only way of creating objects of type
thm is to apply one of the 10 very simple inference rules listed above or to make
a new term or type definition. Thus, whatever the circuitous route by which one
arrives at it, the validity of any object of type thm rests only on the correctness
of the rather simple primitive rules (and of course the correctness of OCaml’s
type checking etc.).
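The following OCaml module type gives a minimal sketch of this encapsulation;
it is our illustration rather than HOL Light’s actual interface (which exports many
more operations and capitalizes rule names like REFL), with rule names lower-cased
here to stay within vanilla OCaml:

  module type KERNEL = sig
    type term                       (* abstract: built only via checked constructors *)
    type thm                        (* abstract: theorems cannot be forged           *)
    val concl  : thm -> term        (* inspect the conclusion of a theorem           *)
    val refl   : term -> thm        (* REFL:   |- t = t                              *)
    val trans  : thm -> thm -> thm  (* TRANS:  |- s = t and |- t = u give |- s = u   *)
    val assume : term -> thm        (* ASSUME: {p} |- p                              *)
  end

Client code compiled against such a signature can inspect theorems freely, but can
only construct them through the exported rule functions.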
To illustrate how inference rules are represented as functions in OCaml, sup-
pose that two theorems of the form Γ ⊢ s = t and Δ ⊢ t = u have already been
proved and bound to the OCaml variables th1 and th2 respectively. In abstract
logical terms, the rule TRANS ensures that the theorem Γ ∪ Δ ⊢ s = u is deriv-
able. In terms of the HOL implementation, one can apply the OCaml function
TRANS, of type thm -> thm -> thm, to these two theorems as arguments, and
hence bind the name th3 to that theorem Γ ∪ Δ ⊢ s = u:

let th3 = TRANS th1 th2;;

One doesn’t normally use such low-level rules much, but instead interacts with
HOL via a series of higher-level derived rules, using built-in parsers and printers
to read and write terms in a more natural syntax. For example, if one wants to
bind the name th6 to the theorem of real arithmetic that when |c − a| < e and
|b| ≤ d then |(a + b) − c| < d + e, one simply does:

let th6 = REAL_ARITH
  ‘abs(c - a) < e ∧ abs(b) <= d =⇒ abs((a + b) - c) < d + e‘;;

If the purported fact in quotations turns out not to be true, then the rule
will fail by raising an exception. Similarly, any bug in the derived rule (which
represents several dozen pages of code written by the present author) would lead
to an exception.² But we can be rather confident in the truth of any theorem
that is returned, since it must have been created via applications of primitive
rules, even though the precise choreographing of these rules is automatic and of
no concern to the user. What’s more, users can write their own special-purpose
proof rules in the same style when the standard ones seem inadequate — HOL
is fully programmable, yet retains its logical trustworthiness when extended by
ordinary users.
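As a small example of a derived rule programmed from the primitives, symmetry
of equality can be obtained as follows; this is essentially the definition found in
HOL Light’s own sources (AP_TERM is itself a simple derived congruence rule, and
dest_eq and rator are term destructors):

  (* From Γ ⊢ s = t derive Γ ⊢ t = s: build ⊢ s = s by REFL, derive
     Γ ⊢ (s = s) ⇔ (t = s) by congruence, then use EQ_MP. *)
  let SYM th =
    let tm = concl th in
    let l,r = dest_eq tm in
    let lth = REFL l in
    EQ_MP (MK_COMB(AP_TERM (rator (rator tm)) th,lth)) lth;;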
Among the facilities provided by HOL is the ability to organize proofs in a
mixture of forward and backward steps, which users often find more congenial.
The user invokes so-called tactics to break down the goal into more manageable
subgoals. For example, in HOL’s inbuilt foundations of number theory, the proof
that addition of natural numbers is commutative is written as follows (the symbol
∀ means ‘for all’):

let ADD_SYM = prove
  (‘∀m n. m + n = n + m‘,
   INDUCT_TAC THEN
   ASM_REWRITE_TAC[ADD_CLAUSES]);;

The tactic INDUCT_TAC uses mathematical induction to break the original
goal down into two separate goals, one for m = 0 and one for m + 1 on the
assumption that the goal holds for m. Both of these are disposed of quickly
simply by repeated rewriting with the current assumptions and a previous, even
more elementary, theorem about the addition operator. The identifier THEN is
a so-called tactical, i.e. a function that takes two tactics and produces another
tactic, which applies the first tactic then applies the second to any resulting
subgoals (there are two in this case).
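Abstractly, a tactical is just a higher-order function on tactics. The following toy
OCaml model (our illustration only; HOL Light’s real tactic type additionally
threads assumptions and a justification function for rebuilding the final theorem)
conveys the idea behind THEN:

  type goal = string                  (* stand-in for a real goal type          *)
  type tactic = goal -> goal list     (* a tactic returns the residual subgoals *)

  (* Sequencing: run tac1, then run tac2 on every subgoal it produced. *)
  let then_ (tac1 : tactic) (tac2 : tactic) : tactic =
    fun g -> List.concat (List.map tac2 (tac1 g))

If tac1 splits a goal into two subgoals and tac2 closes any goal outright (returning
the empty list), then then_ tac1 tac2 leaves no subgoals at all, just as the tactic
above does.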
For another example, we can prove that there is a unique x such that x =
f (g(x)) if and only if there is a unique y with y = g(f (y)) using a single stan-
dard tactic MESON_TAC, which performs model elimination [15] to prove theorems
about first order logic with equality. As usual, the actual proof under the surface
happens by the standard primitive inference rules.

let WISHNU = prove
  (‘(∃!x. x = f (g x)) ⇔ (∃!y. y = g(f y))‘,
   MESON_TAC[]);;

These and similar higher-level rules certainly make the construction of proofs
manageable whereas it would be almost unbearable in terms of the primitive rules
alone. Nevertheless, we want to dispel any false impression given by the simple ex-
amples above: proofs often require long and complicated sequences of rules. The
construction of these proofs often requires considerable persistence. Moreover, the
resulting proof scripts can be quite hard to read, and in some cases hard to mod-
ify to prove a slightly different theorem. One source of these difficulties is that the
proof scripts are highly procedural — they are, ultimately, OCaml programs, al-
beit of a fairly stylized form. There are arguments in favour of a more declarative
style for proof scripts, but the procedural approach has its merits too, particularly
in applications using specialized derived inference rules [9].

² Or possibly to a true but different theorem being returned, but this is easily
  guarded against by inserting sanity checks in the rules.

4 HOL Light Applications

Over the years, HOL Light has been used for a wide range of applications, and in
concert with this its library of pre-proved formalized mathematics and its stock
of more powerful derived inference rules have both been expanded. As well as
the usual battery of automated techniques like first-order reasoning and linear
arithmetic, HOL Light has been used to explore and apply unusual and novel
decision procedures [12,17].
In verification, HOL Light has been used at Intel to verify a number of com-
plex floating-point algorithms including division, square root and transcendental
functions [11]. HOL Light seems well-suited to applications like this. It has a
substantial library of formalized real analysis, which is used incessantly when
justifying the correctness of such algorithms. The flexibility and programmabil-
ity that the LCF approach affords are also important here since one can write
custom derived rules for special tasks like accumulating bounds on rounding
errors or enumerating the solutions to Diophantine equations of special kinds.
As for the formalization of mathematics, HOL Light has from the very be-
ginning had a useful formalization of real analysis [10]. More recently this has
been substantially developed to cover multivariate analysis in Euclidean space
and complex analysis. As well as the miscellany of theorems noted in the list
at http://www.cs.ru.nl/~freek/100/, HOL Light has been used to formalize
some particularly significant results such as the Jordan Curve Theorem [8] and
the Prime Number Theorem [13]. HOL Light is also heavily used in the Fly-
speck Project [7] to formalize the proof of the Kepler sphere-packing conjecture,
possibly the most ambitious formalization project to date.

References
1. Andrews, P.B.: An Introduction to Mathematical Logic and Type Theory: To Truth
Through Proof. Academic Press, London (1986)
2. Church, A.: A formulation of the Simple Theory of Types. Journal of Symbolic
Logic 5, 56–68 (1940)
3. Diaconescu, R.: Axiom of choice and complementation. Proceedings of the Ameri-
can Mathematical Society 51, 176–178 (1975)
4. Gordon, M.J.C.: Representing a logic in the LCF metalanguage. In: Néel, D. (ed.)
Tools and notions for program construction: an advanced course, pp. 163–185.
Cambridge University Press, Cambridge (1982)

5. Gordon, M.J.C., Melham, T.F.: Introduction to HOL: a theorem proving environ-


ment for higher order logic. Cambridge University Press, Cambridge (1993)
6. Gordon, M.J.C., Milner, R., Wadsworth, C.P.: Edinburgh LCF. LNCS, vol. 78.
Springer, Heidelberg (1979)
7. Hales, T.C.: Introduction to the Flyspeck project. In: Coquand, T., Lombardi, H.,
Roy, M.-F. (eds.) Mathematics, Algorithms, Proofs. Dagstuhl Seminar Proceed-
ings, vol. 05021. Internationales Begegnungs- und Forschungszentrum fuer Infor-
matik (IBFI), Schloss Dagstuhl, Germany (2006)
8. Hales, T.C.: The Jordan curve theorem, formally and informally. The American
Mathematical Monthly 114, 882–894 (2007)
9. Harrison, J.: Proof style. In: Giménez, E., Paulin-Mohring, C. (eds.) TYPES 1996.
LNCS, vol. 1512, pp. 154–172. Springer, Heidelberg (1998)
10. Harrison, J.: Theorem Proving with the Real Numbers. Springer, Heidelberg
(1998); Revised version of author’s PhD thesis
11. Harrison, J.: Floating-point verification using theorem proving. In: Bernardo, M.,
Cimatti, A. (eds.) SFM 2006. LNCS, vol. 3965, pp. 211–242. Springer, Heidelberg
(2006)
12. Harrison, J.: Verifying nonlinear real formulas via sums of squares. In: Schnei-
der, K., Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 102–118. Springer,
Heidelberg (2007)
13. Harrison, J.: Formalizing an analytic proof of the Prime Number Theorem (dedi-
cated to Mike Gordon on the occasion of his 60th birthday). Journal of Automated
Reasoning (to appear, 2009)
14. Lambek, J., Scott, P.J.: Introduction to higher order categorical logic. Cambridge
studies in advanced mathematics, vol. 7. Cambridge University Press, Cambridge
(1986)
15. Loveland, D.W.: Mechanical theorem-proving by model elimination. Journal of the
ACM 15, 236–251 (1968)
16. Scott, D.: A type-theoretical alternative to ISWIM, CUCH, OWHY. Theoretical
Computer Science 121, 411–440 (1993); Annotated version of a 1969 manuscript
17. Solovay, R.M., Arthan, R., Harrison, J.: Some new results on decidability for el-
ementary algebra and geometry. arXiv preprint 0904.3482 (2009); submitted to
Annals of Pure and Applied Logic,
http://arxiv.org/PS_cache/arxiv/pdf/0904/0904.3482v1.pdf
A Brief Overview of Mizar

Adam Naumowicz and Artur Kornilowicz

Institute of Informatics
University of Bialystok, Poland
{adamn,arturk}@math.uwb.edu.pl

Abstract. Mizar is the name of a formal language derived from in-


formal mathematics and computer software that enables proof-checking
of texts written in that language. The system has been actively devel-
oped since the 1970s, growing into a popular proof assistant accompanied
with a huge repository of formalized mathematical knowledge. In this
short overview, we give an outline of the key features of the Mizar lan-
guage, the ideas and theory behind the system, its main applications,
and current development.

1 Introduction

The original goal of Mizar [8], as conceived by its inventor, Andrzej Trybulec in
the early 1970s, was to construct a formal language close to the mathematical jar-
gon used in publications, but at the same time simple enough to enable computer-
ized processing, in particular verification of full logical correctness. The historical
description of the first 30 years of Mizar presented in [7] outlines the evolution
of the project, from its relatively modest initial implementations constrained by
the capabilities of computers available at that time, to the current proof assistant
system successfully used for practical formalization of mathematics.
In the late 1980s Mizar developers started to systematically collect formaliza-
tions, which gave rise to the Mizar Mathematical Library - MML. Since then
the development of MML has been the central activity in the Mizar project,
as it has been believed that only substantial experience may help in improv-
ing the system. When in 1993 there emerged the QED initiative [12] to devise a
computer-based database of all mathematical knowledge, strictly formalized and
with all proofs having been checked automatically, Mizar was ready to actively
implement that ideology. Although the QED project has not been continued, to
some extent the development of Mizar is still driven in the spirit of its main
goals.
Nowadays, when it has been demonstrated by Mizar and numerous other sys-
tems that the computer mechanization of mathematics can be done in practice,
the important field for research is how to do it in a relatively easy and com-
fortable way. Therefore useful constructs that occur in informal mathematics
are still being incorporated into the linguistic layer to extend the expressiveness
of the Mizar language, and at the same time the efforts of Mizar developers


concentrate on strengthening the computational power and providing more au-


tomation on the side of the verifying software. Both directions in the development
are intended to support and intensify further growth of MML.

2 The Mizar Language


The idea of the Mizar language being as close as possible to the language used
in mathematical papers and being automatically verifiable is achieved by se-
lecting a set of English words and phrases which occur most often in informal
mathematics. In fact, Mizar is intended to be close to the mathematical vernac-
ular on the semantic level even more than on the level of the actual grammar.
Therefore the syntax of Mizar is much simplified compared to the natural lan-
guage, stylistic variants are not distinguished and instead of English words in
some cases their abbreviations are used.
The Mizar language includes the standard set of first order logical connec-
tives and quantifiers for forming formulas and also provides means for using free
second order variables for forming schemes of theorems (infinite families of theo-
rems, e.g. the induction scheme). The rest of Mizar’s syntactic constructs are used
for writing proofs and defining new mathematical objects.
By its design, Mizar supports writing proofs in a declarative way (i.e. mostly
forward reasoning), resembling the standard mathematical practice. The proofs
written in Mizar are constructed according to the rules of the Jaśkowski style of
natural deduction [6], or similar systems developed independently by F.B. Fitch
[4] or K. Ono [11]. It is this part of the Mizar language that has had the biggest
influence on other systems and became the inspiration to develop similar proof
layers on top of several procedural systems. To name the most important ones,
there was the system Declare by D. Syme [13], the Mizar mode for HOL by J.
Harrison [5], the Isar language for Isabelle by M. Wenzel [16], Mizar-light for
HOL-light by F. Wiedijk [18] and most recently the declarative proof language
(DPL) for Coq by P. Corbineau [3]. The Mizar way of writing proofs was also
the model for the notion of ‘formal proof sketches’ developed by F. Wiedijk [17].
Following the mathematical practice, Mizar offers a number of definitional fa-
cilities to enable introducing notions of various linguistic categories. Each Mizar
definition defines a new constructor later used in syntactic constructions, and
gives its syntax and meaning. In Mizar terminology, predicates are constructors
of (atomic) formulas, modes are constructors of types, functors are constructors
of terms, and attributes are constructors of adjectives. The syntactic format of
a constructor specifies the symbol of the constructor and the place and number
of arguments. The format of a constructor together with information about the
types of arguments is called a pattern. The formats are used for parsing and the
patterns for identifying constructors.
A constructor may be represented by different patterns as synonyms and
antonyms are allowed. The language allows one to define prefix, postfix, infix, and
also circumfix (for various kinds of brackets) operators. Moreover, Mizar sup-
ports operator overloading to enable using the same ‘natural’ symbols with a
different meaning in different contexts.

3 The Mizar Proof-Checker


The checker of Mizar is a disprover based on classical logic. An inference of the
form α¹, . . . , αᵏ ⊢ β is transformed to α¹, . . . , αᵏ, ¬β ⊢ ⊥. A disjunctive normal
form (DNF) of the premises is then created and the system tries to refute it

    ([¬]α1,1 ∧ . . . ∧ [¬]α1,k1 ) ∨ . . . ∨ ([¬]αn,1 ∧ . . . ∧ [¬]αn,kn )

where αi,j are atomic or universal sentences (negated or not) – for the inference
to be accepted, all disjuncts must be refuted.
Internally, all Mizar formulas are expressed in a simplified “canonical” form
(semantic correlates) using only the verum constant, negation, conjunction and
universal quantifier together with atomic formulas. Thanks to that all inferences
valid on the grounds of classical propositional reasoning are automatically ac-
cepted by the checker.
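The following OCaml fragment is a purely illustrative rendering of the two ideas
just described (it is not Mizar’s implementation, whose data structures differ):
formulas kept in canonical form, and acceptance as refutation of every disjunct:

  (* Semantic correlates: formulas kept in a canonical form built from
     verum, negation, conjunction and universal quantification. *)
  type formula =
    | Verum
    | Neg of formula
    | And of formula * formula
    | Forall of string * formula
    | Atom of string                (* opaque atomic sentence *)

  (* Given a refutation procedure for a single disjunct (a list of
     literals), an inference is accepted iff every disjunct of the DNF
     of premises-plus-negated-conclusion is refuted. *)
  let accepted (refutes : formula list -> bool) (dnf : formula list list) : bool =
    List.for_all refutes dnf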
The checker is not based on a set of inference rules, but rather on M. Davis’s
concept of “obviousness w.r.t. an algorithm”. Its deductive power is still be-
ing strengthened by adding new computation mechanisms [10]. The algorithm
includes processing of a hierarchy of dependent types ordered by a widening
relation and extended by adjectives as type modifiers. Mizar also supports a
notion of polymorphic structure types to facilitate abstract developments in the
Bourbaki style.
The Mizar proof-checking software and a suite of additional utilities can be
freely downloaded from the project’s web site in a pre-compiled form for most
popular hardware and OS combinations. Currently the supported platforms in-
clude Intel-based Linux, Solaris, FreeBSD, Darwin/Mac OS X and Microsoft
Windows, and also Darwin/Mac OS X and Linux on PowerPC. There are also
test releases available for palmtops running Linux on ARM processors. Techni-
cally, Mizar processes its input files as a batch-like compiler. Its output contains
marks of unaccepted fragments of the source text, so the user may proceed filling
the gaps in reasoning until no errors are flagged. Essentially the system can be
run from the command line, but the preferred method for interactive use is Josef
Urban’s Emacs-lisp Mizar mode that provides a fully functional interface to the
system.

4 Mizar Mathematical Library


When the systematic collecting of Mizar formalizations started around 1989,
there were plans to build several libraries with different axiomatics, e.g. based
on various set theories or on the Peano arithmetic. However, when it became
apparent that simultaneously maintaining several such libraries was a non-trivial
task, the decision was made to support only one centralized repository, which
has evolved into today’s MML.
Since then all Mizar developments have been created in a steady fashion on
top of the chosen axiomatics and previously formalized data. MML is today

a collection of interrelated texts (articles) fully checked for correctness by the


Mizar checker, based on the axioms of the Tarski-Grothendieck set theory, which
basically is the Zermelo-Fraenkel set theory extended with Tarski’s axiom that
provides the existence of arbitrarily large strongly inaccessible cardinals [14].
The most recent distribution of MML (version 4.117.1046) includes 1047 ar-
ticles written by 219 authors and comprises 48199 theorems, 9262 definitions
(using 7001 different symbols), 757 schemes and 8573 registrations (statements
about relations between adjectives that can be processed automatically). Of
course the facts collected as ‘theorems’ vary in their importance. From a point
of view of a mathematician, most of them are rather simple lemmas. However,
several fields of mathematics are relatively well-developed and significant math-
ematical results have been formalized and included in MML. For example, the
library contains many proofs of advanced topological theorems, like the Jordan
Curve Theorem, the Brouwer Fixed Point Theorem, Urysohn’s Lemma, the Ti-
chonov Theorem, or the Tietze Extension Theorem, and also such fundamental
statements in other domains like the Gödel Completeness Theorem, the Funda-
mental Theorem of Arithmetic, the Fundamental Theorem of Algebra, or Sylow’s
Theorems to name just a few.
MML is subject to continuous revisions performed by the Library Commit-
tee or Development Committee – two bodies of the Association of Mizar Users
working on maintaining and optimizing the contents of the repository. The revi-
sions are most often triggered by the strengthening of the Mizar checker, reformu-
lations of statements using new syntactic constructs, elimination of repetitions
or ‘weaker’ statements, better solutions or improved ways of formalization, en-
hancement of proofs, or reorganization of items among articles. Whenever an
article or a sequence of articles is revised, the rest of articles must always be
checked, and improved if necessary, to keep the whole repository coherent.

5 Main Applications of Mizar

Apart from the long-term goal of developing MML into a database for math-
ematics, the most important applications of Mizar today are playing the role
of a proof assistant to support creating rigorous mathematics, in mathematics
education and in software and hardware verification.
To facilitate the whole process of writing formal mathematics, several exter-
nal systems have been developed that complement the Mizar proof checker. For
example, effective semantic-based information retrieval, i.e., searching, browsing
and presentation of MML can be done with the MML Query system developed
by G. Bancerek [1]. Several sites provide an on-line Mizar processor, writing
proofs may also be assisted by the systems MoMM (a matching and interreduc-
tion tool) and the Mizar Proof Advisor developed by J. Urban. The contents
of MML as well as newly created documents can be presented in various user-
friendly formats, including semantically-linked XML-based web pages [15] or
an automatically generated translation into English in the form of an electronic
and printed journal, Formalized Mathematics.

For several decades Mizar has been used for educational purposes on various
levels: from secondary school to doctoral studies. Usually the teaching was orga-
nized as Mizar-aided courses, most typically on introduction to logic, topology,
lattice theory, general and universal algebra, category theory, etc. Recent appli-
cations in regular university-level courses being part of the obligatory curriculum
for CS students at the University of Bialystok are presented in [2,9].
Mizar has been used to define mathematical models of computers and prove
properties of their programs. One approach which is well-developed in MML
is based on the theory of random access Turing machines. There are also other
formalized attempts to model and analyze standalone algorithms. Numerous
MML articles are also devoted to the construction and analysis of gates and
digital circuits.

6 Current Development
Despite its origins and initial implementations in the 1970s, Mizar is still being
actively developed. The development concerns both the language and the proof-
checking software. The evolution of the Mizar language goes in the direction
of best possible expressiveness, and still new useful language constructs are iden-
tified in mathematical texts and transformed into the formal setting of Mizar.
Much work in this area has been concentrated on the processing of attributes,
which in the most recent implementation can be expressed with their own visi-
ble arguments (e.g. n-dimensional, X-valued, etc.) in much the same way types
have been constructed. As the Mizar type checking mechanism uses quite pow-
erful automation techniques based on adjectives, the change makes it possible
to formalize many concepts in a more natural and, what is maybe even more
important, automatic way.
The capabilities of the proof-checker have recently been strengthened by pro-
viding means for more complete adjective processing and the use of global
choice (selecting unique representatives of types) to enable eliminating the so
called ‘permissive’ definitions. The system has also been equipped with an effi-
cient method of identifying semantically equivalent operations defined in differ-
ent contexts, e.g. the addition of numbers and the corresponding operation in the
field of real numbers. The system has also been extended with more powerful
automation of numerical computations.
Among the planned and currently considered future enhancements there are
several forms of ellipsis (the ubiquitous ‘...’ notation) and a syntactic extension
to support binding operators like the sum, product or integral.

7 Miscellanea
More information on Mizar can be found on the project’s web page [8] or its
several mirrors. The site contains information on the Mizar language (e.g. the
formal syntax, available manuals and other bibliographic links) and provides
downloading of the system and its library. There are also pointers to other

Mizar-related facilities, e.g. Mizar-Forum (a general-purpose Mizar mailing


list), Mizar User Service (an e-mail-based troubleshooting helpdesk), Mizar
TWiki (a collaboration platform) or the Association of Mizar Users (an interna-
tional organization of active users and developers).

References
1. Bancerek, G., Rudnicki, P.: Information retrieval in MML. In: Asperti, A., Buch-
berger, B., Davenport, J.H. (eds.) MKM 2003. LNCS, vol. 2594, pp. 119–132.
Springer, Heidelberg (2003)
2. Borak, E., Zalewska, A.: Mizar course in logic and set theory. In: Kauers, M.,
Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM/CALCULEMUS 2007. LNCS,
vol. 4573, pp. 191–204. Springer, Heidelberg (2007)
3. Corbineau, P.: A declarative language for the Coq proof assistant. In: Miculan,
M., Scagnetto, I., Honsell, F. (eds.) TYPES 2007. LNCS, vol. 4941, pp. 69–84.
Springer, Heidelberg (2008)
4. Fitch, F.B.: Symbolic Logic. An Introduction. The Ronald Press Company (1952)
5. Harrison, J.: A Mizar Mode for HOL. In: von Wright, J., Harrison, J., Grundy, J.
(eds.) TPHOLs 1996. LNCS, vol. 1125, pp. 203–220. Springer, Heidelberg (1996)
6. Jaśkowski, S.: On the rules of supposition in formal logic. Studia Logica 1 (1934)
7. Matuszewski, R., Rudnicki, P.: Mizar: the first 30 years. Mechanized Mathematics
and Its Applications 4(1), 3–24 (2005)
8. Mizar home page: http://mizar.org
9. Naumowicz, A.: Teaching How to Write a Proof. In: Formed 2008: Formal Methods
in Computer Science Education, pp. 91–100 (2008)
10. Naumowicz, A., Byliński, C.: Improving Mizar texts with properties and require-
ments. In: Asperti, A., Bancerek, G., Trybulec, A. (eds.) MKM 2004. LNCS,
vol. 3119, pp. 290–301. Springer, Heidelberg (2004)
11. Ono, K.: On a practical way of describing formal deductions. Nagoya Mathematical
Journal 21 (1962)
12. QED Manifesto: http://www.rbjones.com/rbjpub/logic/qedres00.htm
13. Syme, D.: Three tactic theorem proving. In: Bertot, Y., Dowek, G., Hirschowitz,
A., Paulin, C., Théry, L. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 203–220.
Springer, Heidelberg (1999)
14. Trybulec, A.: Tarski Grothendieck set theory. Formalized Mathematics 1(1), 9–11
(1990)
15. Urban, J.: XML-izing Mizar: Making Semantic Processing and Presentation of
MML Easy. In: Kohlhase, M. (ed.) MKM 2005. LNCS, vol. 3863, pp. 346–360.
Springer, Heidelberg (2006)
16. Wenzel, M., Wiedijk, F.: A comparison of Mizar and Isar. Journal of Automated
Reasoning 29(3-4), 389–411 (2002)
17. Wiedijk, F.: Formal Proof Sketches. In: Berardi, S., Coppo, M., Damiani, F. (eds.)
TYPES 2003. LNCS, vol. 3085, pp. 378–393. Springer, Heidelberg (2004)
18. Wiedijk, F.: Mizar Light for HOL Light. In: Boulton, R.J., Jackson, P.B. (eds.)
TPHOLs 2001. LNCS, vol. 2152, pp. 378–393. Springer, Heidelberg (2001)
A Brief Overview of Agda –
A Functional Language with Dependent Types

Ana Bove, Peter Dybjer, and Ulf Norell

Chalmers University of Technology, Gothenburg, Sweden


{bove,peterd,ulfn}@chalmers.se

Abstract. We give an overview of Agda, the latest in a series of depen-


dently typed programming languages developed in Gothenburg. Agda
is based on Martin-Löf’s intuitionistic type theory but extends it with
numerous programming language features. It supports a wide range of
inductive data types, including inductive families and inductive-recursive
types, with associated flexible pattern-matching. Unlike other proof as-
sistants, Agda is not tactic-based. Instead it has an Emacs-based in-
terface which allows programming by gradual refinement of incomplete
type-correct terms.

1 Introduction
A dependently typed programming language and proof assistant. Agda is a func-
tional programming language with dependent types. It is an extension of Martin-
Löf’s intuitionistic type theory [12,13] with numerous features which are useful
for practical programming. Agda is also a proof assistant. By the Curry-Howard
identification, we can represent logical propositions by types. A proposition is
proved by writing a program of the corresponding type. However, Agda is pri-
marily being developed as a programming language and not as a proof assistant.
Agda is the latest in a series of implementations of intensional type theory
which have been developed in Gothenburg (beginning with the ALF-system)
since 1990. The current version (Agda 2) has been designed and implemented by
Ulf Norell and is a complete redesign of the original Agda system. Like its prede-
cessors, the current Agda supports a wide range of inductive data types, pattern
matching, termination checking, and comes with an interface for programming
and proving by direct manipulation of proof terms. On the other hand, the new
Agda goes beyond the earlier systems in several respects: flexibility of pattern-
matching, more powerful module system, flexible and attractive concrete syntax
(using unicode), etc.
A system for functional programmers. A programmer familiar with a standard
functional language such as Haskell or OCaml will find it easy to get started
with Agda. Like in ordinary functional languages, programming (and proving)
consists of defining data types and recursive functions. Moreover, users familiar
with Haskell’s generalised algebraic data types (GADTs) will find it easy to use
Agda’s inductive families [5].


The Agda wiki. More information about Agda can be found on the Agda wiki
[1]. There are tutorials [3,15], a guide to editing, type checking, and compiling
Agda code, a link to the standard library, and much else. There is also a link to
Norell’s PhD thesis [14] with a language definition and detailed discussions of
the features of Agda.

2 Agda Features

We begin by listing the logically significant parts of Agda.

Logical framework. The core of Agda is Martin-Löf’s logical framework [13]


which gives us the type Set and dependent function types (x : A) → B (us-
ing Agda’s syntax). Agda’s logical framework also provides record types and a
countable sequence of larger universes Set = Set0, Set1, Set2, . . .

Data type definitions. Agda supports a rich family of strictly positive inductive
and inductive-recursive data types and families. Agda checks that the data type
definitions are well-formed according to a discipline similar to that in [6,7].

Recursive function definitions. One of Agda’s main features is its flexible pattern
matching for inductive families. A coverage checker makes sure the patterns cover
all possible cases. As in Martin-Löf type theory, all functions definable in Agda
must terminate, which is ensured by the termination checker.

Codata. The current version of Agda also provides coinductive data types. This
feature is however somewhat experimental and not yet stable.

Agda also provides several features to make it useful in practice:

Concrete syntax. The concrete syntax of Agda is much inspired by Haskell, but
also contains a few distinctive features such as mixfix operators and full support
for unicode identifiers and keywords.

Implicit arguments. The mechanism for implicit arguments allows the omission
of parts of the programs that can be inferred by the typechecker.

Module system. Agda’s module system supports separate compilation and allows
parametrised modules. Together with Agda’s record types, the module system
provides a powerful mechanism for structuring larger developments.

Compilation. There is a simple compiler that compiles Agda programs via


Haskell and allows Haskell functions to be called from within Agda.

Emacs interface. Using Agda’s Emacs interface, programs can be developed


incrementally, leaving parts of the program unfinished. By type checking the
unfinished program, the programmer can get useful information on how to fill
in the missing parts. The Emacs interface also provides syntax highlighting and
code navigation facilities.

3 Agda and Some Related Languages


Agda and Martin-Löf type theory. Agda is an extension of Martin-Löf’s type
theory. An implementation of the latter in Agda can be found on the Agda wiki
[1]. Meaning explanations of foundational interest for type theory have been
provided by Martin-Löf [11,12], and all constructions in Agda (except codata)
are intended to satisfy them. Agda is thus a predicative theory.
Agda and Coq. The most well-known system with dependent types which is based
on the Curry-Howard identification is Coq [2]. Coq is an implementation of the
Calculus of Inductive Constructions, an extension of the Calculus of Construc-
tions [4] with inductive (but not inductive-recursive) types. Unlike Agda, Coq
has an impredicative universe Prop. Moreover, for the purpose of program ex-
traction, there is a distinction between Prop and Set in Coq which is not present
in Agda. There are many other differences between Agda and Coq. For example,
Agda’s pattern matching for inductive families is more flexible than Coq’s. On
the other hand, Coq supports tactical theorem proving in the tradition of LCF
[10], but Agda does not.
Agda and Haskell. Haskell has GADTs, a feature which mimics inductive families
by representing them by type-indexed types. A fundamental difference is that
Haskell allows partial general recursive functions and non-strictly positive data
types. Hence, logic cannot be obtained by the Curry-Howard correspondence.
Other languages with dependent types. There are nowadays a number of func-
tional languages with dependent types (some with and some without general
recursion). Among these McBride’s Epigram [8] is closest in spirit to Agda.

4 Example: Equational Proofs in Commutative Monoids


We will now show some of the code for a module which decides equality in
commutative monoids. This is an example of reflection, a technique which makes
it possible to program and use efficient verified decision procedures inside the
system. Reflection was for example used extensively by Gonthier in his proof of
the four colour theorem [9].
An example of a commutative monoid is the natural numbers with addition.
Thus our decision procedure can automatically prove arithmetic equations such
as
∀ n m → (n + m) + n ≡ m + (n + n).

The above is a valid type in Agda syntax. To prove it in Agda we create a file
Example.agda with the following content:
module Example where

open import Data.Nat
open import Relation.Binary.PropositionalEquality

prf : ∀ n m → (n + m) + n ≡ m + (n + n)
prf n m = ?

Natural numbers and propositional equality are imported from the standard
library and opened to make their content available. Finally, we declare a proof
object prf, the type of which represents the proposition to be proved; here
∀ x → B is an abbreviation of (x : A) → B which does not explicitly mention
the argument type. The final line is the incomplete definition of prf: it is a
function of two arguments, but we do not yet know how to build a proof of the
equation so we leave a “?” in the right hand side. The “?” is a placeholder that
can be stepwise refined to obtain a complete proof.
In this way we can manually build a proof of the equation from associativity
and commutativity of +, and basic properties of equality which can be found in
the standard library. Manual equational reasoning however can become tedious
for complex equations. We shall therefore write a general procedure for equa-
tional reasoning in commutative monoids, and show how to use it for proving
the equation above.
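For orientation, the informal calculation that the procedure will reconstruct for
us is a two-step rewrite, by commutativity of + on the left summand and then
associativity:

  \[ (n + m) + n \;=\; (m + n) + n \;=\; m + (n + n) \]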
Decision procedure for commutative monoids. First we define monoid expressions
as an inductive family indexed by the number of variables:
data Expr n : Set where
  var  : Fin n → Expr n
  _⊕_  : Expr n → Expr n → Expr n
  zero : Expr n

Fin n is a finite set with n elements; there are at most n variables. Note that
infix (and mixfix) operators are declared by using underscores to indicate where
the arguments should go.
To decide whether two monoid expressions are equal we normalise them and
compare the results. The normalisation function is
norm : ∀ {n} → Expr n → Expr n

Note that the first argument (the number of variables) is enclosed in braces,
which signifies that it is implicit. To define this function we employ normali-
sation by evaluation, that is, we first interpret the expressions in a domain of
“values”, and then reify these values into normal expressions. Below, we omit
the definitions of eval and reify and give only their types:
norm = reify ∘ eval

eval  : ∀ {n} → Expr n → NF n
reify : ∀ {n} → NF n → Expr n

The values in NF n are vectors recording the number of occurrences of each
variable:
NF : ℕ → Set
NF n = Vec ℕ n

Next we define the type of equations between monoid expressions:


data Eqn n : Set where
  _==_ : Expr n → Expr n → Eqn n

We can define our arithmetic equation above as follows:


eqn1 : Eqn 2
eqn1 = build 2 λ a b → a ⊕ b ⊕ a == b ⊕ (a ⊕ a)

where we have used an auxiliary function build which builds an equation in Eqn
n from an n-place curried function by applying it to variables.
Equations will be proved by normalising both sides:
simpl : ∀ {n} → Eqn n → Eqn n
simpl (e1 == e2) = norm e1 == norm e2

We are now ready to define a general decision procedure for arbitrary commu-
tative monoids (the complete definition is given later):
prove : ∀ {n} (eqn : Eqn n) ρ → Prf (simpl eqn) ρ → Prf eqn ρ

The function takes an equation and an environment in which to interpret it, and
builds a proof of the equation given a proof of its normal form. The definition
of Prf will be given below.
We can instantiate this procedure to the commutative monoid of natural
numbers and apply it to our equation, an environment with the two variables,
and a proof of the normalised equation. Since the two sides of the equation will
be equal after normalisation we prove it by reflexivity:
prf : ∀ n m → (n + m) + n ≡ m + (n + n)
prf n m = prove eqn1 (n :: m :: []) ≡-refl

The prove function is defined in a module Semantics which is parametrised
over an arbitrary commutative monoid

module Semantics (CM : CommutativeMonoid) where

open CommutativeMonoid CM renaming (carrier to C)

Opening the CommutativeMonoid module brings into scope the carrier C with
its equality relation _≈_ and the monoid operations _•_ and ε. A monoid ex-
pression is interpreted as a function from an environment containing values for
the variables to an element of C.

Env : ℕ → Set
Env n = Vec C n

⟦_⟧ : ∀ {n} → Expr n → Env n → C
⟦ var i ⟧ ρ = lookup i ρ
⟦ a ⊕ b ⟧ ρ = ⟦ a ⟧ ρ • ⟦ b ⟧ ρ
⟦ zero ⟧ ρ = ε

Equations are also interpreted:


Prf : ∀ {n} → Eqn n → Env n → Set
Prf (e1 == e2) ρ = ⟦ e1 ⟧ ρ ≈ ⟦ e2 ⟧ ρ
One can prove that the normalisation function is sound in the sense that the
normal form is semantically equal to the original expression in any environment:
sound : ∀ {n} (e : Expr n) ρ → ⟦ e ⟧ ρ ≈ ⟦ norm e ⟧ ρ
Hence, to prove an equation it suffices to prove the normalised version. The proof
uses the module Relation.Binary.EqReasoning from the standard library for
the equational reasoning:
prove : ∀ {n} (eqn : Eqn n) ρ → Prf (simpl eqn) ρ → Prf eqn ρ
prove (e1 == e2) ρ prf =
  begin
    ⟦ e1 ⟧ ρ        ≈⟨ sound e1 ρ ⟩
    ⟦ norm e1 ⟧ ρ   ≈⟨ prf ⟩
    ⟦ norm e2 ⟧ ρ   ≈⟨ sym (sound e2 ρ) ⟩
    ⟦ e2 ⟧ ρ
  ∎
The complete code is available on the Agda wiki [1].

References
1. Agda wiki page, http://wiki.portal.chalmers.se/agda/
2. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
In: Coq’Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2004)
3. Bove, A., Dybjer, P.: Dependent types at work. In: Barbosa, L., Bove, A., Pardo,
A., Pinto, J.S. (eds.) LerNet ALFA Summer School 2008. LNCS, vol. 5520, pp.
57–99. Springer, Heidelberg (to appear, 2009)
4. Coquand, T., Huet, G.: The calculus of constructions. Information and Computa-
tion 76, 95–120 (1988)
5. Dybjer, P.: Inductive families. Formal Aspects of Computing 6, 440–465 (1994)
6. Dybjer, P.: A general formulation of simultaneous inductive-recursive definitions
in type theory. Journal of Symbolic Logic 65(2) (June 2000)
7. Dybjer, P., Setzer, A.: Indexed induction-recursion. Journal of Logic and Algebraic
Programming 66(1), 1–49 (2006)
8. Epigram homepage, http://www.e-pig.org
9. Gonthier, G.: The four colour theorem: Engineering of a formal proof. In: Kapur,
D. (ed.) ASCM 2007. LNCS, vol. 5081, p. 333. Springer, Heidelberg (2008)
10. Gordon, M., Milner, R., Wadsworth, C.: Edinburgh LCF. In: Kahn, G. (ed.) Se-
mantics of Concurrent Computation. LNCS, vol. 70. Springer, Heidelberg (1979)
11. Martin-Löf, P.: Constructive mathematics and computer programming. In: Logic,
Methodology and Philosophy of Science, VI, 1979, pp. 153–175. North-Holland,
Amsterdam (1982)
12. Martin-Löf, P.: Intuitionistic Type Theory. Bibliopolis, Napoli (1984)
13. Nordström, B., Petersson, K., Smith, J.M.: Programming in Martin-Löf’s Type
Theory. An Introduction. Oxford University Press, Oxford (1990)
14. Norell, U.: Towards a practical programming language based on dependent type
theory. PhD thesis, Chalmers University of Technology (2007)
15. Norell, U.: Dependently typed programming in Agda. In: Lecture Notes from the
Summer School in Advanced Functional Programming (2008) (to appear)
The Twelf Proof Assistant

Carsten Schürmann

IT University of Copenhagen
[email protected]

Logical framework research is based on the philosophical point of view that it


should be possible to capture mathematical concepts such as proofs, logics, and
meaning in a formal system — directly, adequately (in the sense that there are no
spurious or exotic witnesses), and without having to commit to a particular logi-
cal theory. Instead of working with one general purpose representation language,
we design special purpose logical frameworks for capturing reoccurring concepts
for special domains, such as, for example, variable renaming, substitution appli-
cation, and resource management for programming language theory. Most logi-
cal frameworks are based on constructive type theories, such as Isabelle (on the
simply-typed λ-calculus), LF [HHP93] (on the dependently typed λ-calculus),
and LLF (on a linearly typed λ-calculus). The representational strength of the
logical framework stems from the choice of definitional equality on terms. For ex-
ample, α-conversion models the tacit renaming of variables, β-contraction models
substitution application, and η-expansion guarantees the adequacy of encodings.
The Twelf system [PS99] is an implementation of the logical framework LF
and its equational theory based on αβη. It was originally released by Pfenning
and Schürmann in 1999 and supersedes the Elf system. Twelf is a proof assistant,
also called a meta-logical framework, that excels at representing deductive sys-
tems with side conditions (such as the Eigen variable condition) and judgments
with contexts that have the usual intuitionistic properties, such as weakening,
contraction, and exchange. Twelf provides a logic programming interpreter to ex-
periment with encodings and a reasoning engine to verify their meta-theoretic
properties, such as, for example, cut-elimination, semantical equivalence, type
soundness. Furthermore, it provides a module system to organize large
developments.
Since its release, the Twelf system has become a popular tool for reasoning
about the design and properties of modern programming languages and logics.
It has been used, for example, to verify the soundness of typed assembly lan-
guage [Cra03] and Standard ML [LCH07], for checking cut-elimination proofs for
intuitionistic and classical logic [Pfe95], and for specifying and validating logic
morphisms, for example, between HOL and Nuprl [SS06].
In this paper, we illustrate a few of Twelf’s features by example. In particular,
we show how to use Twelf to represent deductive systems in Section 1, how to
reason with Twelf in Section 2, and how Twelf supports the modular design of
code in Section 3.


⋆ This work is supported in part by NABITT grant 2106-07-0019 of the Danish
  Strategic Research Council.


1 Representation

In order to use Twelf, one has to subscribe to the judgments-as-types representa-


tion paradigm, which means that one should appreciate that judgments, such as,
for example, “A ::= p | A1 ⊃ A2 | ¬A is a formula” and “A is derivable in a logic
written as  A” are best represented as types or families of types. Type fami-
lies may then be declared in Twelf as constant o : type and |- : o -> type,
respectively. Consider the following Twelf signature that gives an adequate rep-
resentation of a natural deduction calculus for the fragment of propositional logic
defined by implication and negation.

%sig IL = {
  o : type.                %name o A.
  |- : o -> type.          %prefix 9 |-.
  => : o -> o -> o.        %infix right 10 =>.
  ~ : o -> o.
  =>I : (|- A -> |- B) -> |- A => B.
  =>E : |- A => B -> |- A -> |- B.
  ~I : ({p:o} |- A -> |- p) -> |- ~ A.
  ~E : |- ~ A -> |- A -> |- B.
  n = [p:o] ~ (~ p).
  nI : |- A -> |- n A = [D] ~I [p:o] [u: |- ~ A] ~E u D.
}.

Twelf is an implementation of LF type theory, and therefore certain syn-


tactical connectives are primitive. type stands for the kind type, .->. for the
non-dependent function arrow, {.}. for the dependent function arrow, and [.].
for λ-abstraction. Twelf has a powerful type inference algorithm based on higher-
order unification that can often reconstruct omitted arguments and type labels.
Working with Twelf corresponds to declaring and defining object and type level
constants interleaved by the occasional Twelf specific instruction to check meta
theoretic properties, record extra logical information, or execute queries. Exam-
ples of these instructions include %name, %prefix, and %infix that record the
user’s preference for name and fixity information. We will encounter a few more
instructions below.
Returning to the signature above, the third and fourth line declare the con-
nectives => for ⊃, and ~ for ¬. The fifth to eighth line declare the respective
standard introduction and elimination rules (whose mathematical depiction we
omit in the interest of space). We point out that every capitalized variable, such
as A or B, should be thought of as implicitly Π-quantified. Furthermore, note
how the LF function arrow is used to encode the hypothetical derivations in the
premiss of =>I and ~I. It shamelessly uses the LF context to represent hypothe-
ses. This particular technique of representation illustrates the true strength of
the logical framework. It is sound because hypotheses in our logic enjoy the
same weakening, contraction, exchange, and substitution properties as the con-
text in LF. Moreover, LF’s equational theory provides us with a free substitution

principle for hypotheses. As we rest the representation on the higher-order fea-


tures of LF, this technique is also called higher-order abstract syntax.
The bottom two definitions (as opposed to declarations) in the Twelf signature
above introduce n as an abbreviation for the double negation ~~ and nI as a
derived rule of inference stating soundness.

2 Reasoning
Besides being able to reconstruct and check LF types, Twelf is designed as a proof
assistant that allows Twelf users to reason about the meta-theoretic properties
of their encodings. In Twelf we separate cleanly the logical framework LF for
representation from a meta-logic Mω for reasoning. It is well-known that in LF
every term reduces to a unique β-normal η-long form that is also called canonical
form. These forms are inductively defined and give rise to induction principles
that are built into the meta-logic Mω . These principles allow the Twelf user to
reason about LF encodings even though they might be defined using higher-order
abstract syntax.
If we restrict ourselves to the Π2-fragment of Mω, meta proofs can thankfully
be encoded as relations in LF — with the only caveat being that we need to check
that those relations behave as total functions (when executed on the Twelf logic
programming engine). For every well-typed input, those functions must compute
well-typed outputs, which means that computation must be terminating and
may not get stuck. As an illustrative example of such a meta proof, consider
the following signature that defines the Hilbert calculus and gives a proof of the
deduction theorem.

%sig HILBERT = {
  o : type.                %name o A.
  |- : o -> type.          %prefix 9 |-.
  => : o -> o -> o.        %infix right 10 =>.
  K : |- A => B => A.
  S : |- (A => B => C) => (A => B) => A => C.
  MP : |- A => B -> |- A -> |- B.

  ded : (|- A -> |- B) -> |- A => B -> type.  %mode ded +D -E.
  aK  : ded ([x] K) (MP K K).
  aS  : ded ([x] S) (MP K S).
  aID : ded ([x] x) (MP (MP S (K : |- A => (A => A) => A)) K).
  aMP : ded ([x] MP (D x) (E x)) (MP (MP S D’) E’)
         <- ded ([x] D x) D’ <- ded ([x] E x) E’.
  %worlds () (ded _ _).
  %total D (ded D _).
}.

The first six lines in this signature define the syntax of formulas, and the
Hilbert calculus for the implicational fragment of propositional logic. The

remainder of this signature defines the meta-proof as a relation between hypo-
thetical derivations ⊢ B under the assumption ⊢ A and the internalized version
⊢ A ⊃ B. The input is represented as a function from |- A -> |- B, and be-
cause every term of this type has a canonical form, there are only four cases that
one needs to consider: [x] K, [x] S, [x] x and [x] MP (D x) (E x), where
D and E are canonical as well. No matter which input of type |- A -> |- B is
given, Twelf’s operational semantics will always find a case that matches.
Twelf offers various tools to inspect the properties of this relation, first and
foremost a mode checker %mode that checks that for ground¹ inputs one can
expect ground outputs, a world checker %worlds that checks that the implicit
LF context is regularly built², and a totality checker %total that checks that the
relation is indeed a total function, i.e. terminating and all cases are covered. In
this example, the world is empty as indicated by () and the termination uses
the subterm ordering on the first argument of ded, which is named D.

¹ A term is ground if it doesn’t contain free logic variables.
² Recall that terms may be open when using higher-order abstract syntax.

3 Organization
Last but not least, Twelf offers a deceptively simple but useful module system.
The module system provides the user with the ability to manage name spaces
but does not extend LF. In fact, every Twelf development that contains module
system features can be elaborated into an equivalent and pure LF signature.
Using structures, one may embed one signature into another, and using views
one may define maps from one signature to another.
Recall the definition of intuitionistic logic from Section 1. The following Twelf
development defines classical logic as an extension of intuitionistic logic by the
law of the excluded middle.
%sig CL = {
  %struct IL : IL %open o |- => =>I =>E ~ n ~I ~E nI.
  exm : |- ~ (~ A) => A.
}.
The %struct declaration imports IL into CL, and the %open directive allows the
user to refer unqualified to the subsequent list of constant names.
Next, we give the Kolmogorov translation from classical logic into intuition-
istic logic. To get this to work, we need to think of the usual turnstile ⊢ A as
⊢ ¬¬A. This is possible by defining a view in two steps. First, we define a view
from IL to IL.
%view KOLMIL : IL -> IL = {
  o  := o.
  |- := [x] |- n x.
  => := [x][y] (n x) => (n y).
  ~  := [x] ~ x.
  =>I := [A][B][D] nI (=>I D).
  =>E := [A][B][D][E] ~I [p][u] ~E D (~I [q][v] ~E (=>E v E) u).
  ~I  := [A][D] ~I [q][u] ~E (D (~ A) u) u.
  ~E  := [A][C][D][E] ~I [p][u] ~E D E.
}.
For example, =>I maps to a term representing a derivation that under the as-
sumption that ⊢ ¬¬A implies ⊢ ¬¬B the following holds: ⊢ ¬¬(A ⊃ B). Second,
we extend this view to a view for classical logic from CL to IL.

%view KOLM : CL -> IL = {


%struct IL := KOLMIL.
exm := [A] ~I [p] [u] ~E u (=>I [u] ~I [p] [v]
~E u (~I [q][w] ~E w v)).
}.

In this view, the substructure IL is mapped to the view KOLMIL, and the law of
the excluded middle to a term representing a derivation of ⊢ ¬¬(¬¬A ⊃ A).
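Read back in conventional notation, the views implement the following translation
on judgements and connectives (our summary of the Twelf declarations above):

  \[
  \begin{array}{rcl}
    (\vdash A)^{K}    & = & \vdash \neg\neg A^{K} \\
    (A \supset B)^{K} & = & \neg\neg A^{K} \supset \neg\neg B^{K} \\
    (\neg A)^{K}      & = & \neg A^{K}
  \end{array}
  \]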
The Twelf system and documentation can be accessed from our homepage at
http://www.twelf.org. More information about the module system is available
from http://www.twelf.org/~mod.

References
[Cra03] Crary, K.: Toward a foundational typed assembly language. In: Morrisett, G.
(ed.) Proceedings of the 30th ACM Symposium on Principles of Program-
ming Languages, New Orleans, Louisiana. SIGPLAN Notices, vol. 38(1), pp.
198–212. ACM Press, New York (2003)
[HHP93] Harper, R., Honsell, F., Plotkin, G.: A framework for defining logics. Journal
of the Association for Computing Machinery 40(1), 143–184 (1993)
[LCH07] Lee, D.K., Crary, K., Harper, R.: Towards a mechanized metatheory of stan-
dard ML. In: Proceedings of the 34th Annual Symposium on Principles of
Programming Languages, pp. 173–184. ACM Press, New York (2007)
[Pfe95] Pfenning, F.: Structural cut elimination. In: Kozen, D. (ed.) Proceedings of
the Tenth Annual Symposium on Logic in Computer Science, San Diego,
California, pp. 156–166. IEEE Computer Society Press, Los Alamitos (1995)
[PS99] Pfenning, F., Schürmann, C.: System description: Twelf — a meta-logical
framework for deductive systems. In: Ganzinger, H. (ed.) CADE 1999. LNCS
(LNAI), vol. 1632, pp. 202–206. Springer, Heidelberg (1999)
[SS06] Schürmann, C., Stehr, M.O.: An executable formalization of the HOL/Nuprl
connection in the meta-logical framework Twelf. In: Hermann, M., Voronkov,
A. (eds.) LPAR 2006. LNCS, vol. 4246, pp. 150–166. Springer, Heidelberg
(2006)
Hints in Unification

Andrea Asperti, Wilmer Ricciotti, Claudio Sacerdoti Coen, and Enrico Tassi

Department of Computer Science, University of Bologna


Mura Anteo Zamboni, 7 — 40127 Bologna, Italy
{asperti,ricciott,sacerdot,tassi}@cs.unibo.it

Abstract. Several mechanisms such as Canonical Structures [14], Type


Classes [13,16], or Pullbacks [10] have been recently introduced with the
aim to improve the power and flexibility of the type inference algorithm
for interactive theorem provers. We claim that all these mechanisms are
particular instances of a simpler and more general technique, just con-
sisting in providing suitable hints to the unification procedure underlying
type inference. This allows a simple, modular and not intrusive imple-
mentation of all the above mentioned techniques, opening at the same
time innovative and unexpected perspectives on its possible applications.

1 Introduction
Mathematical objects commonly have multiple, isomorphic representations or
can be seen at different levels of an algebraic hierarchy, according to the kind
or amount of information we wish to expose or emphasise. This richness is a
major tool in mathematics, allowing to implicitly pass from one representation
to another depending on the user needs. This operation is much more difficult
for machines, and many works have been devoted to the problem of adding
syntactic facilities to mimic the abus de notation so typical of the mathematical
language. The point is not only to free the user by the need of typing redundant
information, but to switch to a more flexible linkage model, by combining, for
instance, resolution of overloaded methods, or supporting multiple views of a
same component.
All these operations, in systems based on type theories, are traditionally per-
formed during type-inference, by a module that we call “refiner”. The refiner is
not only responsible for inferring types that have not been explicitly declared:
it must synthesise or constrain terms omitted by the user; it must adjust the
formula, for instance by inserting functions to pass from one representation to
another one; it may help the user in identifying the minimal algebraic structure
providing a meaning to the formula.
From the user point of view, the refiner is the primary source of “intelligence”
of the system: the more effective it is, the easier becomes the communication
with the system. Thus, a natural trend in the development of proof assistants
consists in constantly improving the functionalities of this component, and in
particular to move towards a tighter integration between the refiner and the
modules in charge of proof automation.

S. Berghofer et al. (Eds.): TPHOLs 2009, LNCS 5674, pp. 84–98, 2009.

c Springer-Verlag Berlin Heidelberg 2009
Hints in Unification 85

Among the mechanisms which have been recently introduced in the litera-
ture with the aim to improve the power and flexibility of the refiner, we recall,
in Section 2, Canonical Structures [14], Type Classes [13], and Pullbacks [10].
Our claim is that all these mechanisms are particular instances of a simpler and
more general technique presented in Section 3, just consisting in providing suit-
able hints to the unification procedure underlying the type inference algorithm.
This simple observation paves the way to a light, modular and not intrusive
implementation of all the above mentioned techniques, and looks suitable to
interesting generalisations as discussed in Section 4.
In the rest of the paper we shall use the notation ≡ to express the type
equivalence relation of the given calculus. A unification problem will be expressed
?
as A ≡ B, resulting in a substitution σ such that Aσ ≡ Bσ. Metavariables will
be denoted with ?i , and substitutions are described as lists of assignments of the
form ?i := t.

2 Type Inference Heuristics

In this section, we recall some heuristics for type refinement already described
in the literature and implemented in interactive provers like Coq, Isabelle and
Matita.

2.1 Canonical Structures

A canonical structure is declaration of a particular instance of a record to be


used by the type checker to solve unification problems. For example, consider
the record type of groups, and its particular instance over integers (Z).

Z : Group := {gcarr := Z; gunit := 0; gop := Zplus; . . .}


The user inputs the following formula, where 0 is of type Z.

0+x= x (1)
Suppose that the notation (x + y) is associated with gop ? x y where gop is the
projection of the group operation with type:

gop : ∀g : Group.gcarr g → gcarr g → gcarr g


and gcarr is of type Group → T ype (i.e. the projection of the group carrier).
After notation expansion equation (1) becomes

gop ?1 0 x = x
where ?1 is a metavariable. For (1) to be well typed the arguments of gop have
to be of type gcarr g for some group g. In particular, the first user provided
argument 0 is of type Z, generating the following unification problem:
86 A. Asperti et al.

?
gcarr ?1 ≡ Z
If the user declared Z as the canonical group structure over Z, the system finds
the solution ?1 := Z. This heuristic is triggered only when the unification prob-
lem involves a record projection πi applied to a metavariable versus a constant
c. Canonical structures S := {c1 ; . . . ; cn } can be easily indexed using as keys all
the pairs of the form πi , ci .
This device was introduced by A.Saibi in the Coq system [14] and is exten-
sively used in the formalisation of finite group theory by Gonthier et al. [2,6].

2.2 Type Classes


Type classes were introduced in the context of programming languages to prop-
erly handle symbol overloading in [7,15], and they have been later adopted in
interactive provers [13,16].
In a programming language with explicit polymorphism, dispatching an over-
loaded method amounts to suitably instantiate a type variable. This generalises
canonical structures exploiting a Prolog-like mechanism to search for type class
instances.
For instance we show how to define the group theoretical construction of the
Cartesian product using a simplification of the Coq syntax.

 
Class Group (A : Type) := { unit : A; gop : A → A → A; . . .}
Instance Z : Group Z := { unit := 0; gop := Zplus; . . .}
Instance × (A,B: Type) (G: Group A) (H: Group B) : Group (A × B) := {
unit := unit G, unit H;
gop x1,x2 y1,y2 := gop G x1 y1, gop H x2 y2;
...
}
 

With this device a slightly more complicated formula than (1) can be accepted
by the system, such as:
0, 0 + x = x
Unfolding the + notation we obtain
gop ?1 ?2 0, 0 x = x
where the type of gop and the type of ?2 are:
gop : ∀T : Type.∀g : Group T.T → T → T
?2 : Group ?1
After ?1 is instantiated with Z × Z proof automation is used to inhabit ?2 whose
type has become Group (Z × Z). Automation is limited to a Prolog-like search
whose clauses are the user declared instances. Notice that the user has not defined
a type class instance (i.e. a canonical structure) over the group Z × Z.
Hints in Unification 87

2.3 Coercions Pullback

The coercions pullback device was introduced as part of the manifesting coercions
technique by Sacerdoti Coen and Tassi in [10] to ease the encoding of algebraic
structures in type theory (see [11] for a formalisation explicating that technique).
This devices comes to play in a setting with a hierarchy of structures, some
of which are built combining together simpler structures. The carrier projection
is very frequently declared as a coercion [8], allowing the user to type formulas
like ∀g : Group.∀x : g.P (x) omitting to apply gcarr to g (i.e. the system is able
to insert the application of coercions when needed [12]).
 
ringI
r groupvvvv II
IrImonoid
vv II
v v II
v
{
group  $ 
 HH monoid
HH tt
HH t
tt
gcarr HHH$ zt tttmcarr
Type

The algebraic structure of rings is composed by a multiplicative monoid and an


additive group, respectively projected out by the coercions r group and r monoid
so that a ring can be automatically seen by the system as a monoid or a group.
The ring structure can be built when the carriers of the two structures are
compatible (that in intensional type theories can require some non trivial efforts,
see [10] for a detailed explanation).
When the operations of both structures are used in the same formula, the sys-
tem has to solve a particular kind of unification problems. For example, consider
the usual distributivity law of the ring structure:

x ∗ (y + z) = x ∗ y + x ∗ z

Expanding the notation we obtain as the left hand side the following

mop ?1 x (gop ?2 y z)

The second argument of mop has type gcarr ?2 but is expected to have type
mcarr ?1 , corresponding to the unification problem:
?
gcarr ?2 ≡ mcarr ?1

The system should infer the minimal algebraic structure in which the formula
can be interpreted, and the coercions pullback devices amounts to the calculation
of the pullback (in categorical sense) of the coercions graph for the arrows gcarr
and mcarr. The solution, in our example, is the following substitution:

?2 := r group ?3 ?1 := r monoid ?3
88 A. Asperti et al.

The solution is correct since the carriers of the structures composing the ring
structure are compatible w.r.t. equivalence (i.e. the two paths in the coercions
graph commute), that corresponds to the following property: for every ring r

gcarr (r group r) ≡ mcarr (r monoid r)

3 A Unifying Framework: Unification Hints

In higher order logic, or also in first order logic modulo sufficiently powerful
rewriting, unification U is undecidable. To avoid divergence and to manage the
complexity of the problem, theorem provers usually implement a simplified, de-
cidable unification algorithm Uo , essentially based on first order logic, sometimes
extended to cope with reduction (two terms t1 and t2 are unifiable if they have
reducts t1 and t2 - usually computed w.r.t. a given reduction strategy - which
are first order unifiable). Unification hints provide a way to easily extend the
system’s unification algorithm Uo (towards U) with heuristics to choose solu-
tions which can be less than most general, but nevertheless constitute a sensible
default instantiation according to the user.
The general structure of a hint is

→ →
?x := H
myhint
P ≡ Q
→ → →
where P ≡ Q is a linear pattern with free variable F V (P, Q) =?v , ?x ⊆?v , all

variables in ?x are distinct and Hi cannot depend on ?xi , . . . , ?xn . A hint is ac-
→ → → →
ceptable if P [H / ?x ] ≡ Q[H / ?x ], i.e. if the two terms obtained by telescopic
substitution, are convertible. Since convertibility is (typically) a decidable rela-
tion, the system is able to discriminate acceptable hints.
Hints are supposed to be declared by the user, or automatically generated by
the systems in peculiar situation. Formally a unification hint induces a schematic

unification rule over the schematic variables ?v to reduce unification problems
to simpler ones:

→ ? →
?x ≡ H
?
myhint
P ≡ Q

Since ?x are schematic variables, when the rule is instantiated, the unification
→ ? →
problems ?x ≡ H become non trivial.
When a hint is acceptable, the corresponding schematic rule for unification is
→ ? → → →
sound (proof: a solution to ?x ≡ H is a substitution σ such that ?x σ ≡ H σ and
→ → → → ?
thus P σ ≡ P [H / ?x ]σ ≡ Q[H / ?x ]σ ≡ Qσ; hence σ is also a solution to P ≡ Q).
Hints in Unification 89

From the user perspective, the intuitive reading is that, having a unification
? → →
problem of the kind P ≡ Q, then the “hinted” solution is ?x :=H .
The intended use of hints is upon failure of the basic unification algorithm
Uo : the recursive definition unif that implements Uo

let rec unif m n = body

is meant to be simply replaced by

let rec unif m n =


try body
with failure -> try_hints m n

The function try hints simply matches the two terms m and n against the hints
patterns (in a fixed order decided by the user) and returns the first solution
found:

and try_hints m n =
match m,n with
| ...
| P,Q when unif(x,H) as sigma -> sigma (* myhint *)
| ...

This simple integration excludes the possibility of backtracking on hints, but


is already expressive enough to cover, as we shall see in the next Section, all the
cases discussed in Section 2.
Due to the lack of backtracking, hints are particularly useful when they are
invertible, in the sense that the hinted solution is also unique, or at least “canon-
ical” from the user point of view. However, even when hints are not canonical,
they provide a strict and sound extension to the basic unification algorithm.
Hints may be easily indexed with efficient data structures investigated in the
field of automatic theorem proving, like discrimination trees.

3.1 Implementing Canonical Structures

Every canonical structure declaration that declares T as the canonical solution


?
for a unification problem πi ?S ≡ t →?S := T can be simply turned in the
corresponding unification hint:

?S := T
πi ?S ≡ t

3.2 Implementing Type Classes


?
Like canonical structures, type classes are used to solve problems like πi ? ≡ t,
where πi is a projection for a record type R. This kind of unification problem
can be seen as inhabitation problems of the form “? : R with πi := t”. Because
90 A. Asperti et al.

of the lack of the with construction in the Calculus of Inductive Constructions,


Sozeau encodes the problem abstracting the record type over t, thus reducing
the problem to the inhabitation of the type R t. Since the the structure of t is
explicit in the type, parametric type class instances like the Cartesian product
described in Section 2.2 can be effectively used as Prolog-like clauses to solve
the inhabitation problem. This approach forces a particular encoding of algebraic
structures, where all the fields that are used to drive inhabitation search have
to be abstracted out. This practice has a nasty impact on the modularity of non
trivial algebraic hierarchies, as already observed in [9,10].
Unification hints can be employed to implement type classes without requiring
an ad-hoc representation of algebraic structures. The following hint schema

?R := {π1 :=?1 . . . πi :=?i . . . πn :=?n }


h-struct-i
πi ?R ≡?i
?
allows to reduce unification problems of the form πi ? ≡ t to the inhabitation
of the fields ?1 . . .?n . Moreover, if we dispose of canonical inhabitants for these
fields we may already express them in the hint. Note that the user is not required
to explicitly declare classes and instances.
Unification hints are flexible enough to also support a different approach that
does not rely on inhabitation but reduces the unification problem to simpler
problems of the same kind.
For example, the unification problem
?
gcarr ?1 ≡ Z × Z

can be solved by the following hint:

?1 := gcarr ?3 ?2 := gcarr ?4 ?0 :=?3 ×?4


h-prod
gcarr ?0 ≡?1 ×?2

Intuitively, the hint says that, if the carrier of a group ?0 is a product ?1 ×?2 ,
where ?1 is the carrier of a group ?3 and ?2 is the carrier of a group ?4 then
we may guess that ?0 is the group product of ?3 and ?4 . This is not the only
possible solution but, in lack of alternatives, it is a case worth to be explored.

3.3 Implementing Coercions Pullback


Coercions are usually represented as arrows between type schemes in a DAG.
A type scheme is a type that can contain metavariables. So, for instance, it is
possible to declare a coercion from the type scheme Vect ?A to the type scheme
List ?A . Since coercions form a DAG, there may exist multiple paths between
two nodes, i.e. alternative ways to map inhabitants of one type to inhabitants of
another type. Since an arc in the graph is a function, a path corresponds to the
functional composition of its arcs. A coercion graph is coherent [8] when every
two paths, seen as composed functions p1 and p2 , are equivalent, i.e. p1 ≡ p2 .
Hints in Unification 91

Table 1. Unification problems solved by coercion hints

Problem Solution
?
gcarr ?1 ≡ mcarr ?2 ?1 := r group ?3 , ?2 := r monoid ?3
?
gcarr ?1 ≡ mcarr (r monoid ?2 ) ?1 := r group ?2
?
gcarr (r group ?1 ) ≡ mcarr ?2 ?2 := r monoid ?1
?
gcarr (r group ?1 ) ≡ mcarr (r monoid ?2 ) ?2 :=?1

In a coherent dag, any pair of cofinal coercions defines a hint pattern, and the
corresponding pullback projections (if they exist) are the hinted solution.
Consider again the example given in Sect. 2.3. The generated hint is
?1 := r group ?3 ?2 := r monoid ?3
gcarr ?1 ≡ mcarr ?2
This hint is enough to solve all the unification problems listed in Table 1, that
occur often when formalising algebraic structures (e.g. in [11]).

4 Extensions
All the previous examples are essentially based on simple conversions involv-
ing records and projections. A natural idea is to extend the approach to more
complex cases involving arbitrary, possibly recursive functions.
As we already observed, the natural use of hints is in presence of invertible
reductions, where we may infer part of the structure of a term from its reduct.
A couple of typical situations borrowed from arithmetics could be the follow-
ing, where plus and times are defined be recursion on the first argument, in the
obvious way:
?1 := 0 ?2 := 0 ?1 := 1 ?2 := 1
plus0 times1
?1 +?2 ≡ 0 ?1 ∗?2 ≡ 1
To understand the possible use of these hints, suppose for instance to have
the goal
1≤a∗b
under the assumptions 1 ≤ a and 1 ≤ b; we may directly apply the monotonicity
of times
∀x, y, w, z.x ≤ w → y ≤ z → x ∗ y ≤ w ∗ z
that will succeed unifying (by means of the hint) both x and y with 1, w with a
and z with b.
Even when patterns do not admit a unique solution we may nevertheless
identify an “intended” hint.
Consider for instance the unification problem
?
?n +?m ≡ S ?p
In this case there are two possible solutions:
92 A. Asperti et al.

1) ?n := 0 and ?m := S ?p
2) ?n := S ?q and ?p :=?q +?m
however, the first one can be considered as somewhat degenerate, suggesting to
keep the second one as a possible hint.
?n := S ?q ?p :=?q +?m
plus-S
?n +?m ≡ S ?p
This would for instance allow to apply the lemma le plus : ∀x, y : N.x ≤ y + x
to prove that m ≤ S(n + m).
The hint can also be used recursively: the unification problem
?
?j + ?i ≡ S(S(n + m))

will result in two subgoals,


? ?
?j ≡ S ?q S(n + m) ≡?q +?i
?
plus-S
?j +?i ≡ S(S(n + m))
and the second one will recursively call the hint, resulting in the instantiation
?j := S(S n) and ?i := m (other possible solutions, not captured by the hint,
would instantiate ?j with 0, 1 and 2).

4.1 Simple Reflexive Tactics Implementation


Reflexive tactics [1,3] are characterised by an initial phase in which the problem
to be processed is interpreted in an abstract syntax, that is later fed to a normal-
isation function on the abstract syntax that is defined inside the logic. This step
needs to be performed outside the logic, since there is no way to perform pattern
matching on the primitive CIC constructors (i.e. the λ-calculus application).
Let us consider a simple reflexive tactic performing simplification in a semi-
group structure (that amounts to eliminating all parentheses thanks to the
associativity property).
The abstract syntax that will represent the input of the reflexive tactic is
encoded by the following inductive type, where EOp represents the binary semi-
group operation and EVar a semi-group expression that is opaque (that will be
treated as a black box by the reflexive tactic).
 
inductive Expr (S : semigroup) : Type :=
| EVar : sgcarr S → Expr S
| EOp : Expr S → Expr S → Expr S.
 

We call sgcarr the projection extracting the carrier of the semi-group structure,
and semigroup the record type representing the algebraic structure under anal-
ysis. Associated to that abstract syntax there is an interpretation function [[·]]S
mapping an abstract term of type Expr S to a concrete one of type sgcarr S.
Hints in Unification 93

 
let rec [[e : Expr S]](S:semigroup) : sgcarr S :=
match e with
[ EVar x ⇒ x
| Eop x y ⇒ sgop S [[x]]S [[y]]S
].
 

The normalisation function simpl is given the following type and is proved
sound:
 
let rec simpl (e: Expr S) : Expr S := . . .
lemma soundness:
∀ S:semigroup.∀ P:sgcarr S → Prop.∀ x:Expr S. P [[simpl x]]S →P [[x]]S
 

Given the following sample goal, imagine the user applies the soundness lemma
(where P is instantiated with λx.x = d).

a + (b + c) = d
yielding the unification problem
?
[[?1 ]]g ≡ a + (b + c) (2)
This is exactly what the extra-logical initial phase of every reflexive tactic has
to do: interpret a given concrete term into an abstract syntax.
We now show how the unification problem is solved declaring the two following
hints, where h-add is declared with higher precedence.
?a := Eop ?S ?x ?y ?m := [[?x ]]?S ?n := [[?y ]]?S
h-add
[[?a ]]?S ≡?m +?n
?a := EVar ?S ?z
h-base
[[?a ]]?S ≡?z
Hint h-add can be applied to problem (2), yielding three new recursive unification
problems. H-base is the only hint that can be applied to the second problem,
while the third one is matched by h-add, yielding three more problems whose
last two can be solved by h-base:
.. ..
. .
? ? ? ?
?x ≡ EVar g a ?y ≡ Eop g ?x ?y b ≡ [[?x ]]g c ≡ [[?y ]]g
? ?
h-base ?
?1 ≡ Eop g ?x ?y a ≡ [[?x ]]g b + c ≡ [[?y ]]g
?
h-add
[[?1 ]]g ≡ a + b + c

The leaves of the tree are all trivial instantiations of metavariables that together
form a substitution that instantiates ?1 with the following expected term:
Eop g (EVar g a) (Eop g (EVar g b) (EVar g c))
94 A. Asperti et al.

4.2 Advanced Reflexive Tactic Implementation


The reflexive tactic to put a semi-group expression in canonical form is made
easy by the fact that the mathematical property on which it is based has linear
variable occurrences on both sides of the equation:
∀g : semigroup.∀a, b, c : sgcarr g.a + (b + c) = (a + b) + c
If we consider a richer structure, like groups, we immediately have properties
that are characterised by non linear variable occurrences, for example
∀g : group.∀x : gcarr g.x ∗ x−1 = 1
To apply the simplification rule above, the data type for abstract terms must
support a decidable comparison function. We represent concrete terms external
to the group signature by pointers (De Bruijn indexes) to a heap (represented
as a context Γ ). Thanks to the heap, we can share convertible concrete terms so
that the test for equality is reduced to testing equality of pointers.
 
record group : Type :={
gcarr : Type;
1 : gcarr;
∗ : gcarr → gcarr → gcarr;
−1
: gcarr → gcarr
}.
 
The abstract syntax for expressions is encoded in the following inductive type:
 
inductive Expr : Type :=
| Eunit : Expr
| Emult : Expr → Expr → Expr
| Eopp : Expr → Expr
| Evar : N → Expr.
 
The interpretation function takes an additional argument that is the heap Γ .
Lookup in Γ is written Γ (m) and returns a dummy value when m is a dandling
pointer.
 
let rec [[e : Expr; Γ : list (gcarr g)]](g:group) on e : gcarr g :=
match e with
[ Eunit ⇒1
| Emult x y ⇒[[x; Γ ]]g ∗ [[y; Γ ]]g
| Eopp x ⇒[[x; Γ ]]−1
g
| Evar n ⇒Γ (n) ].
 
For example:
[[Emult (Evar O) (Emult (Eopp (Evar O)) (Evar (S O)))); [x; y]]]g ≡ x∗ (x−1 ∗ y)
The unification problem generated by the application of the reflexive tactic is
of the form
Hints in Unification 95

?
[[?1 ; ?2 ]]?3 ≡ x ∗ (x−1 ∗ y)
and admits multiple solutions (corresponding to permutations of elements in the
heap).
To be able to interpret the whole concrete syntax of groups in the abstract
syntax described by the Expr type, we need the following hints:
?a := Emult ?x ?y ?m := [[?x ; ?Γ ]]?g ?n := [[?y ; ?Γ ]]?g
h-times
[[?a ; ?Γ ]]?g ≡?m ∗?n

?a := Eunit ?a := Eopp ?z ?o := [[?z ; ?Γ ]]?g


h-unit h-opp
[[?a ; ?Γ ]]?g ≡ 1 [[?a ; ?Γ ]]?g ≡?−1
o

To identify equal variables, and give them the same abstract representation,
we need two hints, implementing the lookup operation in the heap (or better,
the generation of a duplicate free heap by means of explicit sharing).
?a := Evar 0 ?Γ :=?r ::?Θ
h-var-base
[[?a ; ?Γ ]]?g ≡?r

?a := Evar (S ?p ) ?Γ :=?s ::?Δ ?q := [[Evar ?p ; ?Δ ]]?g


h-var-rec
[[?a ; ?Γ ]]?g ≡?q

To understand the former rule, consider the following unification problem:


?
[[Evar 0; ?t ::?Γ ]]?g ≡ x
Since the first context item is a metavariable, unification (unfolding and com-
puting the definition of [[Evar 0; ?t ::?Γ ]]?g to ?t ) instantiates ?t with x, that
amounts to reserving the first heap position for the concrete term x.
In case the first context item has been already reserved for a different variable,
unification falls back to hint h-var-rec, skipping that context item, and possibly
instantiating the tail of the context ?Γ with x ::?Δ for some fresh metavariable
?Δ .
?
We now go back to our initial example [[?1 ; ?2 ]]?3 ≡ x ∗ (x−1 ∗ y) and follow
step by step how unification is able to find a solution for ?1 and ?2 using hints.
The algorithm starts by applying the hint h-times, yielding one trivial and two
non trivial recursive unification problems:
? ? ?
?1 ≡ Emult ?x ?y x ≡ [[?x ; ?2 ]]?g x−1 ∗ y ≡ [[?y ; ?2 ]]?g
?
h-times
[[?1 ; ?2 ]]?g ≡ x ∗ (x−1 ∗ y)

The second recursive unification problem can be solved applying hint h-var-base:
? ?
?x ≡ Evar 0 ?2 ≡ x ::?Θ
?
h-var-base
x ≡ [[?x ; ?2 ]]
96 A. Asperti et al.

The application of the hint h-var-base forces the instantiation of ?2 with x ::


?Θ , thus fixing the first entry of the context to x, but still allowing the free
instantiation of the following elements.
Under the latter instantiation, the third unification problem to be solved
?
becomes x−1 ∗ y ≡ [[?y ; x ::?Θ ]] that requires another application of hint h-times
followed by h-opp on the first premise.
? ? ?
?y ≡ Emult (Evar 0) ?y x−1 ≡ [[?x ; x ::?Θ ]]?g y ≡ [[?y  ; x ::?Θ ]]?g
?
h-times
x−1 ∗ y ≡ [[?y ; x ::?Θ ]]?g
?
The first non-trivial recursive unification problem is x−1 ≡ [[?x ; x ::?Θ ]]?g and
can be solved applying hint h-opp first and then h-var-base. The second problem
is more interesting, since it requires an application of h-var-rec:
? ? ?
?y ≡ Evar (S ?p ) x ::?Θ ≡?s ::?Δ y ≡ [[Evar ?p ; ?Δ ]]?g
?
h-var-rec
y ≡ [[?y ; x ::?Θ ]]?g
The two unification problems on the left are easy to solve and lead to the fol-
lowing instantiation
?y := Evar (S ?p ) ?s := x; ?Δ :=?Θ
?
The unification problem left is thus y ≡ [[Evar ?p ; ?Θ ]]?g and can be solved using
hint h-var-base. It leads to the instantiation
?p := Evar 0 ?Θ := y ::?Θ
for a fresh metavariable ?Θ . Note that hint h-var-base was not applicable in place
of h-var-rec since it leads to an unsolvable unification problem that requires the
first item of the context to be equal to both x and y:
? ?
?y ≡ Evar 0 x ::?Θ ≡ y ::?Θ
h-var-base
y ≡ [[?y ; x ::?Θ ]]?g
The solution found for the initial unification problem is thus:

?1 := Emult (Evar O) (Emult (Eopp (Evar O)) (Evar (S O))))


?2 := x :: y ::?Θ
Note that ?Θ is still not instantiated, since the solution for ?1 is valid for every
context that extends x :: y ::?Θ . The user has to choose one of them, the empty
one being the obvious choice.
All problems obtained by the application of the soundness lemma are of the
?
form [[?1 ; ?2 ]]?3 ≡ t. If t contains no metavariables, hints cannot cause divergence
since: h-opp, h-unit and h-times are used a finite number of times since they

consume t; every other problem recursively generated has the form [[?1 ; s ::
?
?Γ ]]?3 ≡ r where r is outside the group signature. To solve each goal, h-var-rec

can be applied at most | s | + 1 times and eventually h-var-base will succeed.
Hints in Unification 97

5 Conclusions
? ?
In a higher order setting, unification problems of the kind f ?i ≡ o and ?f i ≡ o
are extremely complex. In the latter case, one can do little better than us-
ing generate-and-test techniques; in the first case, the search can be partially
driven by the structure of the function, but still the operation is very expensive.
Moreover, higher order unification does not admit most general unifiers, so both
problems above usually have several different solutions, and it is hard to guide
the procedure towards the intended solution.
On the other side, it is simple to hint solutions to the unification algorithm,
since the system has merely to check their correctness. By adding suitable hints
in a controlled way, we can restrict to a first order setting keeping interesting
higher-order inferences. In particular, we proved that hints are expressive enough
to mimic some interesting ad-hoc unification heuristics like canonical structures,
type classes and coercion pullbacks. It also seems that system provided unifica-
tion errors in case of error-free formulae can be used to suggest to the user the
need for a missing hint, in the spirit of “productive use of failure” [4].
Unification hints can be efficiently indexed using data structures for first order
terms like discrimination trees. Their integration with the general flow of the
unification algorithm is less intrusive than the previously cited ad-hoc techniques.
We have also shown an interesting example of application of unification hints
to the implementation of reflexive tactics. In particular, we instruct the unifica-
tion procedure to automatically infer a syntactic representation S of a term t
such that [[S]] ≡ t, introducing sharing in the process. This operation previously
had to be done by writing a small extra-logical program in the programming
language used to write the system, or in some ad-hoc language for customisa-
tion, like L-tac [5]. Our proposal is superior since the refiner itself becomes able
to solve such unification problems, that can be triggered in situations where the
external language is not accessible, like during semantic analysis of formulae.
A possible extension consists in adding backtracking to the management of
hints. This would require a more intrusive reimplementation of the unification
algorithm; moreover it is not clear that this is the right development direction
since the point is not to just add expressive power to the unification algorithm,
but to get the right balance between expressiveness and effectiveness, especially
in case of failure.
Another possible extension is to relax the linearity constraint on patterns with
the aim to capture more invertible rules, like in the following cases:
?x := 0 ?x := S ?z
plus-0 plus-S
?x +?y ≡?y ?x +?y ≡ S (?z +?y )
It seems natural to enlarge the matching relation allowing the recursive use of
hints, at least when they are invertible. For instance, to solve the unification
?
problem ?1 + (?2 + x) ≡ x we need to apply hint plus-0 but matching the hint
pattern requires a recursive application of hint plus-0 (hence it is not matching
in the usual sense, since ?2 has to be instantiated with 0). The properties of this
“matching” relation need a proper investigation that we leave for future work.
98 A. Asperti et al.

References
1. Barthe, G., Ruys, M., Barendregt, H.: A two-level approach towards lean proof-
checking. In: Berardi, S., Coppo, M. (eds.) TYPES 1995. LNCS, vol. 1158, pp.
16–35. Springer, Heidelberg (1996)
2. Bertot, Y., Gonthier, G., Ould Biha, S., Pasca, I.: Canonical big operators. In:
Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp.
86–101. Springer, Heidelberg (2008)
3. Boutin, S.: Using reflection to build efficient and certified decision procedures. In:
Ito, T., Abadi, M. (eds.) TACS 1997. LNCS, vol. 1281, pp. 515–529. Springer,
Heidelberg (1997)
4. Bundy, A., Basin, D., Hutter, D., Ireland, A.: Rippling: meta-level guidance for
mathematical reasoning. Cambridge University Press, New York (2005)
5. Delahaye, D.: A Tactic Language for the System Coq. In: Parigot, M., Voronkov,
A. (eds.) LPAR 2000. LNCS, vol. 1955, pp. 85–95. Springer, Heidelberg (2000)
6. Gonthier, G., Mahboubi, A., Rideau, L., Tassi, E., Thery, L.: A Modular Formali-
sation of Finite Group Theory. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007.
LNCS, vol. 4732, pp. 86–101. Springer, Heidelberg (2007)
7. Hall, C., Hammond, K., Jones, S.P., Wadler, P.: Type classes in haskell. ACM
Transactions on Programming Languages and Systems 18, 241–256 (1996)
8. Luo, Z.: Coercive subtyping. J. Logic and Computation 9(1), 105–130 (1999)
9. Luo, Z.: Manifest fields and module mechanisms in intensional type theory. In:
Miculan, M., Scagnetto, I., Honsell, F. (eds.) TYPES 2007. LNCS, vol. 4941.
Springer, Heidelberg (2008)
10. Sacerdoti Coen, C., Tassi, E.: Working with mathematical structures in type theory.
In: Miculan, M., Scagnetto, I., Honsell, F. (eds.) TYPES 2007. LNCS, vol. 4941,
pp. 157–172. Springer, Heidelberg (2008)
11. Sacerdoti Coen, C., Tassi, E.: A constructive and formal proof of Lebesgue’s dom-
inated convergence theorem in the interactive theorem prover Matita. Journal of
Formalized Reasoning 1, 51–89 (2008)
12. Saibi, A.: Typing algorithm in type theory with inheritance. In: The 24th Annual
ACM SIGPLAN - SIGACT Symposium on Principle of Programming Language
(POPL) (1997)
13. Sozeau, M., Oury, N.: First-class type classes. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 278–293. Springer, Heidelberg
(2008)
14. The Coq Development Team. The Coq proof assistant reference manual (2005),
http://coq.inria.fr/doc/main.html
15. Wadler, P., Blott, S.: How to make ad-hoc polymorphism less ad hoc. In: POPL
1989: Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles
of programming languages, pp. 60–76. ACM, New York (1989)
16. Wenzel, M.: Type classes and overloading in higher-order logic. In: Gunter,
E.L., Felty, A.P. (eds.) TPHOLs 1997. LNCS, vol. 1275, pp. 307–322. Springer,
Heidelberg (1997)
Psi-calculi in Isabelle

Jesper Bengtson and Joachim Parrow

Dept. of Information Technology, Uppsala University, Sweden

Abstract. Psi-calculi are extensions of the pi-calculus, accommodat-


ing arbitrary nominal datatypes to represent not only data but also
communication channels, assertions and conditions, giving it an expres-
sive power beyond the applied pi-calculus and the concurrent constraint
pi-calculus.
We have formalised psi-calculi in the interactive theorem prover Is-
abelle using its nominal datatype package. One distinctive feature is
that the framework needs to treat binding sequences, as opposed to sin-
gle binders, in an efficient way. While different methods for formalising
single binder calculi have been proposed over the last decades, represen-
tations for such binding sequences are not very well explored.
The main effort in the formalisation is to keep the machine checked
proofs as close to their pen-and-paper counterparts as possible. We dis-
cuss two approaches to reasoning about binding sequences along with
their strengths and weaknesses. We also cover custom induction rules to
remove the bulk of manual alpha-conversions.

1 Introduction

There are today several formalisms to describe the behaviour of computer sys-
tems. Some of them, like the lambda-calculus and the pi-calculus, are intended
to explore fundamental principles of computing and consequently contain as few
and basic primitives as possible. Other are more tailored to application areas
and include many constructions for modeling convenience. Such formalisms are
now being developed en masse. While this is not necessarily a bad thing there
is a danger in developing complicated theories too quickly. The proofs (for ex-
ample of compositionality properties) become gruesome with very many cases
to check and the temptation to resort to formulations such as “by analogy with
. . . ” or “is easily seen. . . ” can be overwhelming. For examples in point, both the
applied pi-calculus [1] and the concurrent constraint pi-calculus [8] have recently
been discovered to have flaws or incompletenesses in the sense that the claimed
compositionality results do not hold [5].
Since such proofs often require stamina and attention to detail rather than
ingenuity and complicated new constructions they should be amenable to proof
mechanisation. Our contribution in this paper is to implement a family of ap-
plication oriented calculi in Isabelle [12]. The calculi we consider are the so
called psi-calculi [5], obtained by extending the basic untyped pi-calculus with
the following parameters: (1) a set of data terms, which can function as both

S. Berghofer et al. (Eds.): TPHOLs 2009, LNCS 5674, pp. 99–114, 2009.

c Springer-Verlag Berlin Heidelberg 2009
100 J. Bengtson and J. Parrow

communication channels and communicated objects, (2) a set of conditions, for


use in conditional constructs such as if statements, (3) a set of assertions, used to
express e.g. constraints or aliases. We base our exposition on nominal data types
and these accommodate e.g. alpha-equivalence classes of terms with binders.
For example, we can use a higher-order logic for assertions and conditions, and
higher-order formalisms such as the lambda calculus for data terms and channels.
The main difficulty in representing calculi such as the lambda-, pi- or psi-
calculi is to find an efficient treatment of binders. Informal proofs often use the
Barendregt variable convention [4], that everything bound is unique. This con-
vention provides a tractable abstraction when doing proofs involving binders,
but it has recently been proven to be unsound in the general case [16]. Theorem
provers have commonly used approaches based on de Bruijn indices [9], higher
order abstract syntax, or nominal logic [13]. We use the nominal datatype pack-
age in Isabelle [15], and its strategy for dealing with single binders. Recent work
by Aydemir et. al. introduce the locally nameless framework [2] which might be
an improvement since the infrastructure is small and elegant.
One of our main contributions in the present paper is to extend the strat-
egy to finite sequences of binders. Though it is possible to recurse over such
sequences and treat each binder individually the resulting proofs would then be-
come morasses of details with no counterpart in the informal proofs. To overcome
this difficulty we introduce the notion of a binding sequence, which simultane-
ously binds arbitrarily finitely many names, and show how it can be implemented
in Isabelle. We use such binding sequences to formulate and establish induction
and inversion rules for the semantics of psi-calculi. The rules have been used to
formally establish compositionality properties of strong bisimilarity. The proofs
are close to their informal counterparts.
We are not aware of any other work on implementing calculi of this calibre in
a proof assistant such as Isabelle. The closest related work are implementations
of the basic pi-calculus, by ourselves [6] and also by others [10,11,14]. Neither
are we aware of any other general technique for multiple binders, other than the
yet unpublished work by Berghofer and Urban which we describe in Section 3.
The rest of the paper is structured as follows. In Section 2 we give a brief
account of psi-calculi and how they use nominal data types. Section 3 treats
implementation issues related to binding sequences and alpha-conversion. In
Section 4 we show how these are used to create our formalisation. In Section 5
we report on the current status of the effort and ideas for further work.

2 Psi-calculi
This section is a brief recapitulation of psi-calculi and nominal data types; for a
more extensive treatment including motivations and examples see [5].

2.1 Nominal Data Types


We assume a countably infinite set of atomic names N ranged over by a, b, . . . , z.
Intuitively, names will represent the symbols that can be statically scoped, and
Psi-calculi in Isabelle 101

also represent symbols acting as variables in the sense that they can be subject to
substitution. A nominal set [13] is a set equipped with name swapping functions
written (a b), for any names a, b. An intuition is that (a b)·X is X with a replaced
by b and b replaced by a. A sequence of swappings is called a permutation, often
denoted p, where p · X means the term X with the permutation p applied to it.
We write p− for the reverse of p. The support of X, written n(X), is the least
set of names A such that (a b) · X = X for all a, b not in A. We write a#X,
pronounced “a is fresh for X”, for a ∈ n(X). If A is a set of names we write
A#X to mean ∀a ∈ A . a#X. We require all elements to have finite support, i.e.,
n(X) is finite for all X. A function f is equivariant if (a b) · f (X) = f ((a b) · X)
holds for all X, and similarly for functions and relations of any arity. Intuitively,
this means that all names are treated equally.

2.2 Agents
A psi-calculus is defined by instantiating three nominal data types and four
operators:
Definition 1 (Psi-calculus parameters). A psi-calculus requires the three
(not necessarily disjoint) nominal data types:
T the (data) terms, ranged over by M, N
C the conditions, ranged over by ϕ
A the assertions, ranged over by Ψ
and the four equivariant operators:
.
↔: T × T → C Channel Equivalence
⊗ : A × A → A Composition
1:A Unit
⊆ A ×C Entailment
We require the existence of a substitution function for T, C and A. When X
is a term, condition or assertion we write X[ a := T] to mean the simultaneous
substitution of the names  a for the terms T in X. The exact requisites of this
function will be covered in Section 4.
The binary functions above will be written in infix. Thus, if M and N are
.
terms then M ↔ N is a condition, pronounced “M and N are channel equiva-
lent” and if Ψ and Ψ  are assertions then so is Ψ ⊗Ψ  . Also we write Ψ  ϕ, “Ψ
entails ϕ”, for (Ψ, ϕ) ∈ .
We say that two assertions are equivalent if they entail the same conditions:
Definition 2 (assertion equivalence). Two assertions are equivalent, writ-
ten Ψ  Ψ  , if for all ϕ we have that Ψ  ϕ ⇔ Ψ   ϕ.
Channel equivalence must be symmetric and transitive, ⊗ must be compositional
with regard to , and the assertions with (⊗, 1) form an abelian monoid.
In the following ã means a finite (possibly empty) sequence of names, a1 , . . . , an .
The empty sequence is written  and the concatenation of ã and b̃ is written ãb̃.
102 J. Bengtson and J. Parrow

When occurring as an operand of a set operator, ã means the corresponding set


of names {a1 , . . . , an }. We also use sequences of terms, conditions, assertions etc.
in the same way.
A frame can intuitively be thought of as an assertion with local names:

Definition 3 (Frame). A frame F is a pair BF , ΨF where BF is a sequence


of names that bind into the assertion ΨF . We use F, G to range over frames.

Name swapping on a frame just distributes to its two components. We identify


alpha equivalent frames, so n(F ) = n(ΨF ) − n(BF ). We overload 1 to also mean
the least informative frame , 1 and ⊗ to mean composition on frames defined
by B1 , Ψ1 ⊗B2 , Ψ2 = B1 B2 , Ψ1 ⊗Ψ2 where B1 is disjoint from n(B2 , Ψ2 ) and
vice versa. We also write Ψ ⊗F to mean , Ψ ⊗F , and (νb)F to mean bBF , ΨF .

Definition 4 (Equivalence of frames). We define F  ϕ to mean that there


exist BF and ΨF such that F = BF , ΨF , BF #ϕ, and ΨF  ϕ. We also define
F  G to mean that for all ϕ it holds that F  ϕ iff G  ϕ.

Intuitively a condition is entailed by a frame if it is entailed by the assertion and


does not contain any names bound by the frame. Two frames are equivalent if
they entail the same conditions.

Definition 5 (psi-calculus agents). Given valid psi-calculus parameters as in


Definition 1, the psi-calculus agents, ranged over by P, Q, . . ., are of the following
forms.
M N.P Output
M (λ x)N.P Input
case ϕ1 : P1 [] · · · [] ϕn : Pn Case
(νa)P Restriction
P |Q Parallel
!P Replication
(|Ψ |) Assertion

In the Input M (λ x)N.P we require that x  ⊆ n(N ) is a sequence without dupli-
cates, and here any name in x  binds its occurrences in both N and P . Restric-
tion binds a in P . An assertion is guarded if it is a subterm of an Input or
Output . In a replication !P there may be no unguarded assertions in P , and in
case ϕ1 : P1 [] · · · [] ϕn : Pn there may be no unguarded assertion in any Pi .

Formally, we define name swapping on agents by distributing it over all con-


structors, and substitution on agents by distributing it and avoiding captures by
binders through alpha-conversion in the usual way. We identify alpha-equivalent
agents; in that way we get a nominal data type of agents where the support n(P )
of P is the union of the supports of the components of P , removing the names
bound by λ and ν, and corresponds to the names with a free occurrence in P .
Psi-calculi in Isabelle 103

Table 1. Structured operational semantics. Symmetric versions of Com and Par are
elided. In the rule Com we assume that F(P ) = BP , ΨP  and F(Q) = BQ , ΨQ  where
BP is fresh for all of Ψ, BQ , Q, M and P , and that BQ is similarly fresh. In the rule
Par we assume that F(Q) = BQ , ΨQ  where BQ is fresh for Ψ, P and α. In Open the
expression νã ∪ {b} means the sequence ã with b inserted anywhere.

. .
Ψ M ↔K Ψ M ↔K
In 
Out
Ψ  M (λ
K N[
y:=L]
y)N.P −−−−−−−→ P [ 
y := L] Ψ  M N.P −−−→ P
KN

α
Ψ  Pi −→ P  Ψ ϕi
Case
 : P −→ P 
α
Ψ  case ϕ

M (ν
a)N
−−−−−→ P 
ΨQ ⊗Ψ  P −
KN .
ΨP ⊗Ψ  Q −−−→ Q Ψ ⊗ΨP ⊗ΨQ M ↔K
Com 
a#Q
τ
a)(P  | Q )
Ψ  P | Q −→ (ν

α α
ΨQ ⊗Ψ  P −→ P  Ψ  P −→ P 
Par bn(α)#Q Scope b#α, Ψ
α α
Ψ  P |Q −→ P  |Q Ψ  (νb)P −→ (νb)P 

M (ν
a)N α
−−−−−→ P 
Ψ  P − b#a, Ψ, M Ψ  P | !P −→ P 
Open Rep
M (ν
a∪{b})N b ∈ n(N ) α
Ψ  (νb)P −−−−−−−−−→ P  Ψ  !P −→ P 

Definition 6 (Frame of an agent). The frame F (P ) of an agent P is defined


inductively as follows:
F (M (λ  : P) = F (!P ) = 1
x)N.P ) = F (M N.P ) = F (case ϕ
F ((|Ψ |)) = , Ψ
F (P | Q) = F (P ) ⊗ F(Q)
F ((νb)P ) = (νb)F (P )

2.3 Operational Semantics


The actions ranged over by α, β are of the following three kinds: Output M (νã)N ,
Input M N , and Silent τ . Here we refer to M as the subject and N as the object.
We define bn(M (νã)N ) = ã, and bn(α) = ∅ if α is an input or τ .

Definition 7 (Transitions). A transition is of the kind Ψ  P −→ P  , mean-


α

ing that in the environment Ψ the agent P can do an α to become P  . The


transitions are defined inductively in Table 1.

3 Binding Sequences
The main difficulty when formalising any calculus with binders is to handle
alpha-equivalence. The techniques that have been used thus far by theorem
104 J. Bengtson and J. Parrow

provers share the trait that they only reason about single binders. This works well
for many calculi, but psi-calculi require binding sequences of arbitrary length.
For our psi-calculus datatype (Def. 5), a binding sequence is needed in the
Input-case where the term M (λ x)N.P has the sequence x  binding into N and
P . The second place sequences are needed is when defining frames (Def 3).
Frames are derived from processes (Def. 6) and as agents can have an arbi-
trary number of binders, so can the frames. The third occurrence of binding
sequences can be found in the operational semantics (Table 1). In the transition
Ψ  P −−−−−−→ P  , the sequence 
M (ν
a)N
a represents the bound names in P which
occur in the object N .
In order to formalise these types of calculi efficiently in a theorem prover,
libraries with support for sequences of binders have to be added. In the next
sections we will discuss two approaches that have been made in this area, first
one by us, which we call explicit binding sequences, and then one by Berghofer
and Urban which we in this paper will call implicit binding sequences. They
both build on the existing nominal representation of alpha-equivalence classes
where a binding occurrence of the name a in the term T is written [a].T , and
the support of [a].T is the support of T with a removed. From this definition,
creating a term with the binding sequence ã in the term T , written [ã].T , can
easily be done by recursion over ã. The proof that the support of [ã].T is equal
to the support of T with the names of ã removed is trivial. Similarly, the notion
of freshness needs to be expanded to handle sequences. The expression ã#T is
defined as: ∀x ∈ set ã. x#T . This expression is overloaded for when ã is either
a list or a set.

3.1 Explicit Binding Sequences

Our approach is to scale the existing single binder setting to sequences. Isabelle
has native support for generating fresh names, i.e. given any finite context of
names C, Isabelle can generate a name fresh for that context. There is also
a distinctness predicate, written distinct  a which states that  a contains no
duplicates. From these we can generate a finite sequence  a of arbitrary length n
where length  a = n, 
a#C and distinct  a by induction on n.
The term [a].T can be alpha-converted into the term [b].(a b)·T if b#T , where
we call (a b) an alpha-converting swapping. In order to mimic this behaviour with
sequences, we lift name swapping to sequence swapping by pairwise composing
the elements of two sequences to create an alpha-converting permutation. We
a b) for such a composition defined in the following manner:
will write (
Definition 8
([] []) = []
((x :: xs) (y :: ys)) = (x, y) :: (xs ys)
All theories that construct permutations using this function will ensure that the
length of the sequences are equal.
We can now lift alpha-equivalence to support sequences.
Psi-calculi in Isabelle 105

Lemma 1. If length x̃ = length ỹ, distinct ỹ, x̃#ỹ and ỹ#T


then [x̃].T = [ỹ].(x̃ ỹ) · T .
Proof. By induction on the length of x̃ and ỹ.
The distinctness property is a bit stronger than strictly necessary; we only need
that the names in x̃ that actually occur in T have a unique corresponding member
in ỹ. Describing this property formally would be cumbersome and distinctness
is sufficient and easy to work with.
Long proofs tend to introduce alpha-converting permutations and it is therefor
important to have a strategy for cancelling these. If a term T has been alpha-
converted using the swapping (a b), becoming (a b) · T , it is possible to apply
the same swapping to the expression where (a b) · T occurs. Using equivariance
properties, the swapping can be distributed over the expression, and when it
reaches (a b) · T , it will cancel out since (a b) · (a b) · T = T . It can also be
cancelled from any remaining term U in the expression, as long as a#U and
b#U . This technique is also applicable when dealing with sequences, where the
alpha-converted term has the form ( a b) · T , with one important observation.
Even though (a b) · (a b) · T = T , it is not generally the case that p · p · T = T . To
cancel a permutation on a term, its inverse must be applied, i.e. p− ·p·T = T . By
applying (a b)− to the expression, the alpha-converting permutation will cancel
out. The permutation will also be cancelled from any remaining term U as long
as a#U and b#U since  a#U and b#U implies ( a b) · U = U and (
a b)− · U = U.
In this setting we are able to fully formalise our theories using binding se-
quences. The disadvantage is that facts regarding lengths of sequences and dis-
tinctness need to be maintained throughout the proofs.

3.2 Implicit Binding Sequences


Parallel to our work, Berghofer and Urban developed an alternative theory for
binding sequences which is also being included in the nominal package. Their
approach is to generate the alpha-converting permutation directly using the
following lemma:
Lemma 2. There exists a permutation p s.t. set p ⊆ set x̃ × set(p · x̃) and
(p · x̃)#C.
The intuition is that instead of creating a fresh sequence, a permutation is cre-
ated which when applied to a sequence ensures the needed freshness conditions.
The following corollary makes it possible to discard permutations which are
sufficiently fresh:
Corollary 1. If x̃#T , ỹ#T and set p ⊆ set x̃ × set ỹ then p · T = T .
From this, a corollary to perform alpha-conversions can be created.
Corollary 2. If set p ⊆ set x̃×set(p· x̃) and (p· x̃)#T then [x̃].T = [p· x̃].p·T .
Proof. since x̃#[x̃].T and (p · x̃)#T we have by Cor. 1 that [x̃].T = p · [x̃].T and
hence by equivariance that [x̃].T = [p · x̃] . p · T .
106 J. Bengtson and J. Parrow

This method has the problem that when cancelling alpha-converting permuta-
tions as in section 3.1, the freshness conditions we use to cancel the permutation
from the remaining terms are lost since (p · x )#U does not imply (p− · x )#U .
We define the following predicate to fix this.
Definition 9. distinctPerm p ≡ distinct((map fst p)@(map snd p))
Intuitively, the distinctPerm predicate ensures that all names in a permutation
are distinct.

Corollary 3. If distinctPerm p then p · p · T = T

Proof. By induction on p.

Thus, by extending Lemma 2 with the condition distintPerm p we get permu-


tations p which can be cancelled by applying p again rather than its inverse.
In general, proofs are easier if we know that the binding sequences are distinct.
The following corollary helps.
Corollary 4. If ã#C then there exists an b̃ s.t. [ã].T = [b̃].T and distinct b̃
and b̃#C.

Proof. Since each name in ã can only bind once in T we can construct b̃ by
replacing any duplicate name in ã with a sufficiently fresh name.

The advantage of implicit alpha-conversions is that facts about length and dis-
tinctness of sequences do not need to be maintained through the proofs. The
freshness conditions are the ones needed for the single binder case and the dis-
tinctness properties are only needed when cancelling permutations. For most
cases, this method is more convenient to work with. There are disadvantages re-
garding inversion rules, and alpha-equivalence properties that will be discussed
in the next section.

3.3 Alpha-Equivalence
When reasoning with single binders, the nominal approach to alpha-equivalence
is quite straightforward. Two terms [a].T and [b].U are equal if and only if either
a = b and T = U or a = b, a#U and U = (a b) · T . Reasoning about binding
sequences is more difficult. Exactly what does it mean for two terms [ a].T and
[b].U to be equal? As long as T and U cannot themselves have binding sequences
on a top level we know that length  a = length b, but the problem with the
general case is what happens when  a and b partially share names. As it turns
out, this case is not important in order to reason about these types of equalities,
but special heuristics are required.
The times where we actually get assumptions such as [ a].T = [b].U in our
proofs are when we do induction or inversion over a term with binders. Typically,
[b].U is the term we start with, and [
a].T is the term that appears in the induction
or inversion rule. These rules are designed in such a way that any bound names
Psi-calculi in Isabelle 107

appearing in the rules can be assumed to be sufficiently fresh. More precisely, we


can ensure that a#b and  a#U . If we are working with explicit binding sequences
we can also know that  a is distinct. In this case, the heuristic is straightforward.
Using the information provided by the induction rule we know using Lemma 1
that [b].U = [ a b) · U and hence that T = (
a].( a b) · U . From here we continue
with the proofs similarly to the single binder case.
When working with implicit sequences the problem is a bit more delicate.
These rules have been generated using a permutation designed to ensure fresh-
ness conditions and we do not know exactly how  a and b originally related to
each other. We do know that the terms are alpha-equivalent and as such, there
is a permutation which equates them. We first prove the following corollary:
Corollary 5. If [a].T = [b].U then a ∈ supp T = b ∈ supp U and a#T = b#U .

Proof. By the definition of alpha-equivalence on terms.

We can now prove the following lemma:


a].T = [b].U and 
Lemma 3. If [ a#b then there exists a permutation p s.t.
set p ⊆ set  a × set b, 
a#U and T = p · U
Proof. The intuition here is to construct p by using Cor. 5 to filter out the pairs
of names from a and b that do not occur in T and U respectively and pairing
together the rest. The proof is done by induction on the length of  a and b.

The problem with this approach is that we do not know how ã and b̃ are related.
If we know that they are both distinct then we can construct p such that ã = p · b̃,
but generally we do not know this. The problematic cases are the ones dealing
with inversion, in which case we resort to explicit binding sequences, but for the
majority of our proofs Lemma 3 is enough.
4 Formalisation
Psi-calculi are parametric calculi. A specific instance is created by instantiating
the framework with dataterms for the terms, assertions and conditions of the
calculus. We also require an entailment relation, a notion of channel equality
and composition of assertions. Isabelle has good support for reasoning about
parametric systems through the use of locales [3].

4.1 Substitution Properties


We require a substitution function on agents. Since terms, assertions and
conditions of psi-calculi are parameters, a locale is created to ensure that a set
of substitution properties hold.

Definition 10. A term M of type α is a substType if there is a substitution
function subst :: α ⇒ name list ⇒ β list ⇒ α which meets the following
constraints, where length x̃ = length T̃ and distinct x̃:

Equivariance:      p · (M [x̃ := T̃ ]) = (p · M )[(p · x̃) := (p · T̃ )]
Freshness:         if a#M [x̃ := T̃ ] and a#x̃ then a#M
                   if a#M and a#T̃ then a#M [x̃ := T̃ ]
                   if set x̃ ⊆ supp M and a#M [x̃ := T̃ ] then a#T̃
                   if x̃#M then M [x̃ := T̃ ] = M
                   if x̃#ỹ and ỹ#T̃ then M [x̃ỹ := T̃ Ũ ] = (M [x̃ := T̃ ])[ỹ := Ũ ]
Alpha-equivalence: if set p ⊆ set x̃ × set (p · x̃) and (p · x̃)#M then
                   M [x̃ := T̃ ] = (p · M )[(p · x̃) := T̃ ]

The intuition is that subst is a simultaneous substitution function which replaces
all occurrences of the names in x̃ in M with the corresponding dataterm in T̃ .
All that the locale dictates is that there is a function of the correct type which
satisfies the constraints. Exactly how it works need only be specified when
creating an instance of the calculus, in order to prove that the constraints are
satisfied.
These constraints are the ones we need for the formalisation but we have not
proven that they are strictly minimal. We leave this for future work.
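As a concrete instance (our own) of the alpha-equivalence constraint: with a
singleton sequence x̃ = (x ) and p = (x z ) where z #M , the constraint reads

    M [x := T ] = ((x z ) · M )[z := T ]

i.e. substituting for x in M is the same as substituting for a fresh z after
renaming x to z.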

4.2 The Psi Datatype

Nominal Isabelle does not support datatypes with binding sequences or nested
datatypes. The two cases that are problematic when formalising psi-calculi are
the Input case, which requires a binding sequence, and the Case case which
requires a list of assertions and processes. The required datatype can be encoded
using mutual recursion in the following way.

Definition 11. The psi-calculi datatype has three type variables for terms,
assertions and conditions respectively. In the Res and the Bind cases, name is
a binding occurrence.

nominal datatype (α, β, γ) psi =
    Output α α ((α, β, γ) psi)
  | Input α ((α, β, γ) input)
  | Case ((α, β, γ) case)
  | Par ((α, β, γ) psi) ((α, β, γ) psi)
  | Res «name» ((α, β, γ) psi)
  | Assert β
  | Bang ((α, β, γ) psi)
and (α, β, γ) input =
    Term α ((α, β, γ) psi)
  | Bind «name» ((α, β, γ) input)
and (α, β, γ) case =
    EmptyCase
  | Cond γ ((α, β, γ) psi) ((α, β, γ) case)
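For example, an input prefix binding two names x and y in a continuation P is
encoded as Input M (Bind x (Bind y (Term N P ))), with one Bind per bound
name before the final Term.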

In order to create a substitution function for (α, β, γ) psi we create a locale with
the following three substitution functions as substTypes.
substTerm   :: α ⇒ name list ⇒ α list ⇒ α
substAssert :: β ⇒ name list ⇒ α list ⇒ β
substCond   :: γ ⇒ name list ⇒ α list ⇒ γ
These functions will handle substitutions on terms, assertions and conditions
respectively. Note that we always substitute names for terms.
The substitution function for psi can now be defined in the standard way
where the substitutions are pushed through the datatype avoiding the binders.
The axioms for substType can then be proven for the psi substitution function,
where the axioms themselves are used when the proofs reach the terms,
assertions and conditions.

4.3 Frames

The four nominal morphisms from Def. 1 are also encoded using locales along
with their equivariance properties. From this definition, implementing Def. 2
and a locale for our requirements on assertion equivalence ≃ is straightforward.
To implement frames, the following nominal datatype is created:

Definition 12

nominal datatype β frame = Assertion β
                         | FStep «name» (β frame)
In order to overload the ⊗ operator to work on frames as described in Def. 3
we create the following two nominal functions.

Definition 13

insertAssertion (Assertion Ψ ) Ψ ′ = Assertion (Ψ ′ ⊗ Ψ )
x#Ψ ′ ⇒ insertAssertion (FStep x F ) Ψ ′ = FStep x (insertAssertion F Ψ ′)

(Assertion Ψ ) ⊗ G = insertAssertion G Ψ
x#G ⇒ (FStep x F ) ⊗ G = FStep x (F ⊗ G)

The following lemma is then derivable:

Lemma 4. If BP #BQ , BP #ΨQ and BQ #ΨP
then ⟨BP , ΨP ⟩ ⊗ ⟨BQ , ΨQ ⟩ = ⟨BP @BQ , ΨP ⊗ ΨQ ⟩.

The implementations of Defs. 4 and 6 are then straightforward.

4.4 Operational Semantics


The operational semantics in Def. 7 is formalised in a similar manner to [6].
Since the actions on the labels can contain bound names which bind into the
derivative of the transition, a residual datatype needs to be created which
combines the actions with their derivatives. Since a bound output can contain
an arbitrary number of bound names, binding sequences must be used here in a
similar manner to psi and frame.
nominal datatype (α, β, γ) boundOutput =
    Output α ((α, β, γ) psi)
  | BStep «name» ((α, β, γ) boundOutput)

datatype α action = Input α α
                  | Tau

datatype (α, β, γ) residual = Free (α action) ((α, β, γ) psi)
                            | Bound α ((α, β, γ) boundOutput)

We will use the notation (νã)N ≺ P ′ for a term of type boundOutput which
has the binding sequence ã binding into N and P ′. We can also write
Ψ ▷ P −→ M (νã)N ≺ P ′ for Ψ ▷ P −−M (νã)N −→ P ′ and similarly for input
and tau transitions.

As usual, the operational semantics is defined using an inductively defined
predicate. As in [6], rules which can have either free or bound residuals are split
into these two cases. We also saturate our rules with freshness conditions to
ensure that the bound names are fresh for all terms outside their scope.
This is done to satisfy the vc-property described in [16] so that Isabelle can
automatically infer an induction rule, but also to give us as much freshness
information as possible when doing induction on transitions. Moreover, all
frames are required to have distinct binding sequences. The introduction rules
in Table 1 only include the freshness conditions which are strictly necessary,
and permit frames with non-distinct binding sequences; these rules can be
derived from our inductive definition using regular alpha-conversion techniques
and Cor. 4.
We will not cover the complete semantics here, just two rules to demonstrate
some differences to the pen-and-paper formalisation.
The transition rule Par has the implicit assumption that F (Q) = ⟨BQ , ΨQ ⟩.
When formalising the semantics, one inductive case will look as follows:

       ΨQ ⊗ Ψ ▷ P −−α−→ P ′    F (Q) = ⟨BQ , ΨQ ⟩    BQ #Ψ, P, α, P ′, Q    distinct BQ
Par    ─────────────────────────────────────────────────────────────────────
       Ψ ▷ P |Q −−α−→ P ′ |Q

Inferring the transition for P means selecting a specific alpha-variant of F (Q),
as ΨQ appears without binders in the inference of the transition. Freshness
conditions for BQ are central for the proofs to hold.
Next consider the rule Open. We want the binding sequence on the transition
to behave like a set in that we must not depend on the order of its names. Our
formalisation solves this by explicitly splitting the binding sequence in two and
placing the opened name in between. By creating a rule which holds for all such
splits, we mimic the effect of a set.

       Ψ ▷ P −−M (νã c̃)N −→ P ′    b ∈ n(N )    b#ã, c̃, Ψ, M
Open   ──────────────────────────────────────────────    ã#Ψ, P, M, c̃    c̃#Ψ, P, M
       Ψ ▷ (νb)P −−M (νã b c̃)N −→ P ′
4.5 Induction Rules

At the core of any nominal formalisation is the need to create custom induction
rules which allow the introduced bound names to be fresh for any given context.
Without these, the user is forced to do manual alpha-conversions throughout
the proofs, and such proofs will differ significantly from their pen-and-paper
counterparts, where freshness is just assumed. An in-depth description can be
found in [16]. Very recent additions to the nominal package generate induction
rules where the user is allowed to choose a set of names which can be arbitrarily
fresh for each inductive case. In most cases, this set will be the set of binders
present in the rule.

Standard induction. Isabelle will automatically create a rule for doing
induction on transitions of the form Ψ ▷ P −→ Rs, where Rs is a residual. In
nominal induction the predicate to be proven has the extra argument C, such
that all bound names introduced by the induction rule are fresh for C. Thus, the
predicate has the form Prop C Ψ P Rs. This induction rule is useful for very
general proofs about transitions, but we often need proofs which are specialised
for input, output, or tau transitions. We create the following custom induction
rules:
Lemma 5.

  Ψ ▷ P −−M N −→ P ′        Ψ ▷ P −−M (νã)N −→ P ′           Ψ ▷ P −−τ −→ P ′
         ⋮                            ⋮                              ⋮
  Prop C Ψ P M N P ′       Prop C Ψ P M ((νã)(N ≺ P ′))     Prop C Ψ P P ′

Proof. Follows immediately from the induction rule generated by Isabelle.

The inductive steps for each rule have been left out as they are instances of the
ones from the automatically generated induction rule, but with the predicates
changed to match the corresponding transition.
These induction rules work well only as long as the predicate to be proven
does not depend on anything under the scope of a binder. Trying to prove the
following lemma illustrates the problem.

Lemma 6. If Ψ ▷ P −−M (νã)N −→ P ′, x#P and x#ã then x#N and x#P ′.

Proof. By induction over the transitions of the form Ψ ▷ P −−M (νã)N −→ P ′.

The problem is that none of the induction rules we have will prove this lemma
in a satisfactory way. Every applicable case in the induction rule will introduce
its own bound output term (νb̃)N ′ ≺ P ′′ where we know that (νb̃)N ′ ≺ P ′′ =
(νã)N ≺ P ′. What we need to prove relates to the term P ′; what the inductive
hypotheses will give us is something related to the term P ′′, where all we know
is that they are part of alpha-equivalent terms.

Proving this lemma on its own is not too difficult, but in every step of every
proof of this type, manual alpha-conversions and equivariance properties are
needed. The following induction rule solves this problem.

       Ψ ▷ P −−M (νã)N −→ P ′
       ∀Ψ P M ã N P ′ b̃ p C.
           ( ã#b̃, Ψ, P, M, C  ∧  b̃#N, P ′  ∧
             set p ⊆ set ã × set b̃  ∧
             Prop C Ψ P M ã N P ′ )
           −→ Prop C Ψ P M b̃ (p · N ) (p · P ′)
       ⋮
       ─────────────────────────────
       Prop C Ψ P M ã N P ′
The difference between this rule and the output rule in Lemma 5 is that the
predicate in Lemma 5 takes a residual (νã)N ≺ P ′ as one argument, whereas the
predicate in this rule takes ã, N and P ′ as three separate ones. By disassociating
the binding sequence from the residual in this manner we have lost the
ability to alpha-convert the residual, but we have gained the ability to reason
about terms under the binding sequence. The extra case added in the induction
rule above (beginning with ∀Ψ P M . . .) is designed to allow the predicate to
mimic the alpha-conversion abilities we have lost. When proving this induction
rule, Lemma 3 is used in each step to generate the alpha-converting permutation;
Prop is proven in the standard way and then alpha-converted using the
new inductive case.
With this rule, we must prove that the predicate we are trying to establish
respects alpha-conversion. The advantage is that this only has to be done once
for each proof. Moreover, the case is very general and does not require the
processes or actions to be of a specific form.
Using this induction rule will not allow us to prove lemmas which reason
directly about the binding sequence ã. The new inductive case swaps a sequence
ã for b̃ but, as in Lemma 3, we do not know exactly how these sequences relate
to each other.

Induction with frames. A very common proof strategy in the psi-calculus is
to do induction on a transition of a process which has a specific frame. Trying
to prove the following lemma illustrates this.

Lemma 7. If Ψ ▷ P −−M N −→ P ′, F (P ) = ⟨BP , ΨP ⟩, X#P and
BP #X, Ψ, P, M , then there exists a K s.t. Ψ ⊗ ΨP ⊢ M ↔̇ K and X#K.

Proof. By induction on the transition Ψ ▷ P −−M N −→ P ′. The intuition of
the proof is that K is the subject in the process P .

This lemma suffers from the same problem as Lemma 6 – every inductive step
will generate a frame alpha-equivalent to ⟨BP , ΨP ⟩ and many tedious alpha-
conversions have to be done to prove the lemma. Moreover, some of our lemmas
need to reason directly about the binding sequence of the frame. An induction
rule similar to the one for output transitions can be created to solve the problem.

       Ψ ▷ P −−M N −→ P ′        F (P ) = ⟨BP , ΨP ⟩        distinct BP
       ∀Ψ P M N P ′ BP ΨP p C.
           ( (p · BP )#Ψ, P, M, C, N, P ′, BP  ∧  BP #ΨP  ∧
             set p ⊆ set BP × set (p · BP )  ∧
             Prop C Ψ P M N P ′ BP ΨP )
           −→ Prop C Ψ P M N P ′ (p · BP ) (p · ΨP )
       ⋮
       ─────────────────────────────
       Prop C Ψ P M N P ′ BP ΨP

This rule requires that the binding sequence BP is distinct. This added
requirement allows the alpha-converting case to relate the sequence BP to p · BP ,
allowing a larger class of lemmas to be proven. Our semantics requires all frames
to have distinct binding sequences, making this added requirement unproblematic.
A corresponding rule has to be created for output transitions as well, but
since frames only affect subjects as far as input and output transitions are
concerned, this induction rule does not have to use the same mechanism for the
bound names in the residual as for the ones in the frame.
After introducing these custom induction rules, we were able to remove
thousands of lines of code which dealt only with alpha-conversions.

5 Conclusions and Future Work


Nominal Isabelle has proven to be a very potent tool when doing this formali-
sation. Its support for locales has made the formalisation of parametric calculi
such as psi-calculi feasible and the nominal datatype package handles binders
elegantly.
Psi-calculi require substantially more infrastructure than the pi-calculus [6].
The reason for this is mainly that binding sequences are a very new addition to
the nominal package, and many of the automatic rules are not fully developed.
Extending the support for binding sequences will require a fair bit of work, but
we believe that the custom induction rules that we have designed can be created
automatically as they do not use any intrinsic properties of psi-calculi.
We are currently working on extending our framework to include weak
bisimulation and barbs. We also plan to work on typed psi-calculi, where we aim
to make the type system as general and parametric as psi-calculi themselves.
The source files for this formalisation can be found at:
http://www.it.uu.se/katalog/jesperb/psi.tar.gz

Acknowledgments. We want to convey our sincere thanks to Stefan Berghofer
for his hard work on expanding the nominal package to include the features we
have needed for this formalisation.

References
1. Abadi, M., Fournet, C.: Mobile values, new names, and secure communication. In:
Proceedings of POPL 2001, pp. 104–115. ACM, New York (2001)
2. Aydemir, B., Charguéraud, A., Pierce, B.C., Pollack, R., Weirich, S.: Engineer-
ing formal metatheory. In: POPL 2008: Proceedings of the 35th annual ACM
SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 3–15.
ACM, New York (2008)
3. Ballarin, C.: Locales and locale expressions in Isabelle/Isar. In: Berardi, S., Coppo,
M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 34–50. Springer,
Heidelberg (2004)
4. Barendregt, H.P.: The Lambda Calculus – Its Syntax and Semantics. Studies in
Logic and the Foundations of Mathematics, vol. 103. North-Holland, Amsterdam
(1984)
5. Bengtson, J., Johansson, M., Parrow, J., Victor, B.: Psi-calculi: Mobile processes,
nominal data, and logic. Technical report, Uppsala University (2009) (submitted),
http://user.it.uu.se/~joachim/psi.pdf
6. Bengtson, J., Parrow, J.: Formalising the pi-calculus using nominal logic. In: Seidl,
H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 63–77. Springer, Heidelberg (2007)
7. Berghofer, S., Urban, C.: Nominal Inversion Principles. In: Mohamed, O.A.,
Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 71–85. Springer,
Heidelberg (2008)
8. Buscemi, M.G., Montanari, U.: Open bisimulation for the concurrent constraint
π-calculus. In: Drossopoulou, S. (ed.) ESOP 2008. LNCS, vol. 4960, pp. 254–268.
Springer, Heidelberg (2008)
9. de Bruijn, N.G.: Lambda calculus notation with nameless dummies, a tool for
automatic formula manipulation, with application to the Church–Rosser theorem.
Indagationes Mathematicae 34, 381–392 (1972)
10. Hirschkoff, D.: A full formalisation of π-calculus theory in the calculus of construc-
tions. In: Gunter, E.L., Felty, A.P. (eds.) TPHOLs 1997. LNCS, vol. 1275, pp.
153–169. Springer, Heidelberg (1997)
11. Honsell, F., Miculan, M., Scagnetto, I.: π-calculus in (co)inductive type theory.
Theoretical Comput. Sci. 253(2), 239–285 (2001)
12. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002)
13. Pitts, A.M.: Nominal logic, a first order theory of names and binding. Information
and Computation 186, 165–193 (2003)
14. Röckl, C., Hirschkoff, D.: A fully adequate shallow embedding of the π-calculus in
Isabelle/HOL with mechanized syntax analysis. J. Funct. Program. 13(2), 415–451
(2003)
15. Urban, C.: Nominal techniques in Isabelle/HOL. Journal of Automated Reason-
ing 40(4), 327–356 (2008)
16. Urban, C., Berghofer, S., Norrish, M.: Barendregt’s variable convention in rule in-
ductions. In: Pfenning, F. (ed.) CADE 2007. LNCS, vol. 4603, pp. 35–50. Springer,
Heidelberg (2007)
Some Domain Theory and Denotational Semantics in Coq

Nick Benton1 , Andrew Kennedy1 , and Carsten Varming2,⋆

1 Microsoft Research, Cambridge, UK
  {nick,akenn}@microsoft.com
2 Carnegie-Mellon University, Pittsburgh, USA
  [email protected]

Abstract. We present a Coq formalization of constructive ω-cpos
(extending earlier work by Paulin-Mohring) up to and including the
inverse-limit construction of solutions to mixed-variance recursive
domain equations, and the existence of invariant relations on those
solutions. We then define operational and denotational semantics for
both a simply-typed CBV language with recursion and an untyped CBV
language, and establish soundness and adequacy results in each case.

1 Introduction

The use of proof assistants in formalizing language metatheory and implementing


certified tools has grown enormously over the last five years or so. Most current
work on mechanizing language definitions and type soundness results, certified
compilation, proof carrying code, and so on has been based on operational se-
mantics. But in our work on both certified compilation and on the semantics of
languages with state, we have often found ourselves wanting a Coq formalization
of the kind of denotational semantics that we have grown accustomed to working
with on paper.
Mechanizing domain theory and denotational semantics has an illustrious his-
tory. Provers such as HOL, Isabelle/HOL and Coq can all trace an ancestral line
back to Milner’s LCF [16], which was a proof checker for Scott’s PPλ logic of
cpos, continuous functions and admissible predicates. And although later sys-
tems were built on less domain-specific foundations, there have subsequently
been dozens of formalizations of different notions of domains and bits of seman-
tics, with examples in all the major provers. Few, however, have really gone
far enough to be useful. This paper describes our Coq formalization of ω-cpos
and the denotational semantics of both typed and untyped versions of a simple
functional language, going considerably further than previous work. A compan-
ion paper [8] describes a non-trivial compiler correctness theorem that has been
formalized and proved using one of these denotational models.

⋆ Research supported in part by National Science Foundation Grants CCF-0541021
and CCF-0429505.


Our formalization is based on a Coq library for constructive pointed ω-cpos
and continuous functions written by Paulin-Mohring [20] as a basis for a
semantics of Kahn networks, and of probabilistic programs [6]. Section 2 describes
our slight generalization of Paulin-Mohring’s library to treat predomains and
a general lift monad. In Section 3, we then define a simply-typed call-by-value
functional language, give it a denotational semantics using our predomains and
prove the standard soundness and adequacy theorems, establishing the corre-
spondence between the operational and denotational semantics. These results
seem not to have been previously mechanized for a higher-order language.
Section 4 is about solving recursive domain equations. We formalize Scott’s in-
verse limit construction along the lines of work by Freyd [11,12] and Pitts [22,23].
This approach characterizes the solutions as minimal invariants, yielding rea-
soning principles that allow one to construct and work with recursively-defined
predicates and relations over the recursively-defined domains. In Section 5, we
define the semantics of an untyped call-by-value language using a particular re-
cursive domain, and use the associated reasoning principles to again establish
soundness and adequacy theorems.

2 Basic Domain Theory


This first part of the development is essentially unchanged from the earlier work
of Paulin-Mohring [20]. The main difference is that Paulin-Mohring formalized
pointed cpos and continuous maps, with a special-case construction of flat cpos
(those that arise from adding a bottom element under all elements of an other-
wise discretely ordered set), whereas we use potentially bottomless cpos (‘pre-
domains’) and formalize a general constructive lift monad.

2.1 Complete Partial Orders


We start by defining the type of preorders, comprising a carrier type tord (to
which :> means we can implicitly coerce), a binary relation Ole (written infix
as ⊑), and proofs that Ole is reflexive and transitive:

Record ord := mk ord
  {tord :> Type;
   Ole : tord → tord → Prop;
   Ole refl : ∀ x : tord, Ole x x ;
   Ole trans : ∀ x y z : tord, Ole x y → Ole y z → Ole x z }.
Infix "⊑" := Ole.
The equivalence relation == is then defined to be the symmetrisation of ⊑:

Definition Oeq (O : ord ) (x y : O) := x ⊑ y ∧ y ⊑ x.
Infix "==" := Oeq (at level 70).

Both == and ⊑ are declared as parametric Setoid relations, with ⊑ being
a partial order modulo ==. Most of the constructions that follow are proved
and declared to be morphisms with respect to these relations, which then allows
convenient (in)equational rewriting in proofs.
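As a tiny concrete instance (our own sketch, using le_refl and le_trans from
Coq's standard Arith library; the development's own instance may be set up
slightly differently), the natural numbers under the usual ≤ package into an ord:

Require Import Arith.
(* nat ordered by <=; reflexivity and transitivity come from the library *)
Definition natOrd : ord := mk_ord nat le le_refl le_trans.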

The type of monotone functions between partial orders is a parameterized
record type, comprising a function between the underlying types of the two
order parameters and a proof that that function preserves order:

Definition monotonic (O1 O2 : ord ) (f : O1 → O2 ) := ∀ x y, x ⊑ y → f x ⊑ f y.
Record fmono (O1 O2 : ord ) := mk fmono
  {fmonot :> O1 → O2 ;
   fmonotonic: monotonic fmonot}.

For any O1 O2 : ord, the monotonic function space O1 →m O2 : ord is defined
by equipping fmono O1 O2 with the order inherited from the codomain: f ⊑ g
iff f x ⊑ g x for all x.
We define natO : ord by equipping the set of natural numbers, nat, with the
usual ‘vertical’ order, ≤. If c : natO →m O for some O : ord, we call c a chain
in O. Now a complete partial order is defined as a dependent record comprising
an underlying order, tcpo, a least upper bound function lub for chains in tcpo,
and proofs that this is both an upper bound (le lub) and less than or equal to
any other upper bound (lub le):

Record cpo := mk cpo
  {tcpo :> ord ;
   lub : (natO→m tcpo) → tcpo;
   le lub : ∀ (c : natO→m tcpo) (n : nat), c n ⊑ lub c;
   lub le : ∀ (c : natO→m tcpo) (x : tcpo), (∀ n, c n ⊑ x ) → lub c ⊑ x }.
This definition of a complete partial order is constructive in the sense that we
require least upper bounds of chains not only to exist, but to be computable in
Coq’s logic of total functions.
A monotone function f between two cpos, D1 and D2 , is continuous if it
preserves (up to ==) least upper bounds. One direction of this is already a
consequence of monotonicity, so we just have to specify the other:
Definition continuous (D1 D2 : cpo) (f : D1 →m D2 ) :=
  ∀ c : natO→m D1 , f (lub c) ⊑ lub (f ◦ c).
Record fconti (D1 D2 : cpo) := mk fconti
{fcontit : D1 →m D2 ;
fcontinuous : continuous fcontit}.
For any D1 D2 : cpo, the continuous function space D1 →c D2 : ord is
defined by equipping the type fconti D1 D2 with the pointwise order inherited
from D2 . We then define D1 ⇒c D2 : cpo by equipping D1 →c D2 with least
upper bounds computed pointwise: if c : natO →m (D1 →c D2 ) is a chain, then
lub c : (D1 →c D2 ) is λd1 . lub (λn. c n d1 ).
If D : cpo, write ID D : D →c D for the continuous identity function on D.
If f : D →c E and g : E →c F write g ◦ f : D →c F for their composition.
Composition of continuous maps is associative, with ID as a unit.

Discrete cpos. If X : Type then equipping X with the order x1 ⊑ x2 iff x1 = x2
(i.e. Leibniz equality) yields a cpo that we write Discrete X.
Finite products. Write 1 for the one-point cpo, Discrete unit, which is
terminal, in that for any f g : D →c 1, f == g. If D1 D2 : cpo then equipping
the usual product of the underlying types of their underlying orders with
the pointwise ordering yields a product order. Equipping that order with the
pointwise least upper bound operation lub c = (lub (fst ◦ c), lub (snd ◦ c)) for
c : natO →m D1 × D2 yields a product cpo D1 × D2 with continuous
πi : D1 × D2 →c Di . We write ⟨f, g⟩ for the unique (up to ==) continuous
function such that f == π1 ◦ ⟨f, g⟩ and g == π2 ◦ ⟨f, g⟩.

Closed structure. We can define operations curry : (D × E →c F ) → (D →c
E ⇒c F ) and ev : (E ⇒c D) × E →c D such that for any f : D × E →c F ,
curry f is the unique continuous map such that f == ev ◦ ⟨curry f ◦ π1 , π2 ⟩.
We define uncurry : (D ⇒c E ⇒c F ) →c D × E ⇒c F by uncurry =
curry (ev ◦ ⟨ev ◦ ⟨π1 , π1 ◦ π2 ⟩, π2 ◦ π2 ⟩) and we check that uncurry (curry f ) == f
and curry (uncurry h) == h for all f and h.
So our internal category CPO of cpos and continuous maps is Cartesian closed.
We elide the details of other constructions, including finite coproducts, strict
function spaces and general indexed products, that are in the formalization.
Although our cpos are not required to have least elements, those that do are of
special interest. We use Coq’s typeclass mechanism to capture them:
Class Pointed (D : cpo) := { ⊥ : D; Pleast : ∀ d : D, ⊥ ⊑ d }.
Instance DOne pointed : Pointed 1.
Instance prod pointed A B { pa : Pointed A} {pb : Pointed B } : Pointed (A × B ).
Instance fun pointed A B {pb : Pointed B } : Pointed (A ⇒c B ).
Now if D is Pointed, and f : D →c D then we can define fixp f , the least fixed
point of f in the usual way, as the least upper bound of the chain of iterates
of f starting at ⊥. We define FIXP : (D ⇒c D) →c D to be the ‘internalised’
version of fixp.
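Spelled out (the standard construction, recorded here for orientation):

    fixp f = ⊔n (f n ⊥),  the lub of the chain ⊥ ⊑ f ⊥ ⊑ f (f ⊥) ⊑ · · ·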
If D : cpo and P : D → Prop, then P is admissible if for all chains c :
natO →m D such that (∀n. P (c n)), one has P (lub c). In such a case, the subset
type {d : D | P (d )} with the order and lubs inherited from D is a cpo. We can
also prove the standard fixed point induction principle:

Definition fixp ind D {pd : Pointed D} : ∀ (F : D →m D) (P : D → Prop),
  admissible P → P ⊥ → (∀ x, P x → P (F x )) → P (fixp F ).
The main technical complexity in this part of the formalization is simply the
layering of definitions, with (for example) cpos being built on ord s, and D ⇒c E
being built on D →c E, which is built on D →m E, which is built on D → E.
Definitions have to be built up in multiple staged versions and there are many
implicit coercions and hints for Coq’s auto tactic, which are tricky to get right.
There is also much boilerplate associated with morphism declarations supporting
setoid rewriting, and there is some tension between the elementwise and ‘point-
free’ styles of working.

2.2 The Lift Monad


The basic order theory of the previous section goes through essentially as it
does when working classically on paper. In particular, the definitions of lubs
in products and function spaces are already constructive. But lifting will allow

us to express general partial recursive functions, which, in Coq’s logic of total


functions, is clearly going to involve some work. Our solution is a slight general-
ization of Paulin-Mohring’s treatment of the particular case of flat cpos, which
in turn builds on work of Capretta [9] on general recursion in type theory. We
exploit Coq’s support for coinductive datatypes [10], defining lifting in terms of
a type Stream of potentially infinite streams:
Variable D : cpo.
CoInductive Stream := Eps : Stream → Stream | Val : D → Stream.
An element of Stream is (classically) either the infinite Eps (Eps (Eps (. . . ))),
or some finite sequence of Eps steps, terminated by Val d for some d : D,
Eps (Eps (. . . Eps (Val d) . . .)). One can think of Stream as defining a resump-
tions monad, which we will subsequently quotient to define lifting. For x : Stream
and n : nat, pred nth x n is the stream that results from removing the first n
Eps steps from x. The order on Stream is coinductively defined by
CoInductive DLle : Stream→ Stream→ Prop :=
| DLleEps : ∀ x y, DLle x y → DLle (Eps x ) (Eps y)
| DLleEpsVal : ∀ x d, DLle x (Val d ) → DLle (Eps x ) (Val d )
| DLleVal : ∀ d d’ n y, pred nth y n = Val d’ → d ⊑ d’ → DLle (Val d ) y.
which satisfies the following coinduction principle:
Lemma DLle rec : ∀ R : Stream→ Stream→ Prop,
(∀ x y, R (Eps x ) (Eps y) → R x y) →
(∀ x d, R (Eps x ) (Val d ) → R x (Val d )) →
(∀ d y, R (Val d ) y → ∃ n, ∃ d’, pred nth y n = Val d’ ∧ d ⊑ d’ )
→ ∀ x y, R x y → DLle x y.
The coinduction principle is used to show that DLle is reflexive and transitive,
allowing us to construct a preorder DL ord : ord (and we now write the usual ⊑
for the order). The infinite stream of Eps ’s, Ω, is the least element of the order.
Constructing a cpo from DL ord is slightly subtle. We need to define a func-
tion that maps chains c : (natO →m DL ord) to their lubs in DL ord. An
important observation is that if some cn is non-Ω, i.e. there exists a dn such
that cn == Val dn , then for any m ≥ n, there is a dm such that cm == Val dm
and that moreover, the sequence dn , dn+1 , . . . , forms a chain in D. Classically,
the idea is that we look for such a cn ; if we find one, then we can construct a
chain in D and return Val applied to the least upper bound of that chain. If
there’s no such chain then the least upper bound is Ω. But we cannot simply
test whether a particular cn is Ω or not: we can only examine finite prefixes.
So we make a ‘parallel’ corecursive search through all the cn s, which may be
pictured as a diagonal walk that inspects one more Eps step of each of c0 , c1 ,
c2 , . . . in turn. (In reality, the output stream ‘ticks’ less frequently than the
picture would suggest.)

The output we are trying to produce is an element of DL ord. Each time our
interleaving search finds an Eps , we produce an Eps on the output. So if every
element of the chain is Ω, we will end up producing Ω on the output. But should
we find a Val d after outputting some finite number of Eps s, then we know all
later elements of the chain are also non-Ω, so we go ahead and build the chain
in D that they form and compute its least upper bound using the lub operation
of D. The details of this construction, and the proof that it does indeed yield
the least upper bound of the chain c, involve interesting bits of constructive
reasoning: going from knowing that there is a chain in D to actually having that
chain in one’s hand so as to take its lub uses (a provable form of) constructive
indefinite description, for example. But at the end of the day, we end up with a
constructive definition of D⊥ : cpo, which is clearly Pointed.
Lifting gives a strong monad [17] on CPO. The unit η : D →c D⊥ applies the
Val constructor. If f : D →c E⊥ define kleisli f : D⊥ →c E⊥ to be the map

cofix kl (d : D⊥ ) : E⊥ := match d with Eps dl ⇒ Eps (kl dl ) | Val d ⇒ f d end

Thinking operationally, the way in which kleisli sequences computations is very
intuitive. To run kleisli f d, we start by running d. Every time d takes an Eps
step, we do too, so if d diverges so does kleisli f d. Should d yield a value d ′,
however, the remaining steps are those of f d ′. We prove that kleisli f actually
is a continuous function and, amongst other things, satisfies all the equations
making (−⊥ , η, kleisli (−)) a Kleisli triple on CPO. It is also convenient to have
‘parameterized’ versions of the Kleisli operators Kleislir D E : (D × E →c
F⊥ ) → (D × E⊥ →c F⊥ ), defined by composing kleisli with the evident strength
τ : D × E⊥ →c (D × E)⊥ .
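For reference, the Kleisli triple equations just mentioned are the usual three
monad laws, stated here informally in the notation of the development:

    kleisli f ◦ η == f
    kleisli η == ID
    kleisli g ◦ kleisli f == kleisli (kleisli g ◦ f )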

3 A Simply-Typed Functional Language


Our first application of the domain theory formalization is to mechanize the
denotational semantics of PCFv , a simply-typed, call-by-value functional
language with recursion. Types in PCFv consist of integers, booleans, functions
and products; we represent typing environments by a list of types.
Inductive Ty := Int | Bool | Arrow (τ1 τ2 : Ty) | Prod (τ1 τ2 : Ty).
Infix " -> " := Arrow . Infix " * " := Prod (at level 55).
Definition Env := list Ty.
We separate syntactic values v from general expressions e, and restrict the
syntax to ANF (administrative normal form), with explicit sequencing of evalu-
ation by LET and inclusion of values into expressions by VAL. As usual, there
immediately arises the question of how to represent binders. Our first attempt
used de Bruijn indices of type nat in the syntax representation, and a separate
type for typing judgments:
Inductive Value := VAR : nat → Value | FIX : Ty → Ty → Exp → Value |. . .
Inductive VTy (Γ : Env ) (τ : Ty) : Value → Type :=
| TVAR : ∀ m , nth error Γ m = Some τ → VTy Γ (VAR m) τ
| TFIX : ∀ e τ1 τ2 , (τ = τ1 -> τ2 ) → ETy (τ1 :: τ1 -> τ2 :: Γ ) e τ2 → VTy Γ (FIX τ1 τ2 e) τ . . .

The major drawback of the above is that typing judgments contain proof
objects: simple equalities between types, as in TFIX, and the existence of a
variable in the environment, as in TVAR. It’s necessary to prove (at some length)
that any two typings of the same term are equal, whilst definitions and theorems
are hedged with well-formedness side-conditions.
We recently switched to a strongly-typed term representation in which
variable and term types are indexed by Ty and Env, ensuring that terms are
well-typed by construction. Definitions and theorems become more natural and
much more concise, and the problems with equality proofs go away. (The new
Program and dependent destruction tactics in Coq 8.2 are invaluable for working
with this kind of strongly dependent representation.) Here is the complete
definition of well-typed terms:
Inductive Var : Env → Ty → Type :=
| ZVAR : ∀ Γ τ , Var (τ :: Γ ) τ | SVAR : ∀ Γ τ τ ′, Var Γ τ → Var (τ ′ :: Γ ) τ .
Inductive Value : Env → Ty → Type :=
| TINT : ∀ Γ , nat → Value Γ Int | TBOOL : ∀ Γ , bool → Value Γ Bool
| TVAR : ∀ Γ τ , Var Γ τ → Value Γ τ
| TFIX : ∀ Γ τ1 τ2 , Exp (τ1 :: τ1 -> τ2 :: Γ ) τ2 → Value Γ (τ1 -> τ2 )
| TPAIR : ∀ Γ τ1 τ2 , Value Γ τ1 → Value Γ τ2 → Value Γ (τ1 * τ2 )
with Exp : Env → Ty → Type :=
| TFST : ∀ Γ τ1 τ2 , Value Γ (τ1 * τ2 ) → Exp Γ τ1
| TSND : ∀ Γ τ1 τ2 , Value Γ (τ1 * τ2 ) → Exp Γ τ2
| TOP : ∀ Γ , (nat → nat → nat) → Value Γ Int → Value Γ Int → Exp Γ Int
| TGT : ∀ Γ , Value Γ Int → Value Γ Int → Exp Γ Bool
| TVAL : ∀ Γ τ , Value Γ τ → Exp Γ τ
| TLET : ∀ Γ τ1 τ2 , Exp Γ τ1 → Exp (τ1 :: Γ ) τ2 → Exp Γ τ2
| TAPP : ∀ Γ τ1 τ2 , Value Γ (τ1 -> τ2 ) → Value Γ τ1 → Exp Γ τ2
| TIF : ∀ Γ τ , Value Γ Bool → Exp Γ τ → Exp Γ τ → Exp Γ τ .
Definition CExp τ := Exp nil τ . Definition CValue τ := Value nil τ .
Variables of type Var Γ τ are represented by a “typed” de Bruijn index that
is in essence a proof that τ lives at that index in Γ . The typing rule associated
with each term constructor can be read directly off its definition: for example,
TLET takes an expression typed as τ1 under Γ , and another expression typed
as τ2 under Γ extended with a new variable of type τ1 ; its whole type is then τ2
under Γ . The abbreviations CExp and CValue define closed terms.
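As a small example (our own, with implicit arguments elided; the recursive
binder introduced by TFIX is simply unused here), the successor function on
integers is the closed value

Example succ : CValue (Int -> Int) := TFIX (TOP plus (TVAR ZVAR) (TINT 1)).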
Now the operational semantics can be presented very directly:
Inductive Ev : ∀ τ , CExp τ → CValue τ → Prop :=
| e Val : ∀ τ (v : CValue τ ), TVAL v ⇓ v
| e Op : ∀ op n1 n2 , TOP op (TINT n1 ) (TINT n2 ) ⇓ TINT (op n1 n2 )
| e Gt : ∀ n1 n2 , TGT (TINT n1 ) (TINT n2 ) ⇓ TBOOL (ble nat n2 n1 )
| e Fst : ∀ τ1 τ2 (v1 : CValue τ1 ) (v2 : CValue τ2 ), TFST (TPAIR v1 v2 ) ⇓ v1
| e Snd : ∀ τ1 τ2 (v1 : CValue τ1 ) (v2 : CValue τ2 ), TSND (TPAIR v1 v2 ) ⇓ v2
| e App : ∀ τ1 τ2 e (v1 : CValue τ1 ) (v2 : CValue τ2 ),
substExp (doubleSubst v1 (TFIX e)) e ⇓ v2 → TAPP (TFIX e) v1 ⇓ v2
| e Let : ∀ τ1 τ2 e1 e2 (v1 : CValue τ1 ) (v2 : CValue τ2 ),
e1 ⇓ v1 → substExp (singleSubst v1 ) e2 ⇓ v2 → TLET e1 e2 ⇓ v2
| e IfTrue : ∀ τ (e1 e2 : CExp τ ) v, e1 ⇓ v → TIF (TBOOL true) e1 e2 ⇓ v

| e IfFalse : ∀ τ (e1 e2 : CExp τ ) v, e2 ⇓ v → TIF (TBOOL false) e1 e2 ⇓ v


where "e ’⇓’ v " := (Ev e v ).
Substitutions are typed maps from variables to values:

Definition Subst Γ Γ ′ := ∀ τ , Var Γ τ → Value Γ ′ τ .
Definition hdSubst Γ Γ ′ τ : Subst (τ :: Γ ) Γ ′ → Value Γ ′ τ := . . ..
Definition tlSubst Γ Γ ′ τ : Subst (τ :: Γ ) Γ ′ → Subst Γ Γ ′ := . . ..

To apply a substitution on de Bruijn terms (functions substVal and substExp),
one would conventionally define a shift operator, but the full dependent type of
this operator (namely Val (Γ ++ Γ ′) τ → Val (Γ ++ [τ ′] ++ Γ ′) τ ) is hard to
work with. Instead, we first define general renamings (maps from variables to
variables), and then bootstrap substitution on terms, defining shift in terms of
renaming [5,15,1]. Definitions and lemmas regarding composition must be
similarly bootstrapped: first composition of renamings is defined, then composition
of substitution with renaming, and finally composition of substitutions.
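Schematically (our own sketch, with hypothetical names; the development's own
definitions may differ in detail), the bootstrap proceeds in this order:

Definition Ren Γ Γ ′ := ∀ τ , Var Γ τ → Var Γ ′ τ .
(* 1. renameVal : Ren Γ Γ ′ → Value Γ τ → Value Γ ′ τ , defined structurally   *)
(* 2. shifting a substitution under a binder then uses SVAR as a renaming      *)
(* 3. substVal and substExp finally push substitutions through whole terms     *)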

3.1 Denotational Semantics


The semantics of types is inductive, using product of cpo’s to interpret products
and continuous functions into a lifted cpo to represent call-by-value functions.
Fixpoint SemTy τ := match τ with
| Int ⇒ Discrete nat | Bool ⇒ Discrete bool
| τ1 -> τ2 ⇒ SemTy τ1 ⇒c (SemTy τ2 )⊥
| τ1 * τ2 ⇒ SemTy τ1 × SemTy τ2 end.
Fixpoint SemEnv Γ := match Γ with nil ⇒ 1 | τ :: Γ ′ ⇒ SemEnv Γ ′ × SemTy τ end.
We interpret Value Γ τ in SemEnv Γ →c SemTy τ . Expressions are similar,
except that the range is a lifted cpo. Note how we have used a ‘point-free’ style,
with no explicit mention of value environments.
Fixpoint SemVar Γ τ (var : Var Γ τ ) : SemEnv Γ →c SemTy τ :=
  match var with ZVAR ⇒ π2 | SVAR v ⇒ SemVar v ◦ π1 end.
Fixpoint SemExp Γ τ (e : Exp Γ τ ) : SemEnv Γ →c (SemTy τ )⊥ :=
  match e with
  | TOP op v1 v2 ⇒ η ◦ uncurry (SimpleOp2 op) ◦ ⟨SemVal v1 , SemVal v2 ⟩
  | TGT v1 v2 ⇒ η ◦ uncurry (SimpleOp2 ble nat) ◦ ⟨SemVal v2 , SemVal v1 ⟩
  | TAPP v1 v2 ⇒ ev ◦ ⟨SemVal v1 , SemVal v2 ⟩
  | TVAL v ⇒ η ◦ SemVal v
  | TLET e1 e2 ⇒ Kleislir (SemExp e2 ) ◦ ⟨ID, SemExp e1 ⟩
  | TIF v e1 e2 ⇒ (choose @3 (SemExp e1 )) (SemExp e2 ) (SemVal v )
  | TFST v ⇒ η ◦ π1 ◦ SemVal v
  | TSND v ⇒ η ◦ π2 ◦ SemVal v
  end
with SemVal Γ τ (v : Value Γ τ ) : SemEnv Γ →c SemTy τ :=
  match v with
  | TINT n ⇒ K (n : Discrete nat)
  | TBOOL b ⇒ K (b : Discrete bool )
  | TVAR i ⇒ SemVar i
  | TFIX e ⇒ FIXP ◦ curry (curry (SemExp e))
  | TPAIR v1 v2 ⇒ ⟨SemVal v1 , SemVal v2 ⟩ end.

In the above, SimpleOp2 lifts binary Coq functions to continuous maps on
discrete cpos, K is the usual constant combinator and choose is a continuous
conditional.

3.2 Soundness and Adequacy

We first prove soundness, showing that if an expression e evaluates to a value v,
then the denotation of e is indeed the denotation of v. This requires that
substitution commutes with the semantic meaning function. We define the
‘semantics’ of a substitution s : Subst Γ ′ Γ to be a map in SemEnv Γ →c SemEnv Γ ′.

Fixpoint SemSubst Γ Γ ′ : Subst Γ ′ Γ → SemEnv Γ →c SemEnv Γ ′ :=
  match Γ ′ with
  | nil ⇒ fun s ⇒ K (tt : 1)
  | :: ⇒ fun s ⇒ ⟨SemSubst (tlSubst s), SemVal (hdSubst s)⟩ end.
This is then used to prove the substitution lemma, which in turn is used in
the e App and e Let cases of the soundness proof.
Lemma SemCommutesWithSubst:
  (∀ Γ τ (v : Value Γ τ ) Γ ′ (s : Subst Γ Γ ′),
     SemVal v ◦ SemSubst s == SemVal (substVal s v ))
  ∧ (∀ Γ τ (e : Exp Γ τ ) Γ ′ (s : Subst Γ Γ ′),
     SemExp e ◦ SemSubst s == SemExp (substExp s e)).

Theorem Soundness: ∀ τ (e : CExp τ ) v, e ⇓ v → SemExp e == η ◦ SemVal v.
We now prove adequacy, showing that if the denotation of a closed expression e
is some (lifted) element, then e converges to a value. The proof uses a logical
relation between syntax and semantics. We start by defining a liftRel operation
that takes a relation between a cpo and values and lifts it to a relation between
a lifted cpo and expressions, then use this to define relExp in terms of relVal.
Definition liftRel τ (R : SemTy τ → CValue τ → Prop) :=
fun d e ⇒ ∀ d’, d == Val d’ → ∃ v, e ⇓ v ∧ R d’ v.
Fixpoint relVal τ : SemTy τ → CValue τ → Prop := match τ with
| Int ⇒ fun d v ⇒ v = TINT d
| Bool ⇒ fun d v ⇒ v = TBOOL d
| τ1 -> τ2 ⇒ fun d v ⇒ ∃ e, v = TFIX e ∧ ∀ d1 v1 , relVal τ1 d1 v1 → liftRel (relVal
τ2 ) (d d1 ) (substExp (doubleSubst v1 v ) e)
| τ1 * τ2 ⇒ fun d v ⇒ ∃ v1 , ∃ v2 , v = TPAIR v1 v2 ∧ relVal τ1 (π1 d ) v1 ∧ relVal τ2
(π2 d ) v2 end.
Fixpoint relEnv Γ : SemEnv Γ → Subst Γ nil → Prop := match Γ with
| nil ⇒ fun ⇒ True
| τ :: Γ ⇒ fun d s ⇒ relVal τ (π2 d ) (hdSubst s) ∧ relEnv Γ (π1 d ) (tlSubst s) end.
Definition relExp τ := liftRel (relVal τ ).
The logical relation is down-closed with respect to ⊑ (hence respects ==) and
is admissible:

Lemma relVal lower : ∀ τ d d’ v, d ⊑ d’ → relVal τ d’ v → relVal τ d v.
Lemma relVal admissible: ∀ τ v, admissible (fun d ⇒ relVal τ d v ).
These lemmas are then used in the proof of the Fundamental Theorem for the
logical relation, which is proved by induction on the structure of terms.

Theorem FT : (∀ Γ τ v ρ s, relEnv Γ ρ s → relVal τ (SemVal v ρ) (substVal s v ))
  ∧ (∀ Γ τ e ρ s, relEnv Γ ρ s → relExp τ (SemExp e ρ) (substExp s e)).
Now we instantiate the fundamental theorem with closed expressions to obtain
Corollary Adequacy: ∀ τ (e : CExp τ ) d, SemExp e tt == Val d → ∃ v, e ⇓ v.

4 Recursive Domain Equations


We now outline our formalization of the solution of mixed-variance recursive
domain equations, such as arise in modelling untyped higher-order languages,
languages with higher-typed store, or languages with general recursive types.
The basic technology for solving domain equations is Scott’s inverse limit
construction, our formalization of which follows an approach due to Freyd [11,12]
and Pitts [23]. A key idea is to separate the positive and negative occurrences,
specifying recursive domains as fixed points of locally continuous bifunctors
F : CPOop × CPO → CPO, i.e. domains D such that F (D, D) ≅ D.
The type of mixed variance locally-continuous bifunctors on CPO is defined as
the type of records comprising an action on pairs of objects (ob), a continuous
action on pairs of morphisms (mor ), contravariant in the first argument and
covariant in the second, together with proofs that mor respects both composition
(morph comp) and identities (morph id ):

Record BiFunctor := mk functor
  { ob : cpo → cpo → cpo;
    mor : ∀ (A B C D : cpo), (B ⇒c A) × (C ⇒c D) ⇒c (ob A C ⇒c ob B D) ;
    morph comp : ∀ A B C D E F f g h k,
      mor B E D F (f, g) ◦ mor A B C D (h, k ) == mor (h ◦ f, g ◦ k ) ;
    morph id : ∀ A B, mor (IDA , IDB ) == ID }.
We single out the strict bifunctors, taking pointed cpos to pointed cpos:
Definition FStrict : BiFunctor → Type :=
fun BF ⇒ ∀ D E, Pointed D → Pointed E → (Pointed (ob BF D E )).
We build interesting bifunctors from a few primitive ones in a combinatory
style. If D : cpo then BiConst D : BiFunctor on objects is constantly D and on
morphisms is constantly ID D. This is strict if D is pointed. BiArrow : BiFunctor
on objects takes (D, E) to D ⇒c E with the action on morphisms given by con-
jugation. This is strict. If F : BiFunctor then BiLift F : BiFunctor on objects
takes (D, E) to (ob F (D, E))⊥ and on morphisms composes mor F with the mor-
phism part of the lift functor. This is always strict. If F1 F2 : BiFunctor then
BiPair F1 F2 : BiFunctor on objects takes (D, E) to (ob F1 (D, E))×(ob F2 (D, E))
with the evident action on morphisms. This is strict if both F1 and F2 are. The
definition of BiSum F1 F2 : BiFunctor is similar, though this is not generally
strict as our coproduct is a separated sum.
A pair (f : D →c E, g : E →c D) is an embedding-projection (e-p) pair if
g ◦ f == idD and f ◦ g ⊑ idE . If F is a bifunctor and (f, g) an e-p pair, then
(mor F (g, f ), mor F (f, g)) is an e-p pair.
Now let F : BiFunctor and F S : FStrict F . We define Di to be the sequence
of cpos defined by D0 = 1 and Dn+1 = ob F (Dn , Dn ). We then define a sequence
of e-p pairs:
e0 = ⊥ : D0 →c D1                         p0 = ⊥ : D1 →c D0
en+1 = mor F (pn , en ) : Dn+1 →c Dn+2    pn+1 = mor F (en , pn ) : Dn+2 →c Dn+1
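(Note that the base case is an e-p pair for trivial reasons: p0 ◦ e0 == ID D0
because D0 = 1 is terminal, and e0 ◦ p0 ⊑ ID D1 because e0 ◦ p0 is constantly
⊥, D1 being pointed since F S is strict.)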
Let πi : Πj Dj →c Di be the projections from the product of all the Dj s.
The predicate P : Πj Dj → Prop defined by P d := ∀i, πi d == pi (πi+1 d ) is
admissible, so we can define the sub-cpo D∞ to be {d | P d } with order and lubs
inherited from the indexed product. D∞ will be the cpo we seek, so we now need
to construct the required isomorphism.
Define tn : Dn → D∞ to be the map that for i < n projects Dn to Di
via pi ◦ · · · ◦ pn−1 and for i > n embeds Dn in Di via en ◦ · · · ◦ ei−1 . Then
mor F (ti , πi ) : ob F (D∞ , D∞ ) →c ob F (Di , Di ) = Di+1 , so ti+1 ◦ mor F (ti , πi ) :
ob F (D∞ , D∞ ) →c D∞ , and mor F (πi , ti ) ◦ πi+1 : D∞ →c ob F (D∞ , D∞ ). We
then define

UNROLL := ⊔i (mor F (πi , ti ) ◦ πi+1 ) : D∞ →c ob F (D∞ , D∞ )
ROLL   := ⊔i (ti+1 ◦ mor F (ti , πi )) : ob F (D∞ , D∞ ) →c D∞

Some calculation shows that ROLL ◦ UNROLL == ID D∞ and UNROLL ◦
ROLL == ID (ob F (D∞ , D∞ )), so we have constructed the desired isomorphism.
In order to do anything useful with recursively defined domains, we really need
some general reasoning principles that allow us to avoid unpicking all the complex
details of the construction above every time we want to prove something.
One ‘partially’ abstract interface to the construction reveals that D∞ comes
equipped with a chain of retractions ρi : D∞ →c D∞ such that ⊔i ρi == ID D∞ ;
concretely, ρi can be taken to be ti ◦ πi . A more abstract and useful principle
is given by Pitts’s [23] characterization of the solution as a minimal invariant,
which is how we will establish the existence of a recursively defined logical
relation in Section 5.1. Let δ : (D∞ ⇒c D∞ ) →c (D∞ ⇒c D∞ ) be given by
δ e = ROLL ◦ mor F (e, e) ◦ UNROLL
The minimal invariance property is then the assertion that IDD∞ is equal to
fix(δ), which we prove via a pointwise comparison of the chain of retractions
whose lub we know to be the identity function with the chain whose lub gives
the least fixed point of δ.

5 A Uni-typed Lambda Calculus


We now apply the technology of the previous section to formalize the
denotational semantics of a uni-typed (untyped) CBV lambda calculus with
constants. This time the values are variables, numeric constants, and λ
abstractions; expressions are again in ANF with LET and VAL constructs,
together with function application, numeric operations, and a zero-test
conditional. For binding, we use de Bruijn indices and separate well-formedness
judgments ETyping and VTyping. The evaluation relation is as follows:

Inductive Evaluation : Exp → Value → Type :=


| e Value : ∀ v, VAL v ⇓ v
| e App : ∀ e v2 v, substExp [v2 ] e ⇓ v → (APP (LAMBDA e) v2 ) ⇓ v
| e Let : ∀ e1 v1 e2 v2 , e1 ⇓ v1 → substExp [v1 ] e2 ⇓ v2 → (LET e1 e2 ) ⇓ v2
| e Ifz1 : ∀ e1 e2 v1 , e1 ⇓ v1 → IFZ (NUM O) e1 e2 ⇓ v1
| e Ifz2 : ∀ e1 e2 v2 n, e2 ⇓ v2 → IFZ (NUM (S n)) e1 e2 ⇓ v2
| e Op : ∀ op v1 n1 v2 n2, OP op (NUM n1 ) (NUM n2 ) ⇓ NUM (op n1 n2 )
where "e ’⇓’ v " := (Evaluation e v ).
Inductive Converges e : Prop := e Conv : ∀ v ( : Evaluation e v ), Converges e.
Notation "e ’⇓’" := (Converges e).
Note that Evaluation is here in Type rather than Prop, which is a knock-on
effect of separating the definition of terms from that of well-formedness. (We
induce over evaluations to construct well-formedness derivations when showing
well-formedness preservation, and well-formedness derivations are themselves in
Type so that we can use them to inductively define the denotational semantics.)
We plan instead to make untyped terms well-scoped by construction, using the
same techniques as we did for the typed language in Section 3.

5.1 Semantic Model

We interpret the unityped language in a solution for the recursive domain
equation D ≅ (nat + (D →c D))⊥ , following the intuition that a computation
either diverges or produces a value which is a number or a function. This is
not the ‘tightest’ domain equation one could use for CBV: one could make the
function space strict, or equivalently make the argument of the function space
be a domain of values rather than computations. But this equation still gives
an adequate model.
The construction in Coq is an instantiation of results from the previous section.
First we build the strict bifunctor F (D, E) = (nat + (D →c E))⊥ :
Definition FS := BiLift strict (BiSum (BiConst (Discrete nat)) BiArrow ).
And then we construct the solution, defining domains D∞ for computations
and V∞ for values:
Definition D∞ := D∞ FS.
Definition V∞ := Dsum (Discrete nat) (D∞ →c D∞ ).
Definition Roll : (V∞ )⊥ →c D∞ := ROLL FS.
Definition Unroll : D∞ →c (V∞ )⊥ := UNROLL FS.
Definition UR iso : Unroll ◦ Roll == ID := DIso ur FS.
Definition RU iso : Roll ◦ Unroll == ID := DIso ru FS.
For environments we define the n-ary product of V∞ and projection function.
Fixpoint SemEnv n : cpo := match n with O ⇒ 1| S n ⇒ SemEnv n × V∞ end.
Fixpoint projenv (m n : nat) : (m < n) → SemEnv n →c V∞ :=
match m, n with
| m, O ⇒ fun inconsistent ⇒ match (lt n O m inconsistent) with end
| O, S n ⇒ fun ⇒ π2
| S m, S n ⇒ fun h ⇒ projenv (lt S n h) ◦ π1 end.
We define a lifting operator Dlift : (V∞ →c D∞ ) →c D∞ →c D∞ and an
operator Dapp : V∞ × V∞ →c (V∞ )⊥ that applies the first component of a pair
to the second, returning ⊥ in the case when the first component is not a function.
(Coproducts are introduced with INL and INR, and eliminated with [·, ·].)

Definition Dlift : (V∞ →c D∞ ) →c D∞ →c D∞ :=
  curry (Roll ◦ ev ◦ ⟨kleisli ◦ (Unroll ◦ −) ◦ π1 , Unroll ◦ π2 ⟩).
Definition Dapp : V∞ × V∞ →c (V∞ )⊥ :=
  ev ◦ ⟨[⊥ : Discrete nat →c D∞ →c (V∞ )⊥ , (Unroll ◦ −)] ◦ π1 , Roll ◦ η ◦ π2 ⟩.
We can then define the semantics of the unityped language:

Fixpoint SemVal v n (vt : VTyping n v ) : SemEnv n →c V∞ :=
  match vt with
  | TNUM n ⇒ INL ◦ (@K (Discrete nat) n)
  | TVAR m nthm ⇒ projenv nthm
  | TLAMBDA t b ⇒ INR ◦ Dlift ◦ curry (Roll ◦ SemExp b)
  end
with SemExp e n (et : ETyping n e) : SemEnv n →c (V∞ )⊥ :=
  match et with
  | TAPP v1 v2 ⇒ Dapp ◦ ⟨SemVal v1 , SemVal v2 ⟩
  | TVAL v ⇒ η ◦ SemVal v
  | TLET e1 e2 ⇒ ev ◦ ⟨curry (Kleislir (SemExp e2 )), SemExp e1 ⟩
  | TOP op v1 v2 ⇒
      uncurry (Operator2 (η ◦ INL ◦ uncurry (SimpleOp2 op))) ◦
      ⟨[η, ⊥ : (D∞ →c D∞ ) →c (Discrete nat)⊥ ] ◦ SemVal v1 ,
       [η, ⊥ : (D∞ →c D∞ ) →c (Discrete nat)⊥ ] ◦ SemVal v2 ⟩
  | TIFZ v e1 e2 ⇒ ev ◦
      ⟨[[K (SemExp e1 ), K (SemExp e2 )] ◦ zeroCase,
        ⊥ : (D∞ →c D∞ ) →c SemEnv n →c (V∞ )⊥ ] ◦ SemVal v, ID⟩ end.

5.2 Soundness and Adequacy


As with the typed language, the proof of soundness makes use of a substitution
lemma, and in addition uses the isomorphism of the domain D∞ in the case for
APP. The proof then proceeds by induction, using equational reasoning to show
that evaluation preserves semantics:
Lemma Soundness: ∀ e v (et : ETyping 0 e) (vt : VTyping 0 v ),
e ⇓ v → SemExp et == η ◦ SemVal vt.
We again use a logical relation between syntax and semantics to prove ad-
equacy, but now cannot define the relation by induction on types. Instead we
have a recursive specification of a logical relation over our recursively defined do-
main, but it is not at all clear that such a relation exists: because of the mixed
variance of the function space, the operator on relations whose fixpoint we seek
is not monotone. Following Pitts [23], however, we again use the technique of
separating positive and negative occurrences, defining a monotone operator in
the complete lattice of pairs of relations, with the superset order in the first
component and the subset order in the second. A fixed point of that operator is
then constructed by Knaster-Tarski.
We first define a notion of admissibility on ==-respecting relations between
elements of our domain of values V∞ and closed syntactic values in Value and

show that this is closed under intersection, so admissible relations form a com-
plete lattice.
We then define a relational action corresponding to the bifunctor used in
defining our recursive domain. This action, RelV , maps a pair of relations R, S
on (V∞ )⊥ × Value to a new relation that relates (inl m) to (NUM m) for all
m : nat, and relates (inr f ) to (LAMBDA e) just when f : D∞ →c D∞ is strict
and satisfies the ‘logical’ property

∀(d, v ) ∈ R, ∀d ′, Unroll (f (Roll (Val d ))) == Val d ′
    → ∃v ′, substExp [v ] e ⇓ v ′ ∧ (d ′, v ′) ∈ S

We then show that RelV maps admissible relations to admissible relations


and is contravariant in its first argument and covariant in its second. Hence the
function λR : RelAdmop . λS : RelAdm. (RelV S R, RelV R S) is monotone
on the complete lattice RelAdmop × RelAdm. Thus it has a least fixed point
(Δ− , Δ+ ). By applying the minimal invariant property from the previous section,
we prove that in fact Δ− == Δ+ , so we have found a fixed point Δ of RelV ,
which is the logical relation we need to prove adequacy.
We extend Δ to Δe , a relation on (V∞ )⊥ × Exp, by (d, e) ∈ Δe if and only if
for all d ′, if d == Val d ′ then there exists a value v and a derivation e ⇓ v such
that (d ′, v ) ∈ Δ.
The fundamental theorem for this relation is that for any environment env,
derivations of VTyping n v and ETyping n e, and any list vs of n closed values
such that nth error vs i = Some v implies (projenv i env, v) ∈ Δ for all i and v,
(SemVal v env , substVal vs v) ∈ Δ and (SemExp e env , substExp vs e) ∈ Δe .
Adequacy is then a corollary of the fundamental theorem:
Theorem Adequacy: ∀ e (te : ETyping 0 e) d, SemExp te tt == Val d → e ⇓.

6 Discussion

As we noted in the introduction, there have been many mechanized treatments


of different aspects of domain theory and denotational semantics. One rough
division of this previous work is between axiomatic approaches and those in
which definitions and proofs of basic results about cpos, continuous functions
and so on are made explicitly with the prover’s logic. LCF falls into the first
category, as does Reus’s work on synthetic domain theory in LEGO [25]. In the
second category, HOLCF, originally due to Regensburger [24] and later reworked
by Müller et al [18], uses Isabelle’s axiomatic type class mechanism to define
and prove properties of domain-theoretic structures within higher order logic.
HOLCPO [2,4] was an extension of HOL with similar goals, and basic definitions
have also been formalized in PVS [7]. Coq’s library includes a formalization by
Kahn of some general theory of dcpos [14].
HOLCF is probably the most developed of these systems, and has been used to
prove interesting results [19,26], but working in a richer dependent type theory
gives us some advantages. We can express the semantics of a typed language

as a dependently typed map from syntax to semantics, rather than only being
able to do shallow embeddings – this is clearly necessary if one wishes to prove
theorems like adequacy or do compiler correctness. Secondly, it seems one really
needs dependent types to work conveniently with monads and logical relations,
or to formalize the inverse limit construction.4
The constructive nature of our formalization and the coinductive treatment of
lifting has both benefits and drawbacks. On the minus side, some of the proofs
and constructions are much more complex than they would be classically and
one does sometimes have to pay attention to which of two classically-equivalent
forms of definition one works with. Worse, some constructions do not seem to be
possible, such as the smash product of pointed domains; not being able to define
⊗ was one motivation for moving from Paulin-Mohring’s pointed cpos to our
unpointed ones. One benefit that we have not yet seriously investigated, however,
is that it is possible to extract actual executable code from the denotational
semantics. Indeed, the lift monad can be seen as a kind of syntax-free operational
semantics, not entirely unlike game semantics; this perspective, and possible
connections with step-indexing, seem to merit further study.
The Coq development is of a reasonable size. The domain theory library,
including the theory of recursive domain equations, is around 7000 lines. The
formalization of the typed language and its soundness and adequacy proofs are
around 1700 lines and the untyped language takes around 2500. Although all the
theorems go through (with no axioms), we have to admit that the development
is currently rather ‘rough’. Nevertheless, we have already used it as the basis
of a non-trivial formalization of some new research [8] and our intention is to
develop the formalization into something that is more widely useful. Apart from
general polishing, we plan to abstract some of the structure of our category of
domains to make it convenient to work simultaneously with different categories,
including categories of algebras. We would also like to provide better support for
‘diagrammatic’ rewriting in monoidal (multi)categories. It is convenient to use
Setoid rewriting for pointfree equational reasoning, directly translating the normal
categorical commuting diagrams. But dealing with all the structural morphisms
is still awkward, and it should be possible to support something more like the
diagrammatic proofs one can do with ‘string diagrams’ [13].

References
1. Adams, R.: Formalized metatheory with terms represented by an indexed family
of types. In: Filliâtre, J.-C., Paulin-Mohring, C., Werner, B. (eds.) TYPES 2004.
LNCS, vol. 3839, pp. 1–16. Springer, Heidelberg (2006)
2. Agerholm, S.: Domain theory in HOL. In: Joyce, J.J., Seger, C.-J.H. (eds.) HUG
1993. LNCS, vol. 780. Springer, Heidelberg (1994)
3. Agerholm, S.: Formalizing a model of the lambda calculus in HOL-ST. Technical
Report 354, University of Cambridge Computer Laboratory (1994)
4. Agerholm, S.: LCF examples in HOL. The Computer Journal 38(2) (1995)
5. Altenkirch, T., Reus, B.: Monadic presentations of lambda terms using generalized
inductive types. In: Flum, J., Rodríguez-Artalejo, M. (eds.) CSL 1999. LNCS,
vol. 1683, pp. 453–468. Springer, Heidelberg (1999)
6. Audebaud, P., Paulin-Mohring, C.: Proofs of randomized algorithms in Coq. In:
Uustalu, T. (ed.) MPC 2006. LNCS, vol. 4014, pp. 49–68. Springer, Heidelberg
(2006)
7. Bartels, F., Dold, A., Pfeifer, H., Von Henke, F.W., Rueß, H.: Formalizing fixed-
point theory in PVS. Technical report, Universität Ulm (1996)
8. Benton, N., Hur, C.-K.: Biorthogonality, step-indexing and compiler correctness.
In: ACM International Conference on Functional Programming (2009)
9. Capretta, V.: General recursion via coinductive types. Logical Methods in Com-
puter Science 1 (2005)
10. Coquand, T.: Infinite objects in type theory. In: Barendregt, H., Nipkow, T. (eds.)
TYPES 1993. LNCS, vol. 806. Springer, Heidelberg (1994)
11. Freyd, P.: Recursive types reduced to inductive types. In: IEEE Symposium on
Logic in Computer Science (1990)
12. Freyd, P.: Remarks on algebraically compact categories. In: Applications of Cate-
gories in Computer Science. LMS Lecture Notes, vol. 177 (1992)
13. Joyal, A., Street, R.: The geometry of tensor calculus. Adv. in Math. 88 (1991)
14. Kahn, G.: Elements of domain theory. In: The Coq users’ contributions library
(1993)
15. McBride, C.: Type-preserving renaming and substitution (unpublished draft)
16. Milner, R.: Logic for computable functions: Description of a machine implementa-
tion. Technical Report STAN-CS-72-288, Stanford University (1972)
17. Moggi, E.: Notions of computation and monads. Inf. Comput. 93(1), 55–92 (1991)
18. Müller, O., Nipkow, T., von Oheimb, D., Slotosch, O.: HOLCF = HOL + LCF. J.
Functional Programming 9 (1999)
19. Nipkow, T.: Winskel is (almost) right: Towards a mechanized semantics textbook.
Formal Aspects of Computing 10 (1998)
20. Paulin-Mohring, C.: A constructive denotational semantics for Kahn networks in
Coq. In: From Semantics to Computer Science. Essays in Honour of G Kahn (2009)
21. Petersen, K.D.: Graph model of LAMBDA in higher order logic. In: Joyce, J.J.,
Seger, C.-J.H. (eds.) HUG 1993. LNCS, vol. 780. Springer, Heidelberg (1994)
22. Pitts, A.M.: Computational adequacy via ‘mixed’ inductive definitions. In: Main,
M.G., Melton, A.C., Mislove, M.W., Schmidt, D., Brookes, S.D. (eds.) MFPS 1993.
LNCS, vol. 802. Springer, Heidelberg (1994)
23. Pitts, A.M.: Relational properties of domains. Inf. Comput. 127 (1996)
24. Regensburger, F.: HOLCF: Higher order logic of computable functions. In: Schu-
bert, E.T., Alves-Foss, J., Windley, P. (eds.) HUG 1995. LNCS, vol. 971. Springer,
Heidelberg (1995)
25. Reus, B.: Formalizing a variant of synthetic domain theory. J. Automated Reason-
ing 23 (1999)
26. Varming, C., Birkedal, L.: Higher-order separation logic in Isabelle/HOLCF. In:
Mathematical Foundations of Programming Semantics (2008)
Turning Inductive into Equational Specifications

Stefan Berghofer, Lukas Bulwahn, and Florian Haftmann

Technische Universität München


Institut für Informatik, Boltzmannstraße 3, 85748 Garching, Germany
http://www.in.tum.de/~berghofe/
http://www.in.tum.de/~bulwahn/
http://www.in.tum.de/~haftmann/

Abstract. Inductively defined predicates are frequently used in formal
specifications. Using the theorem prover Isabelle, we describe an ap-
proach to turn a class of systems of inductively defined predicates into
a system of equations using data flow analysis; the translation is carried
out inside the logic and resulting equations can be turned into functional
program code in SML, OCaml or Haskell using the existing code gener-
ator of Isabelle. Thus we extend the scope of code generation in Isabelle
from functional to functional-logic programs while leaving the trusted
foundations of code generation itself intact.

1 Introduction
Inductively defined predicates (for short, (inductive) predicates) are a popu-
lar specification device in the theorem proving community. Major theory devel-
opments in the proof assistant Isabelle/HOL [8] make pervasive use of them,
e.g. formal semantics of realistic programming language fragments [11]. From
such large applications naturally the desire arises to generate executable proto-
types from the abstract specifications. It is well-known how systems of predicates
can be transformed to functional programs using mode analysis. The approach
described in [1] for Isabelle/HOL works but has turned out to be unsatisfactory:

– The applied transformations are not trivial but are carried out outside the
LCF inference kernel, thus relying on a large code base to be trusted.
– Recently a lot of code generation facilities in Isabelle/HOL have been gen-
eralized to cover type classes and more languages than ML, but this has not
yet been undertaken for predicates.

In our view it is high time to tackle execution of predicates again; we present
a transformation from predicates to function-like equations that is not a mere
re-implementation, but brings substantial improvements:

– The transformation is carried out inside the logic; thus the transformation
is guarded by LCF inferences and does not increase the trusted code base.

⋆ Supported by BMBF in the VerisoftXT project under grant 01 IS 07008 F.
⋆⋆ Supported by DFG project NI 491/10-1.
– The code generator itself can be fed with the function-like equations and
does not need to be extended; also other tools involving equational reasoning
could benefit from the transformation.
– Proposed extensions can also work inside the logic and do not endanger
trustability.
The role of our transformation in this scenario is shown in the following picture:

    inductive package ──► inductive predicates ──► transformer
                                                        │
    executable code ◄── code generator ◄── equational theorems

The remainder of this paper is structured as follows: we briefly review existing
work in §2 and explain the preliminaries in §3. The main section (§4) explains
how the translation works, followed by a discussion of further extensions (§5).
Our conclusion (§6) will deal with future work.
In our presentation we use fairly standard notation, plus little Isabelle/HOL-
specific concrete syntax.

2 Related Work
From the technical point of view, the execution of predicates has been extensively
studied in the context of the programming languages Curry [4] and Mercury [10].
The central concept for executing predicates are modes, which describe dataflow
by partitioning arguments into input and output.
We already mentioned the state-of-the-art implementation of code generation
for predicates in Isabelle/HOL [1] which turns inductive predicates into ML
programs extralogically using mode analysis.
Delahaye et al. provide a similar direct extraction for the Coq proof assis-
tant [2]; however at most one solution is computed, multiple solutions are not
enumerated.
For each of these approaches, correctness is ensured by pen-and-paper proofs.
Our approach instead animates the correctness proof by applying it to each
single predicate using the proof assistant itself; thus correctness is guaranteed
by construction.

3 Preliminaries
3.1 Inductive Predicates
An inductive predicate is characterized by a collection of introduction rules (or
clauses), each of which has a conclusion and an arbitrary number of premises.
It corresponds to the smallest set closed under these clauses. As an example,
consider the following predicate describing the concatenation of two lists, which
can be defined in Isabelle/HOL using the inductive command:

inductive append :: α list ⇒ α list ⇒ α list ⇒ bool where
append [] ys ys
| append xs ys zs =⇒ append (x · xs) ys (x · zs)

For each predicate, an elimination (or case analysis) rule is provided, which for
append has the form

append Xs Ys Zs =⇒
(⋀ys. Xs = [] =⇒ Ys = ys =⇒ Zs = ys =⇒ P ) =⇒
(⋀xs ys zs x .
    Xs = x · xs =⇒
    Ys = ys =⇒ Zs = x · zs =⇒ append xs ys zs =⇒ P ) =⇒
P

There is also an induction rule, which however is not relevant in our scenario.
In introduction rules, we distinguish between premises of the form Q u1 . . . uk ,
where Q is an inductive predicate, and premises of other shapes, which we call
side conditions. Without loss of generality, we only consider clauses without side
conditions in most parts of our presentation. The general form of a clause is

Ci : Qi,1 ui,1 =⇒ · · · =⇒ Qi,ni ui,ni =⇒ P ti

We use ki,j and l to denote the arities of the predicates Qi,j and P , i.e. the
length of the argument lists ui,j and ti , respectively.

3.2 Code Generation

The Isabelle code generator views generated programs as an implementation of
an equational rewrite system, e.g. the following program normalizes a list of
natural numbers to its sum by equational rewriting:

sum [Suc Zero_nat, Suc Zero_nat]

datatype nat = Zero_nat | Suc of nat;


fun plus_nat (Suc m) n = plus_nat m (Suc n)
| plus_nat Zero_nat n = n;
fun sum [] = Zero_nat
| sum (m :: ms) = plus_nat m (sum ms);

Suc (Suc Zero_nat)

The code generator turns a set of equational theorems into a program inducing
the same equational rewrite system. This means that any sequence of reduction
steps the generated program performs on a term can be simulated in the logic:
equational theorems                          t ─► v ─► · · · ─► u
        │ code generation
        ▼
program with same equational semantics      t ─► v ─► · · · ─► u

This guarantees partial correctness [3]. As a further consequence only program
statements which contribute to a program’s equational semantics (e.g. fun in
ML) are correctness-critical, whereas others are not. For example, the construc-
tors of a datatype in ML need only meet the syntactic characteristics of a
datatype, but not the usual logical properties of a HOL datatype such as injec-
tivity. This gives us some freedom in choosing datatype constructors which we
will employ in §4.2.

4 Transforming Clauses to Equations

4.1 Mode Analysis

In order to execute a predicate P , its arguments are classified as input or output.
For example, all three arguments of append could be input, meaning that the
predicate just checks whether the third list is the concatenation of the first and
second list. Another possibility would be to consider only the first and second
argument as input, while the third one is output. In this case, the predicate
actually computes the concatenation of the two input lists. Yet another way of
using append would be to consider the third argument as input, while the first
two arguments are output. This means that the predicate enumerates all possible
ways of splitting the input list into two parts. This notion of dataflow is made
explicit by means of modes [6].

Modes. For a predicate P with k arguments, we denote a particular dataflow
assignment by a mode which is a set M ⊆ {1, . . . , k} such that M is exactly
the set of all parameter position numbers denoting input parameters. A mode
assignment for a given clause

Qi,1 ui,1 =⇒ · · · =⇒ Qi,ni ui,ni =⇒ P ti

is a list of modes M, Mi,1 , . . ., Mi,ni for the predicates P, Qi,1 , . . ., Qi,ni , where
1 ≤ i ≤ m, M ⊆ {1, . . ., l } and Mi,j ⊆ {1, . . ., ki,j }. Let FV (t) denote the set
of free variables in a term t. Given a vector of arguments t and a mode M , the
projection expression tM denotes the list of all arguments in t (in the order of
their occurrence) whose index is in M .
Mode consistency. Given a clause

Qi,1 ui,1 =⇒ · · · =⇒ Qi,ni ui,ni =⇒ P ti

a corresponding mode assignment M, Mi,1 , . . . Mi,ni is consistent if there exists
a chain of sets v0 ⊆ · · · ⊆ vn of variables generated by

1. v0 = FV (ti M )
2. vj = vj−1 ∪ FV (ui,j )

such that

3. FV (ui,j Mi,j ) ⊆ vj−1
4. FV (ti ) ⊆ vn

Consistency models the possibility of a sequential evaluation of premises in a
given order, where vj represents the known variables after the evaluation of the
j-th premise:

1. initially, all variables in input arguments of P are known
2. after evaluation of the j-th premise, the set of known variables is extended
by all variables in the arguments of Qi,j ,
3. when evaluating the j-th premise, all variables in the arguments of Qi,j have
to be known,
4. finally, all variables in the arguments of P must be contained in the set of
known variables.

Without loss of generality we can examine clauses under mode inference modulo
reordering of premises. For side conditions R, condition 3 has to be replaced by
FV (R) ⊆ vj−1 , i.e. all variables in R must be known when evaluating it. This
definition yields a check whether a given clause is consistent with a particular
mode assignment.
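To make the check concrete, here is a small ML sketch of conditions 1–4 (ours, not the implementation from the paper); the clause is represented, purely for illustration, by the variable sets of its conclusion and of each premise, with premises listed in evaluation order:

fun subset xs ys = List.all (fn x => List.exists (fn y => y = x) ys) xs;
fun union xs ys = xs @ List.filter (fn y => not (List.exists (fn x => x = y) xs)) ys;

(* concl = (vars of input arguments of P, vars of all arguments of P);
   prems = [(vars of input arguments of Q, vars of all arguments of Q), ...] *)
fun consistent (concl_in, concl_all) prems =
  let
    fun go known [] = subset concl_all known             (* condition 4 *)
      | go known ((p_in, p_all) :: ps) =
          subset p_in known                              (* condition 3 *)
          andalso go (union known p_all) ps              (* condition 2 *)
  in go concl_in prems end;                              (* condition 1 *)

(* the second append clause under mode {1, 2} is consistent *)
val ok = consistent (["x", "xs", "ys"], ["x", "xs", "ys", "zs"])
                    [(["xs", "ys"], ["xs", "ys", "zs"])];   (* ok = true *)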

4.2 Enumerating Output Arguments of Predicates


A predicate of type α ⇒ bool is isomorphic to a set over type α; executing
inductive predicates means to enumerate elements of the corresponding set. For
this purpose we use an abstract algebra of primitive operations on such predicate
enumerations. To establish an abstraction, we first define an explicit type to
represent predicates:

datatype α pred = pred (α ⇒ bool )

with a projection operator eval :: α pred ⇒ α ⇒ bool satisfying

eval (pred f ) = f

We provide four further abstract operations on α pred :
– ⊥ :: α pred is the empty enumeration.
– single :: α ⇒ α pred is the singleton enumeration.
– (>>=) :: α pred ⇒ (α ⇒ β pred ) ⇒ β pred applies a function to every
element of an enumeration which itself returns an enumeration and flattens
all resulting enumerations.
– (%) :: α pred ⇒ α pred ⇒ α pred forms the union of two enumerations.
These abstract operations, which form a plus monad, are used to build up the
code equations of predicates (§4.3). Table 1 contains their definitions and relates
them to their counterparts on sets. In order to equip these abstract operations
with an executable model, we introduce an auxiliary datatype:
datatype α seq = Empty | Insert α (α pred ) | Union (α pred list )
Values of type α seq are embedded into type α pred by defining:
Seq :: (unit ⇒ α seq) ⇒ α pred
Seq f =
(case f () of Empty ⇒ ⊥ | Insert x xq ⇒ single x % xq
| Union xqs ⇒ ◦ xqs)
where ◦ :: α pred list ⇒ α pred flattens a list of predicates into one predicate.
Seq will serve as datatype constructor for type α pred ; on top of this, we prove
the following code equations for our α pred algebra:
⊥ = Seq (λu. Empty)
single x = Seq (λu. Insert x ⊥)
Seq g >>= f =
  Seq (λu. case g () of Empty ⇒ Empty
     | Insert x xq ⇒ Union [f x , xq >>= f ]
     | Union xqs ⇒ Union (map (λx . x >>= f ) xqs))
Seq f % Seq g =
Seq (λu. case f () of Empty ⇒ g ()
| Insert x xq ⇒ Insert x (xq % Seq g)
| Union xqs ⇒ Union (xqs @ [Seq g]))

Table 1. Abstract operations for predicate enumerations

eval       eval (pred P ) x ←→ P x                                x ∈ P
⊥          ⊥ = pred (λx . False)                                  {}
single x   single x = pred (λy. y = x )                           {x }
P >>= f    P >>= f = pred (λx . ∃ y. eval P y ∧ eval (f y) x )    ⋃ (f ‘ P )
P % Q      P % Q = pred (λx . eval P x ∨ eval Q x )               P ∪ Q

Here (‘) :: (α ⇒ β) ⇒ (α ⇒ bool ) ⇒ β ⇒ bool is the image operator on sets
satisfying f ‘ A = {y. ∃ x ∈A. y = f x }.
For membership tests we define a further auxiliary constant:

member :: α seq ⇒ α ⇒ bool
member Empty x ←→ False
member (Insert y yq) x ←→ x = y ∨ eval yq x
member (Union xqs) x ←→ list-ex (λxq. eval xq x ) xqs

where list-ex :: (α ⇒ bool ) ⇒ α list ⇒ bool is existential quantification on lists,
and use it to prove the code equation

eval (Seq f ) = member (f ())

From the point of view of the logic, this characterization of the α pred algebra
in terms of unit abstractions might seem odd; their purpose comes to surface
when translating these equations to executable code, e.g. in ML:
datatype ’a pred = Seq of (unit -> ’a seq)
and ’a seq = Empty | Insert of ’a * ’a pred | Union of ’a pred list;
val bot_pred : ’a pred = Seq (fn u => Empty)
fun single x = Seq (fn u => Insert (x, bot_pred));
fun bind (Seq g) f =
Seq (fn u =>
(case g () of Empty => Empty
| Insert (x, xq) => Union [f x, bind xq f]
| Union xqs => Union (map (fn x => bind x f) xqs)));
fun sup_pred (Seq f) (Seq g) =
Seq (fn u =>
(case f () of Empty => g ()
| Insert (x, xq) => Insert (x, sup_pred xq (Seq g))
| Union xqs => Union (append xqs [Seq g])));
fun eval A_ (Seq f) = member A_ (f ())
and member A_ Empty x = false
| member A_ (Insert (y, yq)) x = eq A_ x y orelse eval A_ yq x
| member A_ (Union xqs) x = list_ex (fn xq => eval A_ xq x) xqs;

In the function definitions for eval and member, the expression A_ is the dictio-
nary for the eq class allowing for explicit equality checks using the overloaded
constant eq.
In shape this follows a well-known ML technique for lazy lists: each inspection
of a lazy list by means of an application f () is protected by a constructor Seq.
Thus we enforce a lazy evaluation strategy for predicate enumerations even for
eager languages.
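As a small illustration of this point (our example, not part of the generated code), even a conceptually infinite enumeration can be defined on top of these datatypes and inspected one step at a time:

(* all natural numbers from n upwards, as a lazy enumeration *)
fun nats_from n = Seq (fn () => Insert (n, nats_from (n + 1)));

(* forcing the thunk once exposes only the head; the tail stays unevaluated *)
fun head (Seq f) = (case f () of Insert (x, _) => SOME x | _ => NONE);
val h = head (nats_from 0);   (* h = SOME 0 *)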

4.3 Compilation Scheme for Clauses


The central idea underlying the compilation of a predicate P is to generate a
function P M for each mode M of P that, given a list of input arguments, enu-
merates all tuples of output arguments. The clauses of an inductive predicate
can be viewed as a logic program. However, in contrast to logic programming
languages like Prolog, the execution of the functional program generated from
the clauses uses pattern matching instead of unification. A precondition for the
applicability of pattern matching is that the input arguments in the conclusions
of the clauses, as well as the output arguments in the premises of the clauses are
built up using only datatype constructors and variables. In the following descrip-
tion of the translation scheme, we will treat the pattern matching mechanism as
a black box. However, our implementation uses a pattern translation algorithm
due to Slind [9, §3.3], which closely resembles the techniques used in compilers
for functional programming languages. The following notation will be used in
our description of the translation mechanism:

x = x1 . . . xl                  (x ) = (x1 , . . ., xl )
τ = τ1 . . . τl                  τ ⇒ σ = τ1 ⇒ · · · ⇒ τl ⇒ σ
⊗τ = τ1 × · · · × τl             M − = {1, . . ., l }\M

Let P :: τ ⇒ bool be a predicate and M, Mi,1 , . . . Mi,ni be a consistent mode
assignment for the clauses Ci of P . The function P M corresponding to mode M
of P is defined as follows:

P M :: τ M ⇒ (⊗τ M − ) pred
P M xM ≡ pred (λ(xM − ). P x )

Given the input arguments xM :: τ M , function P M returns a set of tuples
of output arguments (xM − ) for which P x holds. For modes {1, 2} and {3} of
the introductory append example, the corresponding definitions are as follows:

append{1,2} :: α list ⇒ α list ⇒ α list pred
append{1,2} xs ys = pred (λzs. append xs ys zs)
append{3} :: α list ⇒ (α list × α list) pred
append{3} zs = pred (λ(xs, ys). append xs ys zs)

The recursion equation for P M can be obtained from the clauses characterizing
P in a canonical way:

P M xM = C1 xM % · · · % Cm xM

Intuitively, this means that the set of output values generated by P M is the
union of the output values generated by the clauses Ci . In order for pattern
matching to work, all patterns occurring in the program must be linear, i.e.
no variable may occur more than once. This can be achieved by renaming the
free variables occurring in the terms ti , ui,1 , . . ., ui,ni , and by adding suitable
equality checks to the generated program. Let t′i , u′i,1 , . . ., u′i,ni denote these
linear terms obtained by renaming the aforementioned ones, and let θi = {y i → z i },
θi,1 = {y i,1 → z i,1 }, . . ., θi,ni = {y i,ni → z i,ni } be substitutions such that
θi (t′i ) = ti , θi,1 (u′i,1 ) = ui,1 , . . ., θi,ni (u′i,ni ) = ui,ni , and (dom(θi ) ∪ dom(θi,1 ) ∪
· · · ∪ dom(θi,ni )) ∩ FV (Ci ) = ∅. The expressions C′i corresponding to the clauses
can then be defined by
C′i xM ≡
single (xM ) >>= (λa0 . case a0 of
    (t′i M ) ⇒ if y i ≠ z i then ⊥ else
      Qi,1^Mi,1 (u′i,1 Mi,1 ) >>= (λa1 . case a1 of
          (u′i,1 M −i,1 ) ⇒ if y i,1 ≠ z i,1 then ⊥ else
            .
            .
            .
            Qi,ni^Mi,ni (u′i,ni Mi,ni ) >>= (λani . case ani of
                (u′i,ni M −i,ni ) ⇒ if y i,ni ≠ z i,ni then ⊥ else
                  single (ti M − )
              | _ ⇒ ⊥)
        | _ ⇒ ⊥)
  | _ ⇒ ⊥)
Here, M −i,1 = {1, . . ., ki,1 }\Mi,1 , . . ., M −i,ni = {1, . . ., ki,ni }\Mi,ni denote the sets
of indices of output arguments corresponding to the respective modes. As an
example, we give the recursive equations for append on modes {1, 2} and {3}:

append{1,2} xs ys =
  single (xs, ys) >>= (λa. case a of
      ([], zs) ⇒ single zs
    | (z · zs, ws) ⇒ ⊥) %
  single (xs, ys) >>= (λb. case b of
      ([], zs) ⇒ ⊥
    | (z · zs, ws) ⇒ append{1,2} zs ws >>= (λvs. single (z · vs)))

append{3} xs =
  single xs >>= (λys. single ([], ys)) %
  single xs >>= (λa. case a of
      [] ⇒ ⊥
    | z · zs ⇒ append{3} zs >>= (λb. case b of
        (ws, vs) ⇒ single (z · ws, vs)))

Side conditions can be embedded into this translation scheme using the function

if-pred :: bool ⇒ unit pred
if-pred b = (if b then single () else ⊥)

that maps False and True to the empty sequence and the singleton sequence
containing only the unit element, respectively.

4.4 Proof of Recursion Equations


We will now describe how to prove the recursion equation for P M given in the
previous section using the definition of P M , as well as the introduction and
elimination rules for P . We will also need introduction and elimination rules for
the operators on type pred , which we show in Table 2. From the definition of
P M , we can easily derive the introduction rule
P x =⇒ eval (P M xM ) (xM − )
and the elimination rule
eval (P M xM ) (xM − ) =⇒ P x
By extensionality (rule =I ), proving
P M xM = C1 xM % · · · % Cm xM
amounts to showing that
(1) ⋀x . eval (P M xM ) x =⇒ eval (C1 xM % · · · % Cm xM ) x
(2) ⋀x . eval (C1 xM % · · · % Cm xM ) x =⇒ eval (P M xM ) x

where x :: ⊗τ M − . The variable x can be expanded to a tuple of variables:
(1) ⋀xM − . eval (P M xM ) (xM − ) =⇒ eval (C1 xM % · · · % Cm xM ) (xM − )
(2) ⋀xM − . eval (C1 xM % · · · % Cm xM ) (xM − ) =⇒ eval (P M xM ) (xM − )

Proof of (1). From eval (P M xM ) (xM − ), we get P x using the elimination
rule for P M . Applying the elimination rule for P
P x =⇒ E1 x =⇒ · · · =⇒ Em x =⇒ R
Ei x ≡ ⋀bi . x = ti =⇒ Qi,1 ui,1 =⇒ · · · =⇒ Qi,ni ui,ni =⇒ R
yields m proof obligations, each of which corresponds to an introduction rule.
Note that bi consists of the free variables of ui,j and ti . For the ith introduction

Table 2. Introduction and elimination rules for operators on pred

⊥E         eval ⊥ x =⇒ R
single I   eval (single x ) x
single E   eval (single x ) y =⇒ (y = x =⇒ R) =⇒ R
>>=I       eval P x =⇒ eval (Q x ) y =⇒ eval (P >>= Q) y
>>=E       eval (P >>= Q) y =⇒ (⋀x . eval P x =⇒ eval (Q x ) y =⇒ R) =⇒ R
%I1        eval A x =⇒ eval (A % B ) x
%I2        eval B x =⇒ eval (A % B ) x
%E         eval (A % B ) x =⇒ (eval A x =⇒ R) =⇒ (eval B x =⇒ R) =⇒ R
if-pred I  P =⇒ eval (if-pred P ) ()
if-pred E  eval (if-pred b) x =⇒ (b =⇒ x = () =⇒ R) =⇒ R
=I         (⋀x . eval A x =⇒ eval B x ) =⇒ (⋀x . eval B x =⇒ eval A x ) =⇒ A = B
rule, we have to prove eval (C1 xM % · · · % Cm xM ) (xM − ) from the as-
sumptions x = ti and Qi,1 ui,1 , . . ., Qi,ni ui,ni . By applying the rules %I1 and
%I2 in a suitable order, we select the C′i corresponding to the ith introduction
rule, which leaves us with the proof obligation eval (C′i ti M ) (ti M − ). By the
definition of C′i and the rule >>=I , this gives rise to the two proof obligations
(1.i) eval (single (ti M )) (ti M )
(1.ii) eval (case ti M of
           (t′i M ) ⇒ if y i ≠ z i then ⊥ else
             Qi,1^Mi,1 (u′i,1 Mi,1 ) >>= (λa1 . case a1 of . . .)
         | _ ⇒ ⊥) (ti M − )

Goal (1.i) is easily proved using single I . Concerning goal (1.ii), note that (ti M )
matches (t′i M ), so we have to consider the first branch of the case expression.
Due to the definition of t′i , we also know that y i = z i , which means that we have
to consider the else branch of the if clause. This leads to the new goal

eval (Qi,1^Mi,1 (u′i,1 Mi,1 ) >>= (λa1 . case a1 of . . .)) (ti M − )
that, by applying rule >>=I , can be split up into the two goals
(1.iii) eval (Qi,1^Mi,1 (ui,1 Mi,1 )) (ui,1 M −i,1 )
(1.iv) eval (case ui,1 M −i,1 of
           (u′i,1 M −i,1 ) ⇒ if y i,1 ≠ z i,1 then ⊥ else . . .
         | _ ⇒ ⊥) (ti M − )
Goal (1.iii) follows from the assumption Qi,1 ui,1 using the introduction rule for
Qi,1^Mi,1 , while goal (1.iv) can be solved in a similar way as goal (1.ii). Repeating
this proof scheme for Qi,2^Mi,2 , . . ., Qi,ni^Mi,ni finally leads us to a goal of the form

eval (single (ti M − )) (ti M − )

which is trivially solvable using single I .

Proof of (2). The proof of this direction is dual to the previous one: rather
than splitting up the conclusion into simpler formulae, we now perform for-
ward inferences that transform complex premises into simpler ones. Eliminating
eval (C1 xM % · · · % Cm xM ) (xM − ) using rule %E leaves us with m proof
obligations of the form
eval (C′i xM ) (xM − ) =⇒ eval (P M xM ) (xM − )

By unfolding the definition of C′i and applying rule >>=E to the premise of the
above implication, we obtain a0 such that
(2.i) eval (single (xM )) a0
(2.ii) eval (case a0 of
           (t′i M ) ⇒ if y i ≠ z i then ⊥ else
             Qi,1^Mi,1 (u′i,1 Mi,1 ) >>= (λa1 . case a1 of . . .)
         | _ ⇒ ⊥) (xM − )
From (2.i), we get xM = a0 by rule single E . Since a0 must be an element of
a datatype, we can analyze its shape by applying suitable case splitting rules.
Of the generated cases only one case is non-trivial. In the trivial cases, a0 does

not match (ti M ), so the case expression evaluates to ⊥, and the goal can be

solved using ⊥E . In the non-trivial case, we have that a0 = (ti M ). Splitting
up the if expression yields two cases. In the then case, the whole expression
evaluates to ⊥, so the goal is again provable using ⊥E . In the else branch, we
have that y i = z i , and hence a0 = (ti M ) by definition of t′i , which also implies
xM = ti M . Assumption (2.ii) can thus be rewritten to
eval (Qi,1^Mi,1 (ui,1 Mi,1 ) >>= (λa1 . case a1 of . . .)) (xM − )

By another application of >>=E , we obtain a1 such that
(2.iii) eval (Qi,1^Mi,1 (ui,1 Mi,1 )) a1
(2.iv) eval (case a1 of
           (u′i,1 M −i,1 ) ⇒ if y i,1 ≠ z i,1 then ⊥ else . . .
         | _ ⇒ ⊥) (xM − )
The assumption (2.iv) is treated in a similar way as (2.ii). A case analysis over
a1 reveals that the only non-trivial case is the one where a1 = (u′i,1 M −i,1 ). The
only non-trivial branch of the if expression is the else branch, where y i,1 = z i,1 .
Hence, by definition of u′i,1 , it follows that a1 = (ui,1 M −i,1 ), which entitles us to
rewrite (2.iii) to eval (Qi,1^Mi,1 (ui,1 Mi,1 )) (ui,1 M −i,1 ), from which we can deduce
Qi,1 ui,1 by applying the elimination rule for Qi,1^Mi,1 . By repeating this kind of
reasoning for Qi,2^Mi,2 , . . ., Qi,ni^Mi,ni , we also obtain that Qi,2 ui,2 , . . ., Qi,ni ui,ni
holds. Furthermore, after the complete decomposition of (2.iv), we end up with
an assumption of the form

eval (single (ti M − )) (xM − )

from which we can deduce ti M − = xM − by an application of single E . Thus,
using the equations gained from (2.i) and (2.ii), the conclusion of the implication
we set out to prove can be rephrased as

eval (P M (ti M )) (ti M − )
Thanks to the introduction rule for P M , it suffices to prove P ti , which can
easily be done using the introduction rule
Qi,1 ui,1 =⇒ · · · =⇒ Qi,ni ui,ni =⇒ P ti
together with the previous results.

4.5 Animating Equations


We have shown in detail how to derive executable equations from the specifica-
tion of a predicate P for a consistent mode M . The results are always enumer-
ations of type α pred. We discuss briefly how to get access to the enumerated
values of type α proper.
Membership tests. The type constructor pred can be stripped using explicit
membership tests. For example, we could define a suffix predicate using append :

is-suffix zs ys ←→ (∃ xs. append xs ys zs)

Using the definition of append{2,3} this can be reformulated as

is-suffix zs ys ←→ (∃ xs. eval (append{2,3} ys zs) xs)

from which follows

is-suffix zs ys ←→ eval (append{2,3} ys zs >>= (λ-. single ())) ()

using introduction and elimination rules for op >>= and single. This equation
then is directly executable.

Enumeration queries. When developing inductive specifications it is often
desirable to check early whether the specification behaves as expected by enu-
merating one or more solutions which satisfy the specification. In our framework
this cannot be expressed inside the logic: values of type α pred are set-like,
whereas each concrete enumeration imposes a certain order on elements which
is not reflected in the logic. However it can be done directly on the generated
code, e.g. in ML using
fun nexts [] = NONE
| nexts (xq :: xqs) = case next xq
of NONE => nexts xqs
| SOME (x, xq) => SOME (x, Seq (fn () => Union (xq :: xqs)))
and next (Seq f) = case f ()
of Empty => NONE
| Insert (x, xq) => SOME (x, xq)
| Union xqs => nexts xqs;

Wrapped up in a suitable user interface, this allows the user to interactively
enumerate solutions fitting to inductive predicates.
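For instance, assuming the equations for append{3} from §4.3 have been compiled to an ML function append_3 (a hypothetical name for the generated code), one could step through the splittings of a list as follows:

val xq = append_3 [1, 2];
val s1 = next xq;    (* e.g. SOME (([], [1, 2]), xq') for some continuation xq' *)
(* applying next to the continuation produces the remaining splittings,
   ([1], [2]) and ([1, 2], []), one at a time *)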

5 Extensions to the Base Framework


5.1 Higher-Order Modes
A useful extension of the framework presented in §4.3 is to allow inductive pred-
icates that take other predicates as arguments. A standard example for such a
predicate is the reflexive transitive closure taking a predicate of type α ⇒ α ⇒
bool as an argument, and returning a predicate of the same type:

inductive rtc :: (α ⇒ α ⇒ bool ) ⇒ α ⇒ α ⇒ bool
for r :: α ⇒ α ⇒ bool where
rtc r x x
| r x y =⇒ rtc r y z =⇒ rtc r x z
In addition to its two arguments of type α, rtc also has a parameter r that stays
fixed throughout the definition. The general form of a mode for a higher-order
predicate P with k arguments and parameters r1 , . . . , rρ with arities k1 , . . . , kρ
is (M1 , . . . , Mρ , M ), where Mi ⊆ {1, . . . , ki } (for 1 ≤ i ≤ ρ) and M ⊆ {1, . . . , k}.
Intuitively, this mode means that P r1 · · · rρ has mode M , provided that ri
has mode Mi . The possible modes for rtc are ({}, {1}), ({}, {2}), ({}, {1, 2}),
({1}, {1}), ({2}, {2}), ({1}, {1, 2}), and ({2}, {1, 2}). The general definition of
the function corresponding to the mode (M1 , . . . , Mρ , M ) of a predicate P is

P (M1 ,...,Mρ ,M) ::
  (τ 1 M1 ⇒ (⊗τ 1 M1− ) pred ) ⇒ · · · ⇒
  (τ ρ Mρ ⇒ (⊗τ ρ Mρ− ) pred ) ⇒ τ M ⇒ (⊗τ M − ) pred
P (M1 ,...,Mρ ,M) s1 . . . sρ xM ≡ pred (λ(xM − ). P
  (λx1 . eval (s1 x1 M1 ) (x1 M1− )) . . .
  (λxρ . eval (sρ xρ Mρ ) (xρ Mρ− )) x )
Since P expects predicates as parameters, but si are functions returning sets,
these have to be converted back to predicates using eval before passing them to
P . For rtc, the definitions of the functions corresponding to the modes ({1}, {1})
and ({2}, {2}) are
rtc({1},{1}) :: (α ⇒ α pred ) ⇒ α ⇒ α pred
rtc({1},{1}) s x ≡ pred (λy. rtc (λx ′ y ′. eval (s x ′) y ′) x y)
rtc({2},{2}) :: (α ⇒ α pred ) ⇒ α ⇒ α pred
rtc({2},{2}) s y ≡ pred (λx . rtc (λx ′ y ′. eval (s y ′) x ′) x y)
The corresponding recursion equations have the form

rtc({1},{1}) r x =
  single x >>= (λx . single x ) %
  single x >>= (λx . r x >>= (λy. rtc({1},{1}) r y >>= (λz . single z )))

rtc({2},{2}) r y =
  single y >>= (λx . single x ) %
  single y >>= (λz . rtc({2},{2}) r z >>= (λy. r y >>= (λx . single x )))
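Modulo the trivial single steps, the first equation corresponds to the following ML function, a sketch built on the pred operations from §4.2 (bind and sup_pred are the generated names shown there):

fun rtc_1_1 r x =
  sup_pred (single x) (bind (r x) (fn y => rtc_1_1 r y));

Although the function is recursive, a call merely builds a thunk behind a Seq constructor, so elements of the closure are produced on demand; an exhaustive enumeration may of course still diverge for relations with infinitely many reachable elements.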

5.2 Mixing Predicates and Functions


When mixing predicates and functions, mode analysis treats functions as predi-
cates where all arguments are input. This can restrict the number of consistent
mode assignments considerably.
The following mutually inductive predicates model a grammar generating all
words containing equally many as and bs. This example, which is originally due
to Hopcroft and Ullman, can be found in the Isabelle tutorial by Nipkow [8].

inductive
S :: alfa list ⇒ bool and
A :: alfa list ⇒ bool and B :: alfa list ⇒ bool
where
S []
| A w =⇒ S (b · w )
| B w =⇒ S (a · w )
| S w =⇒ A (a · w )
| A v =⇒ A w =⇒ A (b · v @ w )
| S w =⇒ B (b · w )
| B v =⇒ B w =⇒ B (a · v @ w )

By choosing mode {} for the above predicates (i.e. their arguments are all out-
put), we can enumerate all elements of the set S containing equally many as and
bs. However, the above predicates cannot easily be used with mode {1}, i.e. for
checking whether a given word is generated by the grammar. This is because of
the rules with the conclusions A (b · v @ w ) and B (a · v @ w ). Since the append
function (denoted by @) is not a constructor, we cannot do pattern matching
on the argument. However, the problematic rules can be rephrased as
append v w vw =⇒ A v =⇒ A w =⇒ A (b · vw )
append v w vw =⇒ B v =⇒ B w =⇒ B (a · vw )
The problematic expression v @ w in the conclusion has been replaced by a new
variable vw. The fact that vw is the result of appending the two lists v and w is
now expressed using the append predicate from §3. In order to check whether a
given word can be generated using these rules, append first enumerates all ways
of decomposing the given list vw into two sublists v and w, and then recursively
checks whether these words can be generated by the grammar.

6 Conclusion and Future Work


We have presented a definitional translation for inductive predicates to equa-
tions which can be turned into executable code using existing code generation
infrastructure in Isabelle/HOL. This is a fundamental contribution to extend the
scope of code generation from functional to functional-logic programs embedded
into Isabelle/HOL without compromising the trusted implementation of the code
generator itself. We have applied our translation to two larger case studies, the
μJava semantics by Nipkow, von Oheimb and Pusch [7] and the ς-calculus by
Henrio and Kammüller [5], resulting in simple interpreters for these two pro-
gramming languages. Further experiments suggest the following extensions:

– Successful mode inference does not guarantee termination. Like in Prolog,
the order of premises in introduction rules can influence termination. Using
termination analysis built-in in Isabelle/HOL, we can guess which modes
lead to terminating functions. The mode analysis can use this and prefer
terminating modes over possibly non-terminating ones.
– Rephrasing recursive functions to inductive predicates, as we apply it in
§5.2, possibly results in more modes for the mode analysis. But applying the
transformation blindly could lead to unnecessarily complicated equations.
The mode analysis should be extended to infer modes using the transforma-
tion only when required.
– The executable model for enumerations we have presented is sometimes inap-
propriate: it performs depth-first search which can lead to a non-terminating
search in an irrelevant but infinite branch of the search tree. It has to be fig-
ured out how alternative search strategies (e.g. iterative depth-first search)
can provide a solution for this.

We plan to integrate our procedure into the next Isabelle release.

References
1. Berghofer, S., Nipkow, T.: Executing higher order logic. In: Callaghan, P., Luo, Z.,
McKinna, J., Pollack, R. (eds.) TYPES 2000. LNCS, vol. 2277, p. 24. Springer,
Heidelberg (2002)
2. Delahaye, D., Dubois, C., Étienne, J.F.: Extracting purely functional contents from
logical inductive types. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007. LNCS,
vol. 4732, pp. 70–85. Springer, Heidelberg (2007)
3. Haftmann, F., Nipkow, T.: A code generator framework for Isabelle/HOL. Tech.
Rep. 364/07, Department of Computer Science, University of Kaiserslautern (2007)
4. Hanus, M.: A unified computation model for functional and logic programming.
In: Proc. 24th ACM Symposium on Principles of Programming Languages (POPL
1997), pp. 80–93 (1997)
5. Henrio, L., Kammüller, F.: A mechanized model of the theory of objects. In: Bon-
sangue, M.M., Johnsen, E.B. (eds.) FMOODS 2007. LNCS, vol. 4468, pp. 190–205.
Springer, Heidelberg (2007)
6. Mellish, C.S.: The automatic generation of mode declarations for Prolog programs.
Tech. Rep. 163, Department of Artificial Intelligence, University of Edinburgh (1981)
7. Nipkow, T., von Oheimb, D., Pusch, C.: μJava: Embedding a programming lan-
guage in a theorem prover. In: Bauer, F., Steinbrüggen, R. (eds.) Foundations of
Secure Computation. Proc. Int. Summer School Marktoberdorf 1999, pp. 117–144.
IOS Press, Amsterdam (2000)
8. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002)
9. Slind, K.: Reasoning about terminating functional programs. Ph.D. thesis, Institut
für Informatik, TU München (1999)
10. Somogyi, Z., Henderson, F.J., Conway, T.C.: Mercury: an efficient purely declar-
ative logic programming language. In: Proceedings of the Australian Computer
Science Conference, pp. 499–512 (1995)
11. Wasserrab, D., Nipkow, T., Snelting, G., Tip, F.: An operational semantics and
type safety proof for multiple inheritance in C++. In: OOPSLA 2006: Proceedings
of the 21st annual ACM SIGPLAN conference on Object-oriented programming
systems, languages, and applications, pp. 345–362. ACM Press, New York (2006)
Formalizing the Logic-Automaton Connection

Stefan Berghofer and Markus Reiter

Technische Universität München
Institut für Informatik, Boltzmannstraße 3, 85748 Garching, Germany

Abstract. This paper presents a formalization of a library for automata
on bit strings in the theorem prover Isabelle/HOL. It forms the basis of
a reflection-based decision procedure for Presburger arithmetic, which is
efficiently executable thanks to Isabelle’s code generator. With this work,
we therefore provide a mechanized proof of the well-known connection
between logic and automata theory.

1 Introduction

Although higher-order logic (HOL) is undecidable in general, there are many
decidable logics such as Presburger arithmetic or the Weak Second-order theory
of One Successor (WS1S) that can be embedded into HOL. Since HOL can be
viewed as a logic containing a functional programming language, an interesting
approach for implementing a decision procedure for such a decidable logic in a
theorem prover based on HOL is to write and verify the decision procedure as
a recursive function in HOL itself. This approach, which is called reflection [7],
has been used in proof assistants based on type theory for quite a long time. For
example, Boutin [4] has used reflection to implement a decision procedure for
abelian rings in Coq. Recently, reflection has also gained considerable attention
in the Isabelle/HOL community. Chaieb and Nipkow have used this technique
to verify various quantifier elimination procedures for dense linear orders, real
and integer linear arithmetic, as well as Presburger arithmetic [5,12]. While the
decision procedures by Chaieb and Nipkow are based on algebraic methods like
Cooper’s algorithm, there are also semantic methods, as implemented e.g. in
the Mona tool [8] for deciding WS1S formulae. In order to check the validity of
a formula, Mona translates it to an automaton on bitstrings and then checks
whether it has accepting states. Basin and Friedrich [1] have connected Mona to
Isabelle/HOL using an oracle-based approach, i.e. they simply trust the answer
of the tool. As a motivation for their design decision, they write:
Hooking an ‘oracle’ to a theorem prover is risky business. The oracle
could be buggy [. . .]. The only way to avoid a buggy oracle is to recon-
struct a proof in the theorem prover based on output from the oracle,
or perhaps verify the oracle itself. For a semantics based decision pro-
cedure, proof reconstruction is not a realistic option: one would have to
formalize the entire automata-theoretic machinery within HOL [. . .].
⋆ Supported by BMBF in the VerisoftXT project under grant 01 IS 07008 F.
In this paper, we show that verifying decision procedures based on automata in
HOL is not as unrealistic as it may seem. We develop a library for automata
on bitstrings, including operations like forming the product of two automata,
projection, and determinization of nondeterministic automata, which we then
use to build a decision procedure for Presburger arithmetic. The procedure can
easily be changed to cover WS1S, by just exchanging the automata for atomic
formulae. The specification of the decision procedure is completely executable,
and efficient ML code can be generated from it using Isabelle’s code generator [2].
To the best of our knowledge, this is the first formalization of an automata-based
decision procedure for Presburger arithmetic in a theorem prover.
The paper is structured as follows. In §2, we introduce basic concepts such
as Presburger arithmetic, automata theory, bit vectors, and BDDs. The library
for automata is described in §3. The actual decision procedure together with its
correctness proof is presented in §4, and §5 draws some conclusions. Due to lack
of space, we will not discuss any of the proofs in detail. However, the interested
reader can find the complete formalization on the web¹.

¹ http://www.in.tum.de/~berghofe/papers/automata

2 Basic Definitions
2.1 Presburger Arithmetic

Formulae of Presburger arithmetic are represented by the following datatype:

datatype pf = Eq (int list ) int | Le (int list ) int | And pf pf | Or pf pf
| Imp pf pf | Forall pf | Exist pf | Neg pf

The atomic formulae are Diophantine (in)equations Eq ks l and Le ks l, where ks
are the (integer) coefficients and l is the right-hand side. Variables are encoded
using de Bruijn indices, meaning that the ith coefficient in ks belongs to the
variable with index i. Thus, the well-known stamp problem

∀ x ≥8. ∃ y z . 3 ∗ y + 5 ∗ z = x

can be encoded by

Forall (Imp (Le [−1] −8) (Exist (Exist (Eq [5, 3, −1] 0))))

Like Boudet and Comon [3], we only consider variables ranging over the natural
numbers. The left-hand side of a Diophantine (in)equation can be evaluated
using the function

eval-dioph :: int list ⇒ nat list ⇒ int
eval-dioph (k · ks) (x · xs) = k ∗ int x + eval-dioph ks xs
eval-dioph [] xs = 0
eval-dioph ks [] = 0
where xs is a valuation, x · xs denotes the ‘Cons’ operator, and int coerces a
natural number to an integer. A Presburger formula can be evaluated by
eval-pf :: pf ⇒ nat list ⇒ bool
eval-pf (Eq ks l ) xs = (eval-dioph ks xs = l )
eval-pf (Le ks l ) xs = (eval-dioph ks xs ≤ l )
eval-pf (Neg p) xs = (¬ eval-pf p xs)
eval-pf (And p q) xs = (eval-pf p xs ∧ eval-pf q xs)
eval-pf (Or p q) xs = (eval-pf p xs ∨ eval-pf q xs)
eval-pf (Imp p q) xs = (eval-pf p xs −→ eval-pf q xs)
eval-pf (Forall p) xs = (∀ x . eval-pf p (x · xs))
eval-pf (Exist p) xs = (∃ x . eval-pf p (x · xs))
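As a quick sanity check of the de Bruijn encoding (an ML sketch, not part of the formalization; ~ is SML's unary minus), recall that the innermost existential variable has index 0, so a valuation for Eq [5, 3, −1] 0 is read as [z , y, x ]:

fun eval_dioph (k :: ks) (x :: xs) = k * x + eval_dioph ks xs
  | eval_dioph _ _ = 0;

(* the stamp equation 3*y + 5*z = x for z = 1, y = 1, x = 8 *)
val lhs = eval_dioph [5, 3, ~1] [1, 1, 8];   (* 5 + 3 - 8 = 0, so the equation holds *)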

2.2 Abstract Automata

The abstract framework for automata used in this paper is quite similar to the
one used by Nipkow [11]. The purpose of this framework is to factor out all
properties that deterministic and nondeterministic automata have in common.
Automata are characterized by a transition function tr of type σ ⇒ α ⇒ σ, where
σ and α denote the types of states and input symbols, respectively. Transition
functions can be extended to words, i.e. lists of symbols in a canonical way:
steps :: (σ ⇒ α ⇒ σ) ⇒ σ ⇒ α list ⇒ σ
steps tr q [] = q
steps tr q (a · as) = steps tr (tr q a) as
The reachability of a state q from a state p via a word as is defined by
reach :: (σ ⇒ α ⇒ σ) ⇒ σ ⇒ α list ⇒ σ ⇒ bool
reach tr p as q ≡ q = steps tr p as
Another characteristic property of an automaton is its set of accepting states.
Given a predicate P denoting the accepting states, an automaton is said to
accept a word as iff from a starting state s we reach an accepting state via as:
accepts :: (σ ⇒ α ⇒ σ) ⇒ (σ ⇒ bool ) ⇒ σ ⇒ α list ⇒ bool
accepts tr P s as ≡ P (steps tr s as)
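Rendered in ML, the generic functions are one-liners (a sketch; the word parameter is called ws here because as is an SML keyword):

fun steps tr q [] = q
  | steps tr q (a :: ws) = steps tr (tr q a) ws;

fun accepts tr p s ws = p (steps tr s ws);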

2.3 Bit Vectors and BDDs


The automata used in the formalization of our decision procedure for Presburger
arithmetic are of a specific kind: the input symbols of an automaton correspond-
ing to a formula with n free variables x0 , . . . , xn−1 are bit lists of length n.
 x0     ⎡⎡ b0,0   ⎤ ⎡ b0,1   ⎤ ⎡ b0,2   ⎤       ⎡ b0,m−1   ⎤⎤
  ..    ⎢⎢  ..    ⎥ ⎢  ..    ⎥ ⎢  ..    ⎥ · · · ⎢   ..     ⎥⎥
 xn−1   ⎣⎣ bn−1,0 ⎦ ⎣ bn−1,1 ⎦ ⎣ bn−1,2 ⎦       ⎣ bn−1,m−1 ⎦⎦
The rows in the above word are interpreted as natural numbers, where the left-
most column, i.e. the first symbol in the list, corresponds to the least significant
bit. Therefore, the value of variable xi is ∑_{j=0}^{m−1} bi,j 2^j . The list of values of n
variables denoted by a word can be computed recursively as follows:

nats-of-boolss :: nat ⇒ bool list list ⇒ nat list
nats-of-boolss n [] = replicate n 0
nats-of-boolss n (bs · bss) =
map (λ(b, x ). nat-of-bool b + 2 ∗ x ) (zip bs (nats-of-boolss n bss))

where zip [b 0 , b 1 , . . .] [x 0 , x 1 , . . .] yields [(b 0 , x 0 ), (b 1 , x 1 ), . . .], replicate n x
denotes the list [x , . . ., x ] of length n, and nat-of-bool maps False and True to
0 and 1, respectively. We can insert a bit vector in the ith row of a word by

insertll :: nat ⇒ α list ⇒ α list list ⇒ α list list
insertll i [] [] = []
insertll i (a · as) (bs · bss) = insertl i a bs · insertll i as bss

where insertl i a bs inserts a into list bs at position i. The interaction between
nats-of-boolss and insertll can be characterized by the following theorem:

If ∀ bs∈bss. is-alph n bs and |bs| = |bss| and i ≤ n then
nats-of-boolss (Suc n) (insertll i bs bss) =
insertl i (nat-of-bools bs) (nats-of-boolss n bss).

Here, is-alph n xs means that xs is a valid symbol, i.e. the length of xs is equal
to the number of variables n, |bs| and |bss| denote the lengths of bs and bss,
respectively, and bs ∈ bss means that bs is a member of the list bss. Moreover,
nat-of-bools is similar to nats-of-boolss, with the difference that it works on a
single row vector instead of a list of column vectors:

nat-of-bools :: bool list ⇒ nat
nat-of-bools [] = 0
nat-of-bools (b · bs) = nat-of-bool b + 2 ∗ nat-of-bools bs
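For example, with the least significant bit first, the bit list [True, False, True] denotes 5; an ML transcription (a sketch) makes this easy to check:

fun nat_of_bool b = if b then 1 else 0;
fun nat_of_bools [] = 0
  | nat_of_bools (b :: bs) = nat_of_bool b + 2 * nat_of_bools bs;

val five = nat_of_bools [true, false, true];   (* 1 + 2*(0 + 2*1) = 5 *)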

Since the input symbols of our automata are bit vectors, it would be rather inef-
ficient to just represent the transition function for a given state as an association
list relating bit vectors to successor states. For such a list, the lookup operation
would be exponential in the number of variables. When implementing the Mona
tool, Klarlund [8] already observed that representing the transition function as
a BDD is more efficient. BDDs are represented by the datatype2

datatype α bdd = Leaf α | Branch (α bdd ) (α bdd )

The functions bdd-map :: (α ⇒ β) ⇒ α bdd ⇒ β bdd and bdd-all :: (α ⇒ bool )
⇒ α bdd ⇒ bool can be defined in a canonical way. The lookup operation, whose
runtime is linear in the length of the input vector, has the definition
bdd-lookup :: α bdd ⇒ bool list ⇒ α
bdd-lookup (Leaf x ) bs = x
bdd-lookup (Branch l r ) (b · bs) = bdd-lookup (if b then r else l ) bs

This operation only returns meaningful results if the height of the BDD is less
or equal to the length of the bit vector. We write bddh n bdd to mean that the
height of bdd is less or equal to n. Two BDDs can be combined using a binary
operator f as follows:

bdd-binop :: (α ⇒ β ⇒ γ) ⇒ α bdd ⇒ β bdd ⇒ γ bdd
bdd-binop f (Leaf x ) (Leaf y) = Leaf (f x y)
bdd-binop f (Branch l r ) (Leaf y) =
Branch (bdd-binop f l (Leaf y)) (bdd-binop f r (Leaf y))
bdd-binop f (Leaf x ) (Branch l r ) =
Branch (bdd-binop f (Leaf x ) l ) (bdd-binop f (Leaf x ) r )
bdd-binop f (Branch l 1 r 1 ) (Branch l 2 r 2 ) =
Branch (bdd-binop f l 1 l 2 ) (bdd-binop f r 1 r 2 )

If the two BDDs have different heights, the shorter one is expanded on the fly.
The following theorem states that bdd-binop yields a BDD corresponding to the
pointwise application of f to the functions represented by the argument BDDs:

If bddh |bs| l and bddh |bs| r then
bdd-lookup (bdd-binop f l r ) bs = f (bdd-lookup l bs) (bdd-lookup r bs).
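An ML transcription of the BDD type and both operations (a sketch; the Match exception flags input vectors shorter than the BDD height, a situation the well-formedness conditions below rule out):

datatype 'a bdd = Leaf of 'a | Branch of 'a bdd * 'a bdd;

fun bdd_lookup (Leaf x) _ = x
  | bdd_lookup (Branch (l, r)) (b :: bs) = bdd_lookup (if b then r else l) bs
  | bdd_lookup (Branch _) [] = raise Match;

fun bdd_binop f (Leaf x) (Leaf y) = Leaf (f x y)
  | bdd_binop f (Branch (l, r)) (Leaf y) =
      Branch (bdd_binop f l (Leaf y), bdd_binop f r (Leaf y))
  | bdd_binop f (Leaf x) (Branch (l, r)) =
      Branch (bdd_binop f (Leaf x) l, bdd_binop f (Leaf x) r)
  | bdd_binop f (Branch (l1, r1)) (Branch (l2, r2)) =
      Branch (bdd_binop f l1 l2, bdd_binop f r1 r2);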

2.4 Deterministic Automata

We represent deterministic finite automata (DFAs) by pairs of type nat bdd
list × bool list, where the first and second component denotes the transition
function and the set of accepting states, respectively. The states of a DFA are
simply natural numbers. Note that we do not mention the start state in the
representation of the DFA, since it will always be 0. Not all pairs of the above
type are well-formed DFAs. The two lists must have the same length, and all
leaves of the BDDs in the list representing the transition function must be valid
states, i.e. be smaller than the length of the two lists. Moreover, the heights of
all BDDs must be less or equal to the number of variables n, and the set of states
must be nonempty. These conditions are captured by the following definition:


dfa-is-node A ≡ λq. q < |fst A|
wf-dfa :: dfa ⇒ nat ⇒ bool
wf-dfa A n ≡
(∀ bdd ∈fst A. bddh n bdd ) ∧
(∀ bdd ∈fst A. bdd-all (dfa-is-node A) bdd ) ∧ |snd A| = |fst A| ∧ 0 < |fst A|

Moreover, the transition function and acceptance condition can be defined by
dfa-trans :: dfa ⇒ nat ⇒ bool list ⇒ nat
dfa-trans A q bs ≡ bdd-lookup (fst A)[q] bs
dfa-accepting :: dfa ⇒ nat ⇒ bool
dfa-accepting A q ≡ (snd A)[q]

where xs[i] denotes the ith element of list xs. Finally, using the generic functions
from §2.2, we can produce variants of these functions tailored to DFAs:

dfa-steps A ≡ steps (dfa-trans A)
dfa-accepts A ≡ accepts (dfa-trans A) (dfa-accepting A) 0
dfa-reach A ≡ reach (dfa-trans A)
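As a toy example (ours, not from the formalization), consider a DFA over one variable, i.e. bit vectors of length 1, that accepts words whose variable has an odd number of 1-bits; it can be run directly with the ML sketches of steps/accepts (§2.2) and bdd_lookup (§2.3) given above:

val trans = [Branch (Leaf 0, Leaf 1),    (* state 0 (even): bit 0 stays, bit 1 flips *)
             Branch (Leaf 1, Leaf 0)];   (* state 1 (odd):  bit 0 stays, bit 1 flips *)
val accepting = [false, true];

fun dfa_trans q bs = bdd_lookup (List.nth (trans, q)) bs;
val result = accepts dfa_trans (fn q => List.nth (accepting, q)) 0
                     [[true], [false], [true]];   (* two 1-bits, so result = false *)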

2.5 Nondeterministic Automata

Nondeterministic finite automata (NFAs) are represented by pairs of type bool
list bdd list × bool list. While the second component representing the accepting
states is the same as for DFAs, the transition table is now a list of BDDs mapping
a state and an input symbol to a finite set of successor states, which we represent
as a bit vector. In order for the transition table to be well-formed, the length
of the bit vectors representing the sets must be equal to the number of states
of the automaton, which coincides with the length of the transition table. This
well-formedness condition for the bit vectors is expressed by the predicate

nfa-is-node :: nfa ⇒ bool list ⇒ bool
nfa-is-node A ≡ λqs. |qs| = |fst A|

The definition of wf-nfa can be obtained from the one of wf-dfa by just replacing
dfa-is-node by nfa-is-node. Due to its “asymmetric” type, a transition function
of type nat ⇒ bool list ⇒ bool list would be incompatible with the abstract
functions from §2.2. We therefore lift the function to work on finite sets of natural
numbers rather than just single natural numbers. This is accomplished by

subsetbdd :: bool list bdd list ⇒ bool list ⇒ bool list bdd ⇒ bool list bdd
subsetbdd [] [] bdd = bdd
subsetbdd (bdd ′ · bdds) (b · bs) bdd =
  (if b then subsetbdd bdds bs (bdd-binop bv-or bdd bdd ′)
   else subsetbdd bdds bs bdd )

where bv-or is the bit-wise or operation on bit vectors, i.e. the union of two
finite sets. Using this operation, subsetbdd combines all BDDs in the first list,
for which the corresponding bit in the second list is True. The third argument
of subsetbdd serves as an accumulator and is initialized with a BDD consisting
of only one Leaf containing the empty set, which is the neutral element of bv-or :

nfa-emptybdd :: nat ⇒ bool list bdd
nfa-emptybdd n ≡ Leaf (replicate n False)
Using subsetbdd, the transition function for NFAs can now be defined as follows:

nfa-trans :: nfa ⇒ bool list ⇒ bool list ⇒ bool list
nfa-trans A qs bs ≡ bdd-lookup (subsetbdd (fst A) qs (nfa-emptybdd |qs|)) bs

A set of states is accepting iff at least one of the states in the set is accepting:

nfa-accepting′ :: bool list ⇒ bool list ⇒ bool
nfa-accepting′ [] bs = False
nfa-accepting′ (a · as) [] = False
nfa-accepting′ (a · as) (b · bs) = (a ∧ b ∨ nfa-accepting′ as bs)

nfa-accepting :: nfa ⇒ bool list ⇒ bool
nfa-accepting A ≡ nfa-accepting′ (snd A)

As in the case of DFAs, we can now instantiate the generic functions from §2.2.
In order to check whether we can reach an accepting state from the start state,
we apply accepts to the finite set containing only the state 0.

nfa-startnode :: nfa ⇒ bool list
nfa-startnode A ≡ replicate |fst A| False[0 := True]
nfa-steps A ≡ steps (nfa-trans A)
nfa-accepts A ≡ accepts (nfa-trans A) (nfa-accepting A) (nfa-startnode A)
nfa-reach A ≡ reach (nfa-trans A)

where xs[i := y] denotes the replacement of the ith element of xs by y.

2.6 Depth First Search

The efficiency of the automata constructions presented in this paper crucially
depends on the fact that the generated automata only contain reachable states.
When implemented in a naive way, the construction of a product DFA from
two DFAs having m and n states will lead to a DFA with m · n states, while the
construction of a DFA from an NFA with n states will result in a DFA having 2n
states, many of which are unreachable. By using a depth-first search algorithm
(DFS) for the generation of the automata, we can make sure that all of their
states are reachable. In order to simplify the implementation of the automata
constructions, as well as their correctness proofs, the DFS algorithm is factored
out into a generic function, whose properties can be proved once and for all. The
DFS algorithm is based on a representation of graphs, as well as a data struc-
ture for storing the nodes that have already been visited. Our version of DFS,
which generalizes earlier work by Nishihara and Minamide [10,13], is designed
as an abstract module using the locale mechanism of Isabelle, thus allowing the
operations on the graph and the node store to be implemented in different ways
depending on the application at hand. The module is parameterized by a type
α of graph nodes, a type β representing the node store, and the functions
succs :: α ⇒ α list          ins :: α ⇒ β ⇒ β            empt :: β
is-node :: α ⇒ bool          memb :: α ⇒ β ⇒ bool        invariant :: β ⇒ bool

where succs returns the list of successors of a node, and the predicate is-node
describes the (finite) set of nodes. Moreover, ins x S, memb x S and empt cor-
respond to {x } ∪ S, x ∈ S and ∅ on sets. The node store must also satisfy an
additional invariant. Using Isabelle’s infrastructure for the definition of functions
by well-founded recursion [9], the DFS function can be defined as follows³:

dfs :: β ⇒ α list ⇒ β
dfs S [] = S
dfs S (x · xs) = (if memb x S then dfs S xs else dfs (ins x S ) (succs x @ xs))

Note that this function is partial, since it may loop when instantiated with ins,
memb and empt operators that do not behave like their counterparts on sets, or
when applied to a list of start values that are not valid nodes. However, since
dfs is tail recursive, Isabelle's function definition package can derive the above
equations without preconditions, which is crucial for the executability of dfs. The
central property of dfs is that it computes the transitive closure of the successor
relation:

If is-node y and is-node x then
  memb y (dfs empt [x]) = ((x, y) ∈ (succsr succs)∗).

where succsr turns a successor function into a relation:

succsr :: (γ ⇒ δ list) ⇒ (γ × δ) set
succsr succs ≡ {(x, y) | y ∈ succs x}
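
Transcribed to Haskell, the generic function looks as follows (a sketch of ours;
the locale parameters succs, ins, and memb become explicit arguments, and
termination presupposes the set-like behaviour described above):

  genDfs :: (a -> [a])        -- succs
         -> (a -> s -> s)     -- ins
         -> (a -> s -> Bool)  -- memb
         -> s                 -- node store, initially empt
         -> [a]               -- work list
         -> s
  genDfs succs ins memb store []       = store
  genDfs succs ins memb store (x : xs)
    | memb x store = genDfs succs ins memb store xs
    | otherwise    = genDfs succs ins memb (ins x store) (succs x ++ xs)

  -- Example instantiation: a plain list as node store.
  reachable :: Eq a => (a -> [a]) -> a -> [a]
  reachable succs x = reverse (genDfs succs (:) elem [] [x])

For a finite graph, reachable succs x enumerates exactly the nodes y with
(x, y) ∈ (succsr succs)∗, in the order in which the search first visits them.
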
3 Automata Construction

In this section, we will describe all automata constructions that are used to
recursively build automata from formulae in Presburger arithmetic. The simplest
one is the complement, which we describe in §3.1. It will be used to model
negation. The product automaton construction described in §3.2 corresponds to
binary operators such as ∨, ∧, and −→, whereas the more intricate projection
construction shown in §3.3 is used to deal with existential quantifiers. Finally,
§3.4 illustrates the construction of automata corresponding to atomic formulae.

3.1 Complement

The complement construction is straightforward. We only exchange the accepting
and the non-accepting states, and leave the transition function unchanged:

negate-dfa :: dfa ⇒ dfa
negate-dfa ≡ λ(t, a). (t, map Not a)
³ We use dfs S xs as an abbreviation for gen-dfs succs ins memb S xs.
prod-succs :: dfa ⇒ dfa ⇒ nat × nat ⇒ (nat × nat) list
prod-succs A B ≡ λ(i, j). add-leaves (bdd-binop Pair (fst A)[i] (fst B)[j]) []

prod-ins :: nat × nat
          ⇒ nat option list list × (nat × nat) list
          ⇒ nat option list list × (nat × nat) list
prod-ins ≡
  λ(i, j) (tab, ps). (tab[i := tab[i][j := Some |ps|]], ps @ [(i, j)])

prod-memb :: nat × nat ⇒ nat option list list × (nat × nat) list ⇒ bool
prod-memb ≡ λ(i, j) (tab, ps). tab[i][j] ≠ None

prod-empt :: dfa ⇒ dfa ⇒ nat option list list × (nat × nat) list
prod-empt A B ≡ (replicate |fst A| (replicate |fst B| None), [])

prod-dfs :: dfa ⇒ dfa ⇒ nat × nat ⇒ nat option list list × (nat × nat) list
prod-dfs A B x ≡
  gen-dfs (prod-succs A B) prod-ins prod-memb (prod-empt A B) [x]

binop-dfa :: (bool ⇒ bool ⇒ bool) ⇒ dfa ⇒ dfa ⇒ dfa
binop-dfa f A B ≡
  let (tab, ps) = prod-dfs A B (0, 0)
  in (map (λ(i, j). bdd-binop (λk l. the tab[k][l]) (fst A)[i] (fst B)[j]) ps,
      map (λ(i, j). f (snd A)[i] (snd B)[j]) ps)

Fig. 1. Definition of the product automaton

A well-formed DFA A will accept a word bss iff it is not accepted by the DFA
produced by negate-dfa:

If wf-dfa A n and ∀bs∈bss. is-alph n bs then
  dfa-accepts (negate-dfa A) bss = (¬ dfa-accepts A bss).
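
Read as a functional program, the construction is a one-liner. The following
Haskell sketch (ours) fixes a concrete rendering of the dfa type (a transition
table with one BDD per state, whose leaves are successor state numbers, paired
with the acceptance vector) and transcribes negate-dfa:

  data Bdd a = Leaf a | Branch (Bdd a) (Bdd a)

  type Dfa = ([Bdd Int], [Bool])

  negateDfa :: Dfa -> Dfa
  negateDfa (t, a) = (t, map not a)   -- flip accepting and non-accepting states

As the correctness statement above makes explicit, nothing is claimed for
ill-formed automata or for words that are not of width n.
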

3.2 Product Automaton

Given a binary logical operator f :: bool ⇒ bool ⇒ bool, the product automaton
construction is used to build a DFA corresponding to the formula f P Q from
DFAs A and B corresponding to the formulae P and Q, respectively. As sug-
gested by its name, the state space of the product automaton corresponds to the
cartesian product of the state spaces of the DFAs A and B. However, as already
mentioned in §2.6, not all of the elements of the cartesian product constitute
reachable states. We therefore need an algorithm for computing the reachable
states of the resulting DFA. Moreover, since the automata framework described
in §2.4–2.5 relies on the states to be encoded as natural numbers, we also need
to produce a mapping from nat × nat to nat. All of this can be achieved just by
instantiating the abstract DFS framework with suitable functions, as shown in
Fig. 1. In this construction, the store containing the visited states is a pair nat
option list list × (nat × nat ) list, where the first component is a matrix denoting
a partial map from nat × nat to nat. The second component of the store is a list
containing all visited states (i, j ). It can be viewed as a map from nat to nat ×
nat, which is the inverse of the aforementioned map. In order to compute the list
of successor states of a state (i, j ), prod-succs combines the BDDs representing
the transition tables of state i of A, and of state j of B using the Pair oper-
ator, and then collects all leaves of the resulting BDD. The operation prod-ins
for inserting a state into the store updates the entry at position (i, j ) of the
matrix tab with the number of visited states, and appends (i, j ) to the list ps of
visited states. By definition of DFS, this operation is guaranteed to be applied
only if the state (i, j ) has not been visited yet, i.e. the corresponding entry in
the matrix is None and (i, j ) is not contained in the list ps. We now produce
a specific version of DFS called prod-dfs by instantiating the generic function
from §2.6, and using the list containing just one pair of states as a start value.
By induction on gen-dfs, we can prove that the matrix and the list computed by
prod-dfs encode a bijection between the reachable states (i, j) of the product
automaton and natural numbers k corresponding to the states of the resulting
DFA, where k is smaller than the number of reachable states:

If prod-is-node A B x then
  ((fst (prod-dfs A B x))[i][j] = Some k ∧ dfa-is-node A i ∧ dfa-is-node B j) =
  (k < |snd (prod-dfs A B x)| ∧ (snd (prod-dfs A B x))[k] = (i, j)).

The start state x must satisfy a well-formedness condition prod-is-node, meaning


that its two components must be valid states of A and B. Using this result, as
well as the fact that prod-dfs computes the transitive closure, we can then show
by induction on bss that a state m is reachable in the resulting automaton via a
sequence of bit vectors bss iff the corresponding states s 1 and s 2 are reachable
via bss in the automata A and B, respectively:

If ∀ bs∈bss. is-alph n bs then


(∃ m. dfa-reach (binop-dfa f A B ) 0 bss m ∧
(fst (prod-dfs A B (0, 0)))[s 1 ][s 2 ] = Some m ∧
dfa-is-node A s 1 ∧ dfa-is-node B s 2 ) =
(dfa-reach A 0 bss s 1 ∧ dfa-reach B 0 bss s 2 ).

Finally, bdd-binop produces the resulting product automaton by combining the
transition tables of A and B using the mapping from nat × nat to nat computed
by prod-dfs, and by applying f to the acceptance conditions of A and B. Using the
previous theorem, we can prove the correctness statement for this construction:

If wf-dfa A n and wf-dfa B n and ∀bs∈bss. is-alph n bs then
  dfa-accepts (binop-dfa f A B) bss = f (dfa-accepts A bss) (dfa-accepts B bss).
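
The store operations of Fig. 1 are ordinary functional-programming fare. Here
is a Haskell sketch of ours (setAt is a hypothetical helper for the list update
xs[i := y]; the successor function is omitted since it needs the BDD machinery):

  setAt :: Int -> a -> [a] -> [a]          -- xs[i := y]
  setAt i y xs = take i xs ++ [y] ++ drop (i + 1) xs

  -- The node store: a matrix mapping (i, j) to Maybe state number, plus
  -- the list of visited pairs, i.e. the inverse map.
  type Store = ([[Maybe Int]], [(Int, Int)])

  prodEmpt :: Int -> Int -> Store          -- |fst A| and |fst B| passed directly
  prodEmpt m n = (replicate m (replicate n Nothing), [])

  prodMemb :: (Int, Int) -> Store -> Bool
  prodMemb (i, j) (tab, _) = (tab !! i) !! j /= Nothing

  prodIns :: (Int, Int) -> Store -> Store
  prodIns (i, j) (tab, ps) =
    (setAt i (setAt j (Just (length ps)) (tab !! i)) tab, ps ++ [(i, j)])

Plugging these three operations and the product successor function into the
generic DFS yields prod-dfs.
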

3.3 Projection

Using the terminology from §2.3, the automaton for ∃ x . P can be obtained from
the one for P by projecting away the row corresponding to the variable x. Since
this operation yields an NFA, it is advantageous to first translate the DFA for P
into an NFA, which can easily be done by replacing all the leaves in the transition
table by singleton sets, and leaving the set of accepting states unchanged. The
correctness of this operation called nfa-of-dfa is expressed by

If wf-dfa A n and ∀bs∈bss. is-alph n bs then
  nfa-accepts (nfa-of-dfa A) bss = dfa-accepts A bss.

Given a BDD representing the transition table of a particular state of an NFA,
we can project away the ith variable by combining the two children BDDs of
the branches at depth i using the bv-or operation:

quantify-bdd :: nat ⇒ bool list bdd ⇒ bool list bdd
quantify-bdd i (Leaf q) = Leaf q
quantify-bdd 0 (Branch l r) = bdd-binop bv-or l r
quantify-bdd (Suc i) (Branch l r) =
  Branch (quantify-bdd i l) (quantify-bdd i r)

To produce the NFA corresponding to the quantified formula, we just map this
operation over the transition table:

quantify-nfa :: nat ⇒ nfa ⇒ nfa
quantify-nfa i ≡ λ(bdds, as). (map (quantify-bdd i) bdds, as)
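
Projection is directly executable as well; the Haskell sketch below (ours) mirrors
quantify-bdd and quantify-nfa, repeating the leaf-wise bddBinop and bvOr so
that the fragment is self-contained:

  data Bdd a = Leaf a | Branch (Bdd a) (Bdd a)

  bddBinop :: (a -> b -> c) -> Bdd a -> Bdd b -> Bdd c
  bddBinop f (Leaf x)       (Leaf y)       = Leaf (f x y)
  bddBinop f (Leaf x)       (Branch l r)   = Branch (bddBinop f (Leaf x) l) (bddBinop f (Leaf x) r)
  bddBinop f (Branch l r)   (Leaf y)       = Branch (bddBinop f l (Leaf y)) (bddBinop f r (Leaf y))
  bddBinop f (Branch l1 r1) (Branch l2 r2) = Branch (bddBinop f l1 l2) (bddBinop f r1 r2)

  bvOr :: [Bool] -> [Bool] -> [Bool]
  bvOr = zipWith (||)

  -- At depth i the two subtrees are merged with set union; a Leaf means
  -- the BDD does not test variable i, so it is kept unchanged.
  quantifyBdd :: Int -> Bdd [Bool] -> Bdd [Bool]
  quantifyBdd _ (Leaf q)     = Leaf q
  quantifyBdd 0 (Branch l r) = bddBinop bvOr l r
  quantifyBdd i (Branch l r) = Branch (quantifyBdd (i - 1) l) (quantifyBdd (i - 1) r)

  quantifyNfa :: Int -> ([Bdd [Bool]], [Bool]) -> ([Bdd [Bool]], [Bool])
  quantifyNfa i (bdds, as) = (map (quantifyBdd i) bdds, as)
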
Due to its type, we could apply this function repeatedly to quantify over several
variables in one go. The correctness of this construction is summarized by

If wf-nfa A (Suc n) and i ≤ n and ∀bs∈bss. is-alph n bs then
  nfa-accepts (quantify-nfa i A) bss =
  (∃bs. nfa-accepts A (insertll i bs bss) ∧ |bs| = |bss|).

This means that the new NFA accepts a list bss of column vectors iff the original
NFA accepts the list obtained from bss by inserting a suitable row vector bs
representing the existential witness. Matters are complicated by the additional
requirement that the word accepted by the new NFA must have the same length
as the witness. This requirement can be satisfied by appending zero vectors to the
end of bss, which does not change its interpretation. Since the other constructions
(in particular the complement) only work on DFAs, we turn the obtained NFA
into a DFA by applying the usual subset construction. The central idea is that
each set of states produced by nfa-steps can be viewed as a state of a new DFA.
As mentioned in §2.6, not all of these sets are reachable from the initial state
of the NFA. Similar to the product construction, the algorithm for computing
the reachable sets shown in Fig. 2 is an instance of the general DFS framework.
The node store is now a pair of type nat option bdd × bool list list, where the
first component is a BDD representing a partial map from finite sets (encoded as
bit vectors) to natural numbers, and the second component is the list of visited
states representing the inverse map. To insert new entries into a BDD, we use
bddinsert :: α bdd ⇒ bool list ⇒ α ⇒ α bdd
bddinsert (Leaf a) [] x = Leaf x
bddinsert (Leaf a) (w · ws) x =
  (if w then Branch (Leaf a) (bddinsert (Leaf a) ws x)
   else Branch (bddinsert (Leaf a) ws x) (Leaf a))
bddinsert (Branch l r) (w · ws) x =
  (if w then Branch l (bddinsert r ws x) else Branch (bddinsert l ws x) r)
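
A Haskell transcription of ours; note how the old leaf a is duplicated along the
branch that the path ws does not take, so that all other lookups are unaffected:

  data Bdd a = Leaf a | Branch (Bdd a) (Bdd a)

  bddinsert :: Bdd a -> [Bool] -> a -> Bdd a
  bddinsert (Leaf _) [] x = Leaf x
  bddinsert (Leaf a) (w : ws) x
    | w         = Branch (Leaf a) (bddinsert (Leaf a) ws x)
    | otherwise = Branch (bddinsert (Leaf a) ws x) (Leaf a)
  bddinsert (Branch l r) (w : ws) x
    | w         = Branch l (bddinsert r ws x)
    | otherwise = Branch (bddinsert l ws x) r
  bddinsert b [] _ = b   -- path shorter than the tree: does not occur for
                         -- well-formed inputs, kept only to make the function total
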

The computation of successor states in subset-succs and the transition relation
in det-nfa closely resemble the definition of nfa-trans from §2.5. Using the fact
that subset-dfs computes a bijection between finite sets and natural numbers,
we can prove the correctness theorem for det-nfa:

If wf-nfa A n and ∀bs∈bss. is-alph n bs then
  dfa-accepts (det-nfa A) bss = nfa-accepts A bss.

Recall that the automaton produced by quantify-nfa will only accept words
with a sufficient number of trailing zero column vectors. To get a DFA that also
accepts words without trailing zeros, we mark all states as accepting from which
an accepting state can be reached by reading only zeros. This construction, which
is sometimes referred to as the right quotient, can be characterized as follows:

If wf-dfa A n and ∀bs∈bss. is-alph n bs then
  dfa-accepts (rquot A n) bss = (∃m. dfa-accepts A (bss @ zeros m n)).

where zeros m n produces a word consisting of m zero vectors of size n.

3.4 Diophantine (In)Equations

We now come to the construction of DFAs for atomic formulae, namely Dio-
phantine (in)equations. For this purpose, we use a method due to Boudet and
Comon [3]. The key observation is that xs is a solution of a Diophantine equation
iff it is a solution modulo 2 and the quotient of xs and 2 is a solution of another
equation with the same coefficients, but with a different right-hand side:

(eval-dioph ks xs = l ) =
(eval-dioph ks (map (λx . x mod 2) xs) mod 2 = l mod 2 ∧
eval-dioph ks (map (λx . x div 2) xs) =
(l − eval-dioph ks (map (λx . x mod 2) xs)) div 2)

In other words, the states of the DFA accepting the solutions of the equation
correspond to the right-hand sides reachable from the initial right-hand side l,
which will again be computed using the DFS algorithm. To ensure termination
of DFS, it is crucial to prove that the reachable right-hand sides m are bounded:

If |m| ≤ max |l| (Σ k←ks. |k|) then
  |(m − eval-dioph ks (map (λx. x mod 2) xs)) div 2| ≤ max |l| (Σ k←ks. |k|).
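
The step on right-hand sides is easy to experiment with concretely. The Haskell
sketch below (ours, not the formalization) computes the same successor and the
same bound; bit vectors are passed as lists of 0/1 integers:

  -- eval-dioph: the linear form k1*x1 + ... + kn*xn.
  evalDioph :: [Integer] -> [Integer] -> Integer
  evalDioph ks xs = sum (zipWith (*) ks xs)

  -- Successor right-hand side reached from m by reading the bit vector xs;
  -- Nothing means the equation is violated modulo 2, i.e. the DFA moves
  -- to its error state.
  diophStep :: [Integer] -> Integer -> [Integer] -> Maybe Integer
  diophStep ks m xs
    | evalDioph ks xs `mod` 2 == m `mod` 2 = Just ((m - evalDioph ks xs) `div` 2)
    | otherwise                            = Nothing

  -- The bound of the property above: every reachable m satisfies
  -- |m| <= max |l| (sum over ks of |k|).
  bound :: [Integer] -> Integer -> Integer
  bound ks l = max (abs l) (sum (map abs ks))
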
subset-succs :: nfa ⇒ bool list ⇒ bool list list
subset-succs A qs ≡ add-leaves (subsetbdd (fst A) qs (nfa-emptybdd |qs|)) []

subset-ins :: bool list
           ⇒ nat option bdd × bool list list
           ⇒ nat option bdd × bool list list
subset-ins qs ≡ λ(bdd, qss). (bddinsert bdd qs (Some |qss|), qss @ [qs])

subset-memb :: bool list ⇒ nat option bdd × bool list list ⇒ bool
subset-memb qs ≡ λ(bdd, qss). bdd-lookup bdd qs ≠ None

subset-empt :: nat option bdd × bool list list
subset-empt ≡ (Leaf None, [])

subset-dfs :: nfa ⇒ bool list ⇒ nat option bdd × bool list list
subset-dfs A x ≡
  gen-dfs (subset-succs A) subset-ins subset-memb subset-empt [x]

det-nfa :: nfa ⇒ dfa
det-nfa A ≡
  let (bdd, qss) = subset-dfs A (nfa-startnode A)
  in (map (λqs. bdd-map (λqs. the (bdd-lookup bdd qs))
                        (subsetbdd (fst A) qs (nfa-emptybdd |qs|)))
          qss,
      map (nfa-accepting A) qss)

Fig. 2. Definition of the subset construction

eq-dfa :: nat ⇒ int list ⇒ int ⇒ dfa
eq-dfa n ks l ≡
  let (is, js) = dioph-dfs n ks l
  in (map (λj. make-bdd
                 (λxs. if eval-dioph ks xs mod 2 = j mod 2
                       then the is[int-to-nat-bij ((j − eval-dioph ks xs) div 2)]
                       else |js|)
                 n [])
          js @
      [Leaf |js|],
      map (λj. j = 0) js @ [False])

Fig. 3. Definition of the automata for Diophantine equations

By instantiating the abstract gen-dfs function, we obtain a function

dioph-dfs :: nat ⇒ int list ⇒ int ⇒ nat option list × int list

that, given the number of variables, the coefficients, and the right-hand side,
computes a bijection between reachable right-hand sides and natural numbers:

((fst (dioph-dfs n ks l))[int-to-nat-bij m] = Some k ∧
  |m| ≤ max |l| (Σ k←ks. |k|)) =
(k < |snd (dioph-dfs n ks l)| ∧ (snd (dioph-dfs n ks l))[k] = m)

The first component of the pair returned by dioph-dfs can be viewed as a partial
map from integers to natural numbers, where int-to-nat-bij maps negative and
non-negative integers to odd and even list indices, respectively. As shown in
Fig. 3, the transition table of the DFA is constructed by eq-dfa as follows: if
the current state corresponds to the right-hand side j, and the DFA reads a
bit vector xs satisfying the equation modulo 2, then the DFA goes to the state
corresponding to the new right-hand side (j − eval-dioph ks xs) div 2, otherwise
it goes to an error state, which is the last state in the table. To produce a BDD
containing the successor states for all bit vectors of length n, we use the function

make-bdd :: (nat list ⇒ α) ⇒ nat ⇒ nat list ⇒ α bdd
make-bdd f 0 xs = Leaf (f xs)
make-bdd f (Suc n) xs =
  Branch (make-bdd f n (xs @ [0])) (make-bdd f n (xs @ [1]))

The key property of eq-dfa states that for every right-hand side m reachable
from l, the state reachable from m via a word bss is accepting iff the list of
natural numbers denoted by bss satisfies the equation with right-hand side m:

If (l, m) ∈ (succsr (dioph-succs n ks))∗ and ∀bs∈bss. is-alph n bs then
  dfa-accepting (eq-dfa n ks l)
    (dfa-steps (eq-dfa n ks l) (the (fst (dioph-dfs n ks l))[int-to-nat-bij m]) bss) =
  (eval-dioph ks (nats-of-boolss n bss) = m).

Here, dioph-succs n ks returns a list of up to 2ⁿ successor states reachable from
a given state by reading a single column vector of size n. The proof of the above
property is by induction on bss, where the equation given at the beginning of
this section is used in the induction step. The correctness property of eq-dfa can
then be obtained from this result as a simple corollary:

If ∀bs∈bss. is-alph n bs then
  dfa-accepts (eq-dfa n ks l) bss = (eval-dioph ks (nats-of-boolss n bss) = l).

Diophantine inequations can be treated in a similar way.

4 The Decision Procedure

We now have all the machinery in place to write a decision procedure for Pres-
burger arithmetic. A formula can be transformed into a DFA by the following
function:
dfa-of-pf :: nat ⇒ pf ⇒ dfa
dfa-of-pf n (Eq ks l) = eq-dfa n ks l
dfa-of-pf n (Le ks l) = ineq-dfa n ks l
dfa-of-pf n (Neg p) = negate-dfa (dfa-of-pf n p)
dfa-of-pf n (And p q) = binop-dfa (∧) (dfa-of-pf n p) (dfa-of-pf n q)
dfa-of-pf n (Or p q) = binop-dfa (∨) (dfa-of-pf n p) (dfa-of-pf n q)
dfa-of-pf n (Imp p q) = binop-dfa (−→) (dfa-of-pf n p) (dfa-of-pf n q)
dfa-of-pf n (Exist p) =
  rquot (det-nfa (quantify-nfa 0 (nfa-of-dfa (dfa-of-pf (Suc n) p)))) n
dfa-of-pf n (Forall p) = dfa-of-pf n (Neg (Exist (Neg p)))
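
The datatype pf of Presburger formulae is used here without being displayed;
transcribed to Haskell it reads as follows (a sketch; the concrete Isabelle
datatype may differ in details such as the numeric types):

  -- Atoms carry the coefficient list ks and the right-hand side l; variables
  -- are addressed by position, and Exist/Forall bind variable 0.
  data Pf = Eq [Integer] Integer      -- eval-dioph ks xs = l
          | Le [Integer] Integer      -- the corresponding inequation
          | Neg Pf
          | And Pf Pf
          | Or Pf Pf
          | Imp Pf Pf
          | Exist Pf
          | Forall Pf
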

By structural induction on formulae, we can show the correctness theorem

If ∀bs∈bss. is-alph n bs then
  dfa-accepts (dfa-of-pf n p) bss = eval-pf p (nats-of-boolss n bss).

Note that a closed formula is valid iff the start state of the resulting DFA is
accepting, which can easily be seen by letting n = 0 and bss = []. Most cases of
the induction can be proved by a straightforward application of the correctness
results from §3. Unsurprisingly, the only complicated case is the one for the
existential quantifier, which we will now examine in more detail. In this case,
the left-hand side of the correctness theorem is

dfa-accepts
  (rquot (det-nfa (quantify-nfa 0 (nfa-of-dfa (dfa-of-pf (Suc n) p)))) n) bss

which, according to the correctness statement for rquot, is equivalent to

∃m. dfa-accepts (det-nfa (quantify-nfa 0 (nfa-of-dfa (dfa-of-pf (Suc n) p))))
      (bss @ zeros m n)

By correctness of det-nfa, quantify-nfa and nfa-of-dfa, this is the same as

∃m bs. dfa-accepts (dfa-of-pf (Suc n) p) (insertll 0 bs (bss @ zeros m n)) ∧
       |bs| = |bss| + |zeros m n|

Using the induction hypothesis, this can be rewritten to

∃m bs. eval-pf p (nats-of-boolss (Suc n) (insertll 0 bs (bss @ zeros m n))) ∧
       |bs| = |bss| + |zeros m n|

According to the properties of nats-of-boolss from §2.3, this can be recast as

∃m bs. eval-pf p (nat-of-bools bs · nats-of-boolss n bss) ∧ |bs| = |bss| + m

which is obviously equivalent to the right-hand side of the correctness theorem

∃x. eval-pf p (x · nats-of-boolss n bss)

since we can easily produce suitable instantiations for m and bs from x.

5 Conclusion

First experiments with the algorithm presented in §4 show that it can compete
quite well with the standard decision procedure for Presburger arithmetic avail-
able in Isabelle. Even without minimization, the DFA for the stamp problem
from §2.1 has only 6 states, and can be constructed in less than a second. The
following table shows the size of the DFAs (i.e. the number of states) for all sub-
formulae of the stamp problem. Thanks to the DFS algorithm, they are much
smaller than the DFAs that one would have obtained using a naive construction:

  Subformula                              States
  Eq [5, 3, −1] 0                              9
  Exist (Eq [5, 3, −1] 0)                      9
  Exist (Exist (Eq [5, 3, −1] 0))             13
  Le [−1] (−8)                                 5
  Imp (Le [−1] (−8)) (Exist (Exist …))        15
  Forall (Imp (Le …) (Exist (Exist …)))        6

The next step is to formalize a minimization algorithm, e.g. along the lines of
Constable et al. [6]. We also intend to explore other ways of constructing DFAs
for Diophantine equations, such as the approach by Wolper and Boigelot [15],
which is more complicated than the one shown in §3.4, but can directly deal
with variables over the integers rather than just natural numbers. To improve
the performance of the decision procedure on large formulae, we would also like
to investigate possible optimizations of the simple representation of BDDs pre-
sented in §2.3. Verma [14] describes a formalization of reduced ordered BDDs
with sharing in Coq. To model sharing, Verma’s formalization is based on a
memory for storing BDDs. Due to their dependence on the memory, algorithms
using this kind of BDDs are no longer purely functional, which makes reason-
ing about them substantially more challenging. Finally, we also plan to extend
our decision procedure to cover WS1S, and use it to tackle some of the circuit
verification problems described by Basin and Friedrich [1].

Acknowledgements. We would like to thank Tobias Nipkow for suggesting this


project and for numerous comments, Clemens Ballarin and Markus Wenzel for
answering questions concerning locales, and Alex Krauss for help with well-
founded recursion and induction schemes.

References
1. Basin, D., Friedrich, S.: Combining WS1S and HOL. In: Gabbay, D., de Rijke, M.
(eds.) Frontiers of Combining Systems 2. Studies in Logic and Computation, vol. 7,
pp. 39–56. Research Studies Press/Wiley (2000)
2. Berghofer, S., Nipkow, T.: Executing higher order logic. In: Callaghan, P., Luo, Z.,
McKinna, J., Pollack, R. (eds.) TYPES 2000. LNCS, vol. 2277, p. 24. Springer,
Heidelberg (2002)
3. Boudet, A., Comon, H.: Diophantine equations, Presburger arithmetic and finite
automata. In: Kirchner, H. (ed.) CAAP 1996. LNCS, vol. 1059, pp. 30–43. Springer,
Heidelberg (1996)
4. Boutin, S.: Using reflection to build efficient and certified decision procedures. In:
Ito, T., Abadi, M. (eds.) TACS 1997. LNCS, vol. 1281, pp. 515–529. Springer,
Heidelberg (1997)
5. Chaieb, A., Nipkow, T.: Proof synthesis and reflection for linear arithmetic. Journal
of Automated Reasoning 41, 33–59 (2008)
6. Constable, R.L., Jackson, P.B., Naumov, P., Uribe, J.: Constructively formalizing
automata theory. In: Plotkin, G., Stirling, C., Tofte, M. (eds.) Proof, Language,
and Interaction: Essays in Honor of Robin Milner. MIT Press, Cambridge (2000)
7. Harrison, J.: Metatheory and reflection in theorem proving: A survey and critique.
Technical Report CRC-053, SRI Cambridge (1995),
http://www.cl.cam.ac.uk/users/jrh/papers/reflect.dvi.gz
8. Klarlund, N.: Mona & Fido: The logic-automaton connection in practice. In:
Nielsen, M. (ed.) CSL 1997. LNCS, vol. 1414, pp. 311–326. Springer, Heidelberg
(1998)
9. Krauss, A.: Partial recursive functions in higher-order logic. In: Furbach, U.,
Shankar, N. (eds.) IJCAR 2006. LNCS, vol. 4130, pp. 589–603. Springer, Hei-
delberg (2006)
10. Minamide, Y.: Verified decision procedures on context-free grammars. In: Schnei-
der, K., Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 173–188. Springer,
Heidelberg (2007)
11. Nipkow, T.: Verified lexical analysis. In: Grundy, J., Newey, M. (eds.) TPHOLs
1998. LNCS, vol. 1479, pp. 1–15. Springer, Heidelberg (1998)
12. Nipkow, T.: Linear quantifier elimination. In: Armando, A., Baumgartner, P.,
Dowek, G. (eds.) IJCAR 2008. LNCS, vol. 5195, pp. 18–33. Springer, Heidelberg
(2008)
13. Nishihara, T., Minamide, Y.: Depth first search. In: Klein, G., Nipkow, T., Paul-
son, L. (eds.) The Archive of Formal Proofs,
http://afp.sf.net/entries/Depth-First-Search.shtml (June 2004); Formal
proof development
14. Verma, K.N., Goubault-Larrecq, J., Prasad, S., Arun-Kumar, S.: Reflecting BDDs
in Coq. In: He, J., Sato, M. (eds.) ASIAN 2000. LNCS, vol. 1961, pp. 162–181.
Springer, Heidelberg (2000)
15. Wolper, P., Boigelot, B.: On the construction of automata from linear arithmetic
constraints. In: Schwartzbach, M.I., Graf, S. (eds.) TACAS 2000. LNCS, vol. 1785,
pp. 1–19. Springer, Heidelberg (2000)
Extended First-Order Logic

Chad E. Brown and Gert Smolka

Saarland University, Saarbrücken, Germany

Abstract. We consider the EFO fragment of simple type theory, which
restricts quantification and equality to base types but retains lambda
abstractions and higher-order variables. We show that this fragment en-
joys the characteristic properties of first-order logic: complete proof sys-
tems, compactness, and countable models. We obtain these results with
an analytic tableau system and a concomitant model existence lemma.
All results are with respect to standard models. The tableau system is
well-suited for proof search and yields decision procedures for substantial
fragments of EFO.

1 Introduction
First-order logic can be considered as a natural fragment of Church’s type the-
ory [1]. In this paper we exhibit a larger fragment of type theory, called EFO,
that still enjoys the characteristic properties of first-order logic: complete proof
systems, compactness, and countable models. EFO restricts quantification and
equality to base types but retains lambda abstractions and higher-order vari-
ables. Like type theory, EFO has a type o of truth values and admits functions
that take truth values to individuals. Such functions are not available in first-
order logic. A typical example is a conditional C : oιιι taking a truth value and
two individuals as arguments and returning one of the individuals. Here is a
valid EFO formula that specifies the conditional and states one of its properties:

(∀xy. C ⊥xy = y ∧ C ⊤xy = x) → C (x=y)xy = y


The starting point for EFO is an analytic tableau system derived from Brown’s
Henkin-complete cut-free one-sided sequent calculus for extensional type the-
ory [2]. The tableau system is well-suited for proof search and yields decision
procedures and the finite model property for three substantial fragments of EFO:
lambda-free formulas (e.g., pa → pb → p(a∧b)), Bernays-Schönfinkel-Ramsey
formulas [3], and equations between pure lambda terms (terms not involving
type o). The decidability and finite model results are mostly known, but it is
remarkable that we obtain them with a single tableau system.
The proofs of the main results follow the usual development of first-order
logic [4,5], which applies the abstract consistency technique to a model existence
lemma for the tableau system (Hintikka’s Lemma). Due to the presence of higher-
order variables and lambda abstractions, the proof of the EFO model existence
lemma is much harder than it is for first-order logic. We employ the possible-
values technique [6], which has been used in [2] to obtain Henkin models, and

in [7] to obtain standard models. We generalize the model existence theorem such
that we can obtain countable models using the abstract consistency technique.
In a preceding paper [7], we develop a tableau-based decision procedure for
the quantifier- and lambda-free fragment of EFO and introduce the possible-
values-based construction of standard models. In this paper we extend the model
construction to first-order quantification and lambda abstraction. We introduce
a novel subterm restriction for the universal quantifier and employ an abstract
normalization operator, both essential for proof search and decision procedures.
Due to space limitations we have to omit some proofs. They can be found in
the full paper at www.ps.uni-sb.de/Papers.

2 Basic Definitions
Types (σ, τ , μ) are obtained with the grammar τ ::= o | ι | τ τ . The elements
of o are the two truth values, ι is interpreted as a nonempty set, and a function
type στ is interpreted as the set of all total functions from σ to τ . For simplicity,
we provide only one sort ι. Everything generalizes to countably many sorts.
We distinguish between two kinds of names, called constants and variables.
Every name comes with a type. We assume that there are only countably many
names, and that for every type there are infinitely many variables of this type.
If not said otherwise, the letter a ranges over names, c over constants, and x
and y over variables.
Terms (s, t, u, v) are obtained with the grammar t ::= a | tt | λx.t where
an application st is only admitted if s : τ μ and t : τ for some types τ and μ.
Terms of type o are called formulas. A term is lambda-free if it does not contain a
subterm that is a lambda abstraction. We use N s to denote the set of all names
that have a free occurrence in the term s.
We assume that ⊥ : o, ¬ : oo, ∧ : ooo, =σ : σσo, and ∀σ : (σo)o are constants
for all types σ. We write ∀x.s for ∀σ (λx.s). An interpretation is a function I
that is defined on all types and all names and satisfies the following conditions:
– Io = {0, 1}
– I(στ ) is the set of all total functions from Iσ to Iτ
– I⊥ = 0
– I(¬), I(∧), I(=σ ), and I(∀σ ) are the standard interpretations of the respec-
tive logical constants.
We write Îs for the value the term s evaluates to under the interpretation I.
We say that an interpretation I is countable [finite] if Iι is countable [finite].
An interpretation I is a model of a set A of formulas if Îs = 1 for every formula
s ∈ A. A set of formulas is satisfiable if it has a model.
The constants ⊥, ¬, ∧, =ι , and ∀ι are called EFO constants. An EFO term is
a term that contains no other constants but EFO constants. We write EFOσ for
the set of all EFO terms of type σ. For simplicity, we work with a restricted set of
EFO constants. Everything generalizes to the remaining propositional constants,
the identity =o , and the existential quantifier ∃ι .
3 Normalization
We assume a normalization operator [] that provides for lambda conversion. The
normalization operator [] must be a type preserving total function from terms
to terms. We call [s] the normal form of s and say that s is normal if [s] = s.
There are several possibilities for the normalization operator []: β-, long β-, or
βη-normal form, all possibly with standardized bound variables [8]. We will not
commit to a particular operator but state explicitly the properties we require
for our results. To start, we require the following properties:
N1 [[s]] = [s]
N2 [[s]t] = [st]
N3 [as1 . . . sn ] = a[s1 ] . . . [sn ] if the type of as1 . . . sn is o or ι
N4 Î[s] = Îs
Note that a ranges over names and I ranges over interpretations. N3 also applies
for n = 0.
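
As a concrete instance of the abstract operator, here is a toy β-normalizer in
Haskell (entirely our sketch, not part of the paper): it uses de Bruijn indices,
so no α-renaming is needed, and it terminates on simply typed terms such as
the terms considered here:

  data Term = Var Int | Con String | App Term Term | Lam Term
    deriving (Eq, Show)

  -- Shift free indices >= c by d (the usual de Bruijn adjustment).
  shift :: Int -> Int -> Term -> Term
  shift d c (Var i)   = Var (if i >= c then i + d else i)
  shift d c (App s t) = App (shift d c s) (shift d c t)
  shift d c (Lam s)   = Lam (shift d (c + 1) s)
  shift _ _ t         = t

  -- subst j u s computes s[j := u] capture-avoidingly.
  subst :: Int -> Term -> Term -> Term
  subst j u (Var i)   = if i == j then u else Var i
  subst j u (App s t) = App (subst j u s) (subst j u t)
  subst j u (Lam s)   = Lam (subst (j + 1) (shift 1 0 u) s)
  subst _ _ t         = t

  -- β-normal form.
  norm :: Term -> Term
  norm (App s t) = case norm s of
    Lam b -> norm (shift (-1) 0 (subst 0 (shift 1 0 (norm t)) b))
    s'    -> App s' (norm t)
  norm (Lam s) = Lam (norm s)
  norm t       = t

One checks that such an operator satisfies N1–N3; N4 holds because β-conversion
preserves the value of a term under every interpretation.
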
We need further properties of the normalization operator that can only be
expressed with substitutions. A substitution is a type preserving partial function
from variables to terms. If θ is a substitution, x is a variable, and s is a term
that has the same type as x, we use θₓˢ to denote the substitution that agrees
everywhere with θ but possibly on x where it yields s. We assume that every
substitution θ can be extended to a type preserving total function θ̂ from terms
to terms such that the following conditions hold:
S1 θ̂a = if a ∈ Dom θ then θa else a
S2 θ̂(st) = (θ̂s)(θ̂t)
S3 [(θ̂(λx.s))t] = [θ̂ₓᵗ s]
S4 [∅̂s] = [s]
S5 N [s] ⊆ N s and N (θ̂s) ⊆ ⋃{ N (θ̂a) | a ∈ N s }
Note that a ranges over names and that ∅ (the empty set) is the substitution
that is undefined on every variable.

4 Tableau System
The results of this paper originate with the tableau system T shown in Figure 1.
The rules in the first two lines of Figure 1 are the familiar rules from first-order
logic. The rules in the third and fourth line deal with embedded formulas. The
mating rule Tmat decomposes complementary atomic formulas by introducing
disequations that confront corresponding subterms. Disequations can be further
decomposed with Tdec . Embedded formulas are eventually raised to the top level
by Rule Tbe , which incorporates Boolean extensionality. Rule Tfe incorporates
functional extensionality. It reduces disequations at functional types to disequa-
tions at lower types. The confrontation rule Tcon deals with positive equations
at type ι. A discussion of the confrontation rule can be found in [7]. The tableau
rules are such that they add normal formulas if they are applied to normal
formulas.
  s, ¬s          ¬¬s          s ∧ t          ¬(s ∧ t)
  ────── T¬      ──── T¬¬     ────── T∧      ──────── T¬∧
    ⊥             s           s, t           ¬s | ¬t

  ∀ι s                      ¬∀ι s
  ──── T∀   t : ι           ───── T¬∀   x : ι fresh
  [st]                      ¬[sx]

  xs1 . . . sn , ¬xt1 . . . tn             xs1 . . . sn ≠ι xt1 . . . tn
  ──────────────────────────── Tmat       ──────────────────────────── Tdec
  s1 ≠ t1 | · · · | sn ≠ tn                s1 ≠ t1 | · · · | sn ≠ tn

  s ≠ s          s ≠o t                s ≠στ t
  ────── T≠      ──────────── Tbe      ─────────── Tfe   x : σ fresh
    ⊥            s, ¬t | ¬s, t         [sx] ≠ [tx]

  s =ι t , u ≠ι v
  ─────────────────────────── Tcon
  s ≠ u, t ≠ u | s ≠ v, t ≠ v

  Fig. 1. Tableau system T

Example 4.1. The following tableau refutes the formula pf ∧ ¬p(λx.¬¬f x) where
p : (ιo)o and f : ιo.

  pf ∧ ¬p(λx.¬¬f x)
  pf, ¬p(λx.¬¬f x)
  f ≠ (λx.¬¬f x)
  f x ≠ ¬¬f x
  f x, ¬¬¬f x   |   ¬f x, ¬¬f x
  ¬f x          |   ⊥
  ⊥

The rules used are T∧, Tmat, Tfe, Tbe, T¬¬, and T¬.

5 Evidence

A quasi-EFO formula is a disequation s ≠σ t such that s and t are EFO terms
and σ ≠ ι. Note that the rules Tmat and Tdec may yield quasi-EFO formulas
when they are applied to EFO formulas. A branch is a set of normal formulas s
such that s is either EFO or quasi-EFO.
A term s : ι is discriminating in a branch A if A contains a disequation s ≠ t
or t ≠ s. We use DA to denote the set of all terms that are discriminating in a
branch A.
A branch E is evident if it satisfies the evidence conditions in Figure 2. The
evidence conditions correspond to the tableau rules and are designed such that a
branch that is closed under the tableau rules and does not contain ⊥ is evident.
Note that the evidence conditions require less than the tableau rules:
E⊥   ⊥ is not in E.
E¬   If ¬x is in E, then x is not in E.
E¬¬  If ¬¬s is in E, then s is in E.
E∧   If s ∧ t is in E, then s and t are in E.
E¬∧  If ¬(s ∧ t) is in E, then ¬s or ¬t is in E.
E∀   If ∀ι s is in E, then [st] is in E for all t ∈ DE,
     and [st] is in E for some t ∈ EFOι.
E¬∀  If ¬∀ι s is in E, then ¬[st] is in E for some t ∈ EFOι.
Emat If xs1 . . . sn and ¬xt1 . . . tn are in E where n ≥ 1,
     then si ≠ ti is in E for some i ∈ {1, . . . , n}.
Edec If xs1 . . . sn ≠ι xt1 . . . tn is in E where n ≥ 1,
     then si ≠ ti is in E for some i ∈ {1, . . . , n}.
E≠   If s ≠ι t is in E, then s and t are different.
Ebe  If s ≠o t is in E, then either s and ¬t are in E or ¬s and t are in E.
Efe  If s ≠στ t is in E, then [sx] ≠ [tx] is in E for some variable x.
Econ If s =ι t and u ≠ι v are in E,
     then either s ≠ u and t ≠ u are in E or s ≠ v and t ≠ v are in E.

Fig. 2. Evidence conditions

1. E¬ is restricted to variables.
2. E∀ requires fewer instances than T∀ admits.
3. E¬∀ admits all EFO terms as witnesses.
4. E≠ is restricted to type ι.

In § 7 we will show that every evident branch is satisfiable. In § 9 we will prove
the completeness of a tableau system R that restricts the rule T∀ as suggested
by the evidence condition E∀.

Example 5.1. Let p : (ιιι)o. The following branch is evident.

p(λxy.x), ¬p(λxy.y), (λxy.x) ≠ (λxy.y), (λy.x) ≠ (λy.y), x ≠ y

6 Carriers

A carrier for an evident branch E consists of a set D and a relation ⊲ι ⊆ EFOι × D
such that certain conditions are satisfied. We will show that every evident branch
has carriers, and that for every carrier (D, ⊲ι) for an evident branch E we can
obtain a model I of E such that Iι = D and s ⊲ι Îs for all s ∈ EFOι. We call
⊲ι a possible-values relation and read s ⊲ι a as "s can be a". Given s ⊲ι a, we say
that a is a possible value for s.
We assume that some evident branch E is given. We say that a set T ⊆ EFOι
is compatible if there are no terms s, t ∈ T such that ([s] ≠ [t]) ∈ E. We write
s # t if E contains the disequation s ≠ t or t ≠ s.
Let a non-empty set D and a relation ⊲ι ⊆ EFOι × D be given. For T ⊆ EFOι
and a ∈ D we write T ⊲ι a if t ⊲ι a for every t ∈ T. For all terms s, t ∈ EFOι, all
values a, b ∈ D, and every set T ⊆ EFOι we require the following properties:
B1 s ⊲ι a iff [s] ⊲ι a.
B2 T compatible iff T ⊲ι a for some a ∈ D.
B3 If (s =ι t) ∈ E and s ⊲ι a and t ⊲ι b, then a = b.
B4 For every a ∈ D either t ⊲ι a for some t ∈ DE or t ⊲ι a for every t ∈ EFOι.
Given an evident branch E, a carrier for E is a pair (D, ⊲ι) as specified above.

6.1 Quotient-Based Carriers

A branch A is complete if for all s, t ∈ EFOι either [s = t] is in A or [s ≠ t] is in A.
We will show that complete evident branches have countable carriers that can
be obtained as quotients of EFOι with respect to the equations contained in the
branch.
Let E be a complete evident branch in the following. We write s ∼ t if s
and t are EFO terms of type ι and [s =ι t] ∈ E. We define s̃ := { t | t ∼ s } for
s ∈ EFOι.

Proposition 6.1. For all s, t ∈ EFOι : s # t iff [s ≠ t] ∈ E.

Proposition 6.2. ∼ is an equivalence relation on EFOι .

Proposition 6.3. Let T ⊆ EFOι . Then T is compatible iff s ∼ t for all s, t ∈ T .

Proof. By definition and N3, T is compatible iff [s ≠ t] ∉ E for all s, t ∈ T. Since
E is complete, the claim follows with Proposition 6.1.

Lemma 6.4. Every complete evident branch has a countable carrier.

Proof. Let E be a complete evident branch. We define:

  D := { s̃ | s ∈ EFOι }
  s ⊲ι t̃ :⇐⇒ s ∼ t

We will show that (D, ⊲ι) is a carrier for E. Note that ⊲ι is well-defined since ∼
is an equivalence relation. D is countable since EFOι is countable.

B1. We have to show that s ∼ t iff [s] ∼ t. This follows with N3 and N1 since
s ∼ t iff [s = t] ∈ E and [s] ∼ t iff [[s] = t] ∈ E.
B2. If T is empty, B2 holds vacuously. Otherwise, let t ∈ T. Then T is compatible
iff s ∼ t for all s ∈ T by Propositions 6.3 and 6.2. Hence T is compatible iff s ⊲ι t̃
for all s ∈ T. The claim follows.
B3. Let (s =ι t) ∈ E and s ⊲ι ũ and t ⊲ι ṽ. Since s = t is normal, we have s ∼ t. By
definition of ⊲ι we have s ∼ u and t ∼ v. Hence ũ = ṽ since ∼ is an equivalence
relation.
B4. If DE is empty, then s ⊲ι t̃ for all s, t ∈ EFOι and hence the claim holds.
Otherwise, let DE be nonempty. We show the claim by contradiction. Suppose
there is a term t ∈ EFOι such that s ⋪ι t̃ for all s ∈ DE. Then [s ≠ t] ∈ E for
all s ∈ DE since E is complete. Since DE is nonempty, we have [t] ∈ DE by N3.
Thus ([t] ≠ [t]) ∈ E by N3. Contradiction by E≠.

6.2 Discriminant-Based Carriers

We will now show that every evident branch has a carrier. Let an evident
branch E be given. We will call a term discriminating if it is discriminating
in E. A discriminant is a maximal set a of discriminating terms such that there
is no disequation s ≠ t ∈ E with s, t ∈ a. We will construct a carrier for E
whose values are the discriminants.

Example 6.5. Suppose E = {x ≠ y, x ≠ z, y ≠ z} and x, y, z : ι. Then there are 3
discriminants: {x}, {y}, {z}.

Example 6.6. Suppose E = { aₙ ≠ι bₙ | n ∈ N } where the aₙ and bₙ are
pairwise distinct constants. Then E is evident and there are uncountably many
discriminants.

Proposition 6.7. If E contains exactly n disequations at ι, then there are at
most 2ⁿ discriminants. If E contains no disequation at ι, then ∅ is the only
discriminant.

Proposition 6.8. Let a and b be different discriminants. Then:

1. a and b are separated by a disequation in E, that is, there exist terms s ∈ a
   and t ∈ b such that s # t.
2. a and b are not connected by an equation in E, that is, there exist no terms
   s ∈ a and t ∈ b such that (s = t) ∈ E.

Proof. The first claim follows by contradiction. Suppose there are no terms s ∈ a
and t ∈ b such that s # t. Let s ∈ a. Then s ∈ b since b is a maximal compatible
set of discriminating terms. Thus a ⊆ b and hence a = b since a is maximal.
Contradiction.
The second claim also follows by contradiction. Suppose there is an equation
(s₁ = s₂) ∈ E such that s₁ ∈ a and s₂ ∈ b. By the first claim we have terms s ∈ a
and t ∈ b such that s # t. By Econ we have s₁ # s or s₂ # t. Contradiction since a
and b are discriminants.

Lemma 6.9. Every (finite) evident branch has a (finite) carrier.
Proof. Let E be an evident branch. We define:

  D := set of all discriminants
  s ⊲ι a :⇐⇒ ([s] discriminating =⇒ [s] ∈ a)

We will show that (D, ⊲ι) is a carrier for E. By Proposition 6.7 we know that D
is finite if E is finite.
B1. Holds by N1.
For the remaining carrier conditions we distinguish two cases. If DE = ∅, then ∅
is the only discriminant and B2, B3, and B4 are easily verified. Otherwise, let
DE ≠ ∅.
B2⇒. Let T be compatible. Then there exists a discriminant a that contains all
the discriminating terms in { [t] | t ∈ T }. The claim follows since T ⊲ι a.
B2⇐. By contradiction. Suppose T ⊲ι a and T is not compatible. Then there are
terms s, t ∈ T such that ([s] ≠ [t]) ∈ E. Thus [s] and [t] cannot both be in a. This
contradicts T ⊲ι a since [s] and [t] are discriminating.
B3. Let (s = t) ∈ E and s ⊲ι a and t ⊲ι b. We show a = b. Since there are
discriminating terms, E contains at least one disequation at type ι, and hence
s and t are discriminating by Econ. By N3, s and t are normal and hence s ∈ a
and t ∈ b. Now a = b by Proposition 6.8 (2).
B4. Since there are discriminating terms, we know by E≠ that every discriminant
contains at least one discriminating term. Since discriminating terms are normal,
we have the claim.

7 Model Existence
We will now show that every evident branch has a model.

Lemma 7.1 (Model Existence). Let (D, ⊲ι) be a carrier for an evident
branch E. Then E has a model I such that Iι = D.

We start the proof of Lemma 7.1. Let (D, ⊲ι) be a carrier for an evident branch E.
For the rest of the proof we only consider interpretations I such that Iι = D.

7.1 Possible Values


To obtain a model of E, we need suitable values for all variables. We address this
problem by defining possible-values relations ⊲σ ⊆ EFOσ × Iσ for all types σ ≠ ι:

  s ⊲o 0 :⇐⇒ [s] ∉ E
  s ⊲o 1 :⇐⇒ ¬[s] ∉ E
  s ⊲στ f :⇐⇒ st ⊲τ fa whenever t ⊲σ a

Note that we already have a possible-values relation for ι and that the definition
of the possible-values relations for functional types is by induction on types. Also
note that if s is an EFO formula such that [s] ∉ E and ¬[s] ∉ E, then both 0
and 1 are possible values for s. We will show that every EFO term has a possible
value and that we obtain a model of E if we define Ix as a possible value for x
for every variable x.

Proposition 7.2. Let s ∈ EFOσ and a ∈ Iσ. Then s ⊲σ a ⇐⇒ [s] ⊲σ a.

Proof. By induction on σ. For o the claim follows with N1. For ι the claim follows
with B1. Let σ = τμ.
Suppose s ⊲σ a. Let t ⊲τ b. Then st ⊲μ ab. By inductive hypothesis [st] ⊲μ ab.
Thus [[s]t] ⊲μ ab by N2. By inductive hypothesis [s]t ⊲μ ab. Hence [s] ⊲σ a.
Suppose [s] ⊲σ a. Let t ⊲τ b. Then [s]t ⊲μ ab. By inductive hypothesis [[s]t] ⊲μ ab.
Thus [st] ⊲μ ab by N2. By inductive hypothesis st ⊲μ ab. Hence s ⊲σ a.

Lemma 7.3. For every EFO constant c: c ⊲ Ic.

Proof. c = ⊥. The claim follows by E⊥ and N3.
c = ¬. Assume s ⊲o a. We show ¬s ⊲ I(¬)a by contradiction. Suppose ¬s ⋪ I(¬)a.
Case analysis.
  a = 0. Then [s] ∉ E and ¬[¬s] ∈ E. Thus ¬¬[s] ∈ E by N3. Hence [s] ∈ E
  by E¬¬. Contradiction.
  a = 1. Then ¬[s] ∉ E and [¬s] ∈ E. Contradiction by N3.
c = ∧. Assume s ⊲o a and t ⊲o b. We show s ∧ t ⊲ I(∧)ab by contradiction. Suppose
s ∧ t ⋪ I(∧)ab. Case analysis.
  a = b = 1. Then ¬[s], ¬[t] ∉ E and ¬[s ∧ t] ∈ E. Contradiction by N3 and E¬∧.
  a = 0 or b = 0. Then [s] ∉ E or [t] ∉ E, and [s ∧ t] ∈ E. Contradiction by
  N3 and E∧.
c = (=ι). Assume s ⊲ι a and t ⊲ι b. We show (s=t) ⊲ I(=ι)ab by contradiction.
Suppose (s=t) ⋪ I(=ι)ab. Case analysis.
  a = b. Then ¬[s=t] ∈ E and s, t ⊲ι a. By B2 {s, t} is compatible. Contradic-
  tion by N3.
  a ≠ b. Then ([s]=[t]) ∈ E by N3. Hence a = b by B1 and B3. Contradiction.
c = ∀ι. Assume s ⊲ιo f. We show ∀ι s ⊲o I∀ι f by contradiction. Suppose
∀ι s ⋪o I∀ι f. Case analysis.
  I∀ι f = 0. Then ∀ι [s] ∈ E by N3 and fa = 0 for some value a. By E∀ and
  B4 there exists a term t such that [[s]t] ∈ E and t ⊲ι a. Thus st ⊲ fa = 0
  and hence [st] ∉ E. Contradiction by N2.
  I∀ι f = 1. Then ¬∀ι [s] ∈ E by N3. By E¬∀ we have ¬[[s]t] ∈ E for some term
  t ∈ EFOι. By E≠ and B2 we have t ⊲ a for some value a. Now st ⊲ fa = 1.
  Thus ¬[st] ∉ E. Contradiction by N2.

We call an interpretation I admissible if it satisfies x ⊲ Ix for every variable x.
We will show that admissible interpretations exist and that every admissible
interpretation is a model of E.

Lemma 7.4 (Admissibility). Let I be admissible and θ be a substitution such
that θx ⊲ Ix for all x ∈ Dom θ. Then θ̂s ⊲ Îs for every EFO term s.
Proof. By induction on s. Let s be an EFO term. By assumption, θx is EFO for
all x ∈ Dom θ. Hence θ̂s is EFO by S5. Case analysis.
s = a. If a ∈ Dom θ, the claim holds by assumption. If a ∉ Dom θ, then θ̂s = a
by S1. If a is a constant, the claim holds by Lemma 7.3. If a is a variable, the
claim holds by assumption.
s = tu. Then θ̂s = (θ̂t)(θ̂u) by S2. Now θ̂t ⊲ Ît and θ̂u ⊲ Îu by the inductive
hypothesis. Hence θ̂s = (θ̂t)(θ̂u) ⊲ (Ît)(Îu) = Îs.
s = λx.t and x : σ. Moreover, let u ⊲σ a. We show (θ̂s)u ⊲ (Îs)a. By Proposition 7.2
it suffices to show [(θ̂s)u] ⊲ (Îs)a. We have [(θ̂s)u] = [θ̂ₓᵘ t] by S3 and (Îs)a = Îₓᵃ t,
where Iₓᵃ denotes the interpretation that agrees everywhere with I but possibly
on x where it yields a. By inductive hypothesis we have θ̂ₓᵘ t ⊲ Îₓᵃ t. The claim
follows with Proposition 7.2.

7.2 Compatibility
It remains to show that there is an admissible interpretation and that every ad-
missible interpretation is a model of E. For this purpose we define compatibility
relations ≈σ ⊆ EFOσ × EFOσ for all types:

  s ≈o t :⇐⇒ {[s], ¬[t]} ⊈ E and {¬[s], [t]} ⊈ E
  s ≈ι t :⇐⇒ not [s] # [t]
  s ≈στ t :⇐⇒ su ≈τ tv whenever u ≈σ v

Note that the definition of the compatibility relations for functional types is by
induction on types. We say that s and t are compatible if s ≈ t. A set T of equi-
typed terms is compatible if s ≈ t for all terms s, t ∈ T. If T ⊆ EFOσ, we write
T ⊲σ a if a is a common possible value for all terms s ∈ T. We will show that a
set of equi-typed terms is compatible if and only if all its terms have a common
possible value.
The compatibility relations are reflexive. We first show x ≈ x for all vari-
ables x. For the induction to go through we strengthen the hypothesis.

Lemma 7.5 (Reflexivity). For every type σ and all EFO terms s, t, xs1 . . . sn,
xt1 . . . tn of type σ with n ≥ 0:
1. Not both s ≈σ t and [s] # [t].
2. Either xs1 . . . sn ≈σ xt1 . . . tn or [si] # [ti] for some i ∈ {1, . . . , n}.

Proof. By mutual induction on σ. A similar proof can be found in [7].

Lemma 7.6 (Common Value). Let T ⊆ EFOσ. Then T is compatible if and
only if there exists a value a such that T ⊲σ a.

Proof. By induction on σ. A similar proof can be found in [7].

Lemma 7.7. Every admissible interpretation is a model of E.
Proof. Let I be an admissible interpretation and s ∈ E. We show Îs = 1. Case
analysis.
Suppose s is a normal EFO term. Then s = [s] = [∅̂s] by S4 and s ⋪ 0.
Moreover, ∅̂s ⊲ Îs by Lemma 7.4 and s ⊲ Îs by Proposition 7.2. Hence Îs = 1.
Suppose s = (t ≠ u) where t and u are normal EFO terms. Then t = [t] = [∅̂t]
and u = [u] = [∅̂u] by S4. We prove the claim by contradiction. Suppose Îs = 0.
Then Ît = Îu. Thus ∅̂t, ∅̂u ⊲ Ît by Lemma 7.4 and t, u ⊲ Ît by Proposition 7.2.
Hence t ≈ u by Lemma 7.6. Thus not [t] # [u] by Lemma 7.5 (1). Contradiction
since ([t] ≠ [u]) ∈ E.

We can now prove Lemma 7.1. By Lemma 7.5 (2) we know x ≈ x for every
variable x. Hence there exists an admissible interpretation I by Lemma 7.6. By
Lemma 7.7 we know that I is a model of E. This finishes the proof of Lemma 7.1.

Theorem 7.8 (Finite Model Existence)
Every finite evident branch has a finite model.

Proof. Follows with Lemmas 6.9 and 7.1.

Lemma 7.9 (Model Existence). Let E be an evident branch. Then E has a
model. Moreover, E has a countable model if E is complete.

Proof. Follows with Lemmas 6.9, 7.1, and 6.4.

8 Abstract Consistency
To obtain our main results, we boost the model existence lemma with the ab-
stract consistency technique. Everything works out smoothly.
An abstract consistency class is a set Γ of branches such that every branch
A ∈ Γ satisfies the conditions in Figure 3. An abstract consistency class Γ is
complete if for every A ∈ Γ and all s, t ∈ EFOι either A ∪ {[s = t]} is in Γ or
A ∪ {[s ≠ t]} is in Γ.

Lemma 8.1 (Extension Lemma). Let Γ be an abstract consistency class and
A ∈ Γ. Then there exists an evident branch E such that A ⊆ E. Moreover, if Γ
is complete, a complete evident branch E exists such that A ⊆ E.

Proof. Let u0, u1, u2, . . . be an enumeration of all formulas that can occur on a
branch. We construct a sequence A0 ⊆ A1 ⊆ A2 ⊆ · · · of branches such that
every An ∈ Γ. Let A0 := A. We define An+1 by cases. If there is no B ∈ Γ such
that An ∪ {un} ⊆ B, then let An+1 := An. Otherwise, choose some B ∈ Γ such
that An ∪ {un} ⊆ B. We consider four subcases.
1. If un is of the form ∀ι s, then choose An+1 to be B ∪ {[st]} ∈ Γ for some
   t ∈ EFOι. This is possible since Γ satisfies C∀.
2. If un is of the form ¬∀ι s, then choose An+1 to be B ∪ {¬[st]} ∈ Γ for some
   t ∈ EFOι. This is possible since Γ satisfies C¬∀.
C⊥   ⊥ is not in A.
C¬   If ¬x is in A, then x is not in A.
C¬¬  If ¬¬s is in A, then A ∪ {s} is in Γ.
C∧   If s ∧ t is in A, then A ∪ {s, t} is in Γ.
C¬∧  If ¬(s ∧ t) is in A, then A ∪ {¬s} or A ∪ {¬t} is in Γ.
C∀   If ∀ι s is in A, then A ∪ {[st]} is in Γ for all t ∈ DA,
     and A ∪ {[st]} is in Γ for some t ∈ EFOι.
C¬∀  If ¬∀ι s is in A, then A ∪ {¬[st]} is in Γ for some t ∈ EFOι.
Cmat If xs1 . . . sn is in A and ¬xt1 . . . tn is in A where n ≥ 1,
     then A ∪ {si ≠ ti} is in Γ for some i ∈ {1, . . . , n}.
Cdec If xs1 . . . sn ≠ι xt1 . . . tn is in A where n ≥ 1,
     then A ∪ {si ≠ ti} is in Γ for some i ∈ {1, . . . , n}.
C≠   If s ≠ι t is in A, then s and t are different.
Cbe  If s ≠o t is in A, then either A ∪ {s, ¬t} or A ∪ {¬s, t} is in Γ.
Cfe  If s ≠στ t is in A, then A ∪ {[sx] ≠ [tx]} is in Γ for some variable x.
Ccon If s =ι t and u ≠ι v are in A,
     then either A ∪ {s ≠ u, t ≠ u} or A ∪ {s ≠ v, t ≠ v} is in Γ.

Fig. 3. Abstract consistency conditions (must hold for every A ∈ Γ)

3. If un is of the form s ≠στ t, then choose An+1 to be B ∪ {[sx] ≠ [tx]} ∈ Γ
   for some variable x. This is possible since Γ satisfies Cfe.
4. If un has none of these forms, then let An+1 be B.

Let E := ⋃n∈N An. Note that DE = ⋃n∈N DAn. It is not difficult to verify that E
is evident. For space reasons we will only show that E¬∀ is satisfied. Assume ¬∀ι s
is in E. Let n be such that un = ¬∀ι s. Let r ≥ n be such that ¬∀ι s is in Ar.
Hence ¬[st] ∈ An+1 ⊆ E for some t.
It remains to show that E is complete if Γ is complete. Let Γ be complete
and s, t ∈ EFOι. We show that [s = t] or [s ≠ t] is in E. Let m, n be such that
um = [s = t] and un = [s ≠ t]. We consider m < n; the case m > n is symmetric.
If [s = t] ∈ An, we have [s = t] ∈ E. If [s = t] ∉ An, then An ∪ {[s = t]} is not in Γ.
Hence An ∪ {[s ≠ t]} is in Γ since Γ is complete. Hence [s ≠ t] ∈ An+1 ⊆ E.

9 Completeness

We will now show that the tableau system T is complete. In fact, we will show
the completeness of a tableau system R that is obtained from T by restricting
the applicability of some of the rules. We consider R since it provides for more
focused proof search and also yields a decision procedure for three substantial
fragments of EFO. R is obtained from T by restricting the applicability of the
rules T∀, T¬∀, and Tfe as follows:
– T∀ can only be applied to ∀ι s ∈ A with a term t ∈ EFOι if either t ∈ DA or
  the following conditions are satisfied:
  1. DA = ∅ and t is a variable.
  2. t ∈ N A or N A = ∅.
  3. There is no u ∈ EFOι such that [su] ∈ A.
– T¬∀ can only be applied to ¬∀ι s ∈ A if there is no t ∈ EFOι such that
  ¬[st] ∈ A.
– Tfe can only be applied to a disequation (s ≠στ t) ∈ A if there is no variable
  x : σ such that ([sx] ≠ [tx]) ∈ A.
We use R∀, R¬∀, and Rfe to refer to the restrictions of T∀, T¬∀, and Tfe, respec-
tively. Note that R∀ provides a novel subterm restriction that may be useful for
proof search. We say a branch A is refutable if it can be refuted with R. Let ΓT
be the set of all finite branches that are not refutable.

Lemma 9.1. ΓT is an abstract consistency class.

Proof. Due to space limitations we only verify C¬∀. Let ¬∀ι s ∈ A ∈ ΓT. Suppose
A ∪ {¬[st]} ∉ ΓT for every t ∈ EFOι. Then A ∪ {¬[st]} is refutable for every
t ∈ EFOι. Hence A is refutable using T¬∀ and the finiteness of A. Contradiction.

Theorem 9.2 (Completeness)
T and R can refute every unsatisfiable finite branch.

Proof. It suffices to show the claim for R. We prove the claim by contradiction.
Let A be an unsatisfiable finite branch that is not refutable. Then A ∈ ΓT and
hence A is satisfiable by Lemmas 9.1, 8.1, and 7.9. Contradiction.

10 Compactness and Countable Models


A branch A is sufficiently pure if for every type σ there are infinitely many
variables of type σ that do not occur in any formula of A. Let ΓC be the set of
all sufficiently pure branches A such that every finite subset of A is satisfiable.
We write ⊆f for the finite subset relation.
Lemma 10.1. ΓC is a complete abstract consistency class.
Proof. Omitted due to space limitations; the verification is not difficult.
Theorem 10.2 (Compactness)
A branch is satisfiable if each of its finite subsets is satisfiable.
Proof. Let A be a branch such that every finite subset of A is satisfiable. Without
loss of generality we assume A is sufficiently pure. Then A ∈ ΓC . Hence A is
satisfiable by Lemmas 10.1, 8.1, and 7.9.
Theorem 10.3 (Countable Models)
Every satisfiable branch has a countable model.

Proof. Let A be a satisfiable branch. Without loss of generality we assume that A
is sufficiently pure. Hence A ∈ ΓC. By Lemmas 10.1 and 8.1 we have a complete
evident branch E such that A ⊆ E. By Lemma 7.9 we have a countable model
for E and hence for A.

Theorem 10.4 (Countable Model Existence)
Every evident branch has a countable model.

Proof. Let E be an evident branch. By Lemma 7.9 we know that E is satisfiable.
By Theorem 10.3 we know that E has a countable model.

11 Decidability
The tableau system R defined in § 9 yields a procedure that decides the satisfi-
ability of three substantial fragments of EFO. Starting with the initial branch,
the procedure applies tableau rules until it reaches a branch that contains ⊥ or
cannot be extended with the tableau rules. The procedure returns “satisfiable”
if it arrives at a terminal branch that does not contain ⊥, and “unsatisfiable”
if it finds a refutation. There are branches on which the procedure does not
terminate (e.g., {∀x. f x ≠ x}). We first establish the partial correctness of the
procedure.
Proposition 11.1 (Verification Soundness). Let A be a finite branch that
does not contain ⊥ and cannot be extended with R. Then A is evident and has
a finite model.
Proposition 11.2 (Refutation Soundness)
Every refutable branch is unsatisfiable.
For the termination of the procedure we consider the relation A → A′ that holds
if A and A′ are branches such that ⊥ ∉ A ∪ A′ and A′ can be obtained from
A by applying a rule of R. We say that R terminates on a set Δ of branches if
there is no infinite derivation A → A′ → A′′ → · · · such that A ∈ Δ.

Proposition 11.3. Let R terminate on a set Δ of finite branches. Then satis-
fiability of the branches in Δ is decidable and every satisfiable branch in Δ has
a finite model.
Proof. Follows with Propositions 11.2 and 11.1 and Theorem 7.8.
The decision procedure depends on the normalization operator employed with R.
A normalization operator that yields β-normal forms provides for all termination
results proven in this section. Note that the tableau system applies the normaliza-
tion operator only to applications st where s and t are both normal and t has type
ι if it is not a variable. Hence at most one β-reduction is needed for normalization
if s and t are β-normal. Moreover, no α-renaming is needed if the bound variables
are chosen differently from the free variables. For clarity, we continue to work with
an abstract normalization operator and state a further condition:
N5 The least relation ≻ on terms such that
   1. as1 . . . sn ≻ si if i ∈ {1, . . . , n}
   2. s ≻ [sx] if s : στ and x : σ
   terminates on normal terms.

A type is pure if it does not contain o. A term is pure if the type of every name
occurring in it (bound or unbound) is pure. An equation s = t or disequation
s = t is pure if s and t are pure terms.

Proposition 11.4 (Pure Termination). Let the normalization operator sat-
isfy N5. Then R terminates on finite branches containing only pure disequations.

Proof. Let A → A1 → A2 → · · · be a possibly infinite derivation that issues from
a finite branch containing only pure disequations. Then no rules other than Tdec,
Rfe, and T≠ apply, and thus no Ai contains a formula that is not ⊥ or a pure
disequation (using S5). Using N5 it follows that the derivation is finite.

We now know that the validity of pure equations is decidable, and that the inva-
lidity of pure equations can be demonstrated with finite interpretations (Propo-
sition 11.1). Both results are well-known [9,10], but it is remarkable that we
obtain them with different proofs and as a byproduct.
It is well-known that satisfiability of Bernays-Schönfinkel-Ramsey formulas
(first-order ∃∗ ∀∗ -prenex formulas without functions) is decidable and the frag-
ment has the finite model property [3]. We reobtain this result by showing that
R terminates for the respective fragment. We call a type BSR if it is ι or o or
has the form ι . . . ιo. We call an EFO formula s BSR if it satisfies two conditions:

1. The type of every variable that occurs in s is BSR.


2. ∀ι does not occur below a negation in s.

For simplicity, our BSR formulas don’t provide for outer existential quantifica-
tion. We need one more condition for the normalization operator:
N6 If s : ιo is BSR and x : ι, then [sx] is BSR.
Proposition 11.5 (BSR Termination). Let the normalization operator satisfy
N5 and N6. Then R terminates on finite branches containing only BSR formulas.
Proof. Let A → A1 → A2 → · · · be a possibly infinite derivation that issues
from a finite branch containing only BSR formulas. Then R¬∀ and Rfe are not
applicable and all Ai contain only BSR formulas (using N6). Furthermore, at
most one new variable is introduced. Since all terms of type ι are variables, there
is only a finite supply. Using N5 it follows that the derivation is finite.
In [7] we study lambda- and quantifier-free EFO and show that the concomitant
subsystem of R terminates on finite branches. The result extends to lambda-free
branches containing quantifiers (e.g., {∀ι f }).
Proposition 11.6 (Lambda-Free Termination). Let the normalization operator
satisfy [s] = s for every lambda-free EFO term s. Then R terminates on
finite lambda-free branches.
Proof. An application of Rfe disables a disequation s ≠στ t and introduces new
subterms as follows: a variable x : σ, two terms sx : τ and tx : τ, and two
formulas sx = tx and sx ≠ tx. Since the types of the new subterms are smaller
than the type of s and t, and the new subterms introduced by the other rules
always have type o or ι, no derivation can employ Rfe infinitely often.
Let A → A1 → A2 → · · · be a possibly infinite derivation that issues from a
finite lambda-free branch and does not employ Rfe . It suffices to show that the
derivation is finite. Observe that no new subterms of the form ∀ι s are introduced.
Hence only finitely many new subterms of type ι are introduced. Consequently,
only finitely many new subterms of type o are introduced. Hence the derivation
is finite.
12 Conclusion
In this paper we have shown that the EFO fragment of Church’s type theory en-
joys the characteristic properties of first-order logic. We have devised a complete
tableau system that comes with a new treatment of equality (confrontation) and
a novel subterm restriction for the universal quantifier (discriminating terms).
The tableau system decides lambda-free formulas, Bernays-Schönfinkel-Ramsey
formulas, and equations between pure lambda terms.
References
1. Andrews, P.B.: Classical type theory. In: Robinson, A., Voronkov, A. (eds.) Hand-
book of Automated Reasoning, vol. 2, pp. 965–1007. Elsevier Science, Amsterdam
(2001)
2. Brown, C.E.: Automated Reasoning in Higher-Order Logic: Set Comprehension
and Extensionality in Church’s Type Theory. College Publications (2007)
3. Börger, E., Grädel, E., Gurevich, Y.: The Classical Decision Problem. Springer,
Heidelberg (1997)
4. Smullyan, R.M.: First-Order Logic. Springer, Heidelberg (1968)
5. Fitting, M.: First-Order Logic and Automated Theorem Proving. Springer, Hei-
delberg (1996)
6. Prawitz, D.: Hauptsatz for higher order logic. J. Symb. Log. 33, 452–457 (1968)
7. Brown, C.E., Smolka, G.: Terminating tableaux for the basic fragment of simple
type theory. In: Giese, M., Waaler, A. (eds.) TABLEAUX 2009. LNCS (LNAI),
vol. 5607, pp. 138–151. Springer, Heidelberg (2009)
8. Hindley, J.R.: Basic Simple Type Theory. Cambridge Tracts in Theoretical Com-
puter Science, vol. 42. Cambridge University Press, Cambridge (1997)
9. Friedman, H.: Equality between functionals. In: Parikh, R. (ed.) Proc. Logic Col-
loquium 1972-73. Lecture Notes in Mathematics, vol. 453, pp. 22–37. Springer,
Heidelberg (1975)
10. Statman, R.: Completeness, invariance and lambda-definability. J. Symb.
Log. 47(1), 17–26 (1982)
Formalising Observer Theory for
Environment-Sensitive Bisimulation

Jeremy E. Dawson and Alwen Tiu

Logic and Computation Group
College of Engineering and Computer Science
Australian National University
Canberra ACT 0200, Australia
http://users.rsise.anu.edu.au/~jeremy/
http://users.rsise.anu.edu.au/~tiu/
Abstract. We consider a formalisation of a notion of observer (or in-
truder) theories, commonly used in symbolic analysis of security pro-
tocols. An observer theory describes the knowledge and capabilities of
an observer, and can be given a formal account using deductive sys-
tems, such as those used in various “environment-sensitive” bisimulation
for process calculi, e.g., the spi-calculus. Two notions are critical to the
correctness of such formalisations and the effectiveness of symbolic tech-
niques based on them: decidability of message deduction by the observer
and consistency of a given observer theory. We consider a formalisation,
in Isabelle/HOL, of both notions based on an encoding of observer theo-
ries as pairs of symbolic traces. This encoding has recently been used in
a theory of open bisimulation for the spi-calculus. We machine-checked
some important properties, including decidability of observer deduction
and consistency, and some key steps which are crucial to the automation
of open bisimulation checking for the spi-calculus, and we highlight some
novelty in our Isabelle/HOL formalisations of decidability proofs.

1 Introduction
In most symbolic techniques for reasoning about security protocols, certain as-
sumptions are often made concerning the capability of an intruder that tries
to compromise the protocols. A well-known model of intruder is the so-called
Dolev-Yao model [10], which assumes perfect cryptography. We consider here a
formal account of the Dolev-Yao intruder model, formalised as a deduction
system. This deductive formulation is used in formalisations of various
“environment-sensitive” bisimulations (see e.g., [6]) for process calculi designed
for modeling security protocols, such as the spi-calculus [3]. An environment-
sensitive bisimulation is a bisimulation relation which is indexed by a structure
representing the intruder’s knowledge, which we call an observer theory.
An important line of work related to the spi-calculus, or process calculi in
general, is that of automating bisimulation checking. The transition semantics of
these calculi often involve processes with infinite branching (e.g., transitions for
input-prefixed processes in the π-calculus [12]), and therefore a symbolic method
is needed to deal with potential infinite branches lazily. The resulting bisimu-
lation, called symbolic bisimulation, has been developed for the spi-calculus [7].
The work reported in [7] is, however, only aimed at finding an effective approxi-
mation of environment-sensitive bisimulation, and there has been no metatheory
developed for this symbolic bisimulation so far. A recent work by the second au-
thor [14] attempts just that: to establish a symbolic bisimulation that has good
metatheory, in particular, a symbolic bisimulation which is also a congruence.
The latter is also called open bisimulation [13]. One important part of the for-
mulation of open bisimulation for the spi-calculus is a symbolic representation
of observer theories, which needs to satisfy certain consistency properties, in ad-
dition to closure under a certain notion of “respectful substitutions”, as typical
in formulations of open bisimulation.
A large part of the work on open bisimulation in [14] deals with establishing
properties of observer theories and their symbolic counterparts. This paper is
essentially about formally verifying the results of [14] concerning properties of
(symbolic) observer theories in Isabelle/HOL. In particular, it is concerned with
proving decidability of the deduction system for observer theory, correctness
of a finite characterisation of consistency of observer theories (hence decidabil-
ity of consistency of observer theories), and preservation of consistency under
respectful substitutions. Additionally, we also verify some key steps towards a
decision procedure for checking consistency of symbolic observer theories, which
is needed in automation of open bisimulation. A substantial formalisation work
described here concerns decidability proofs. Such proofs are difficult to formalise
in Isabelle/HOL, as noted in [17], due to the fact that Isabelle/HOL is based
on classical logic. We essentially follow [17] in that decidability in this case can
be inferred straightforwardly by inspection on the way we define total functions
corresponding to the decidability problems in question. That is, we show, by
meta-level inspection, that the definitions of the functions do not introduce any
infinite aspect and are therefore are finitely computable.
There is a recent work [11] in formalising the spi-calculus and a notion of
environment-sensitive bisimulation (called the hedged bisimulation [8]) in
Isabelle/HOL. However, this notion of bisimulation is a “concrete” bisimulation
(as opposed to symbolic), which means that the structure of observer theories
is less involved and much easier to deal with compared to its symbolic counter-
part. Our work on observer theories is mostly orthogonal to their work, and it
can eventually be integrated into their formalisation to provide a completely for-
malised open bisimulation for the spi-calculus. Such an integration may not be
too difficult, given that much of their work, e.g., formalisation of the operational
semantics of the spi-calculus, can be reused without modifications.
We assume that the reader is familiar with the Isabelle proof assistant, its object
logic HOL and logical frameworks in general. In the remainder of this section we
briefly describe relevant Isabelle notations used throughout the paper. In Section 2,
we give an overview of observer theories and an intuition behind them. We also give
a brief description of two problems that will be the focus of subsequent sections,
namely, those that concern decidability of consistency checking for (symbolic) ob-
server theories. In Section 3 we consider formalisation of a notion of theory reduc-
tion and decidability of consistency checking for observer theories. In Section 4
we discuss a symbolic representation of observer theories using pairs of symbolic
traces [5], called bi-traces, their consistency requirements and a notion of respectful
substitutions. We prove a key lemma which relates a symbolic technique for trace
refinement [5] to bi-traces, and discuss how this may lead to a decision procedure
for testing bi-trace consistency. Section 5 concludes.
Isabelle notation. The Isabelle code for the results of this paper can be found
at http://users.rsise.anu.edu.au/~jeremy/isabelle/2005/spi/. In the
statement of a lemma or theorem, a name given in typewriter font indicates
the name of the relevant theorem in our Isabelle development. We show selected
theorems and definitions in the text, and more in the Appendix. A version of
the paper, including the Appendix, is at http://users.rsise.anu.edu.au/~jeremy/pubs/spi/fotesb/. We now indicate some key points of the Isabelle
notation.
– A name preceded by ? indicates a variable: other names are entities which
have been defined as part of the theory
– Conclusion β depending on assumptions αi is [| α1 ;α2 ; . . . ;αn |] ==> β
– ∀, ∃ are written as ALL, EX
– ⊆, ⊇, ∈ are written as <=, >=, :

2 Observer Theory
An observer theory describes the knowledge accumulated by an observer in its
interaction with a process (in the form of messages sent over networks), and
its capability in analyzing and synthesizing messages. Since messages can be en-
crypted, and the encryption key may be unknown to the observer, it is not always
the case that the observer can decompose all messages sent over the networks.
In the presence of an active intruder, the traditional notion of bisimulation is
not fine-grained enough to prove interesting equivalence of protocols. A notion
of bisimulation in which the knowledge and capability of the intruder is taken
into account is often called an environment-sensitive bisimulation.
Messages are expressions formed from names, a pairing constructor, e.g.,
⟨M, N⟩, and symmetric encryption, e.g., {M}K, where K is the encryption key
and M is the message being encrypted. Note that we restrict to pairing and
encryption to simplify discussion; there is no difficulty in extending the set of
messages to include other constructors, including asymmetric encryption, natu-
ral numbers, etc. For technical reasons, we shall distinguish two kinds of names:
flexible names and rigid names. We shall refer to flexible names as simply names.
Names will be denoted with lower-case letters, e.g., a, x, y, etc., and rigid names
will be denoted with bold letters, e.g., a, b, etc. We let N denote the set of
names and N= denote the set of pairs (x, x) of the same name. A name is really
just a variable, i.e., a site for substitutions, and rigid names are just constants.
This slightly different terminology is to conform with a “tradition” in name-
passing process calculi where names are sometimes confused with variables (see
e.g., [13]). In the context of open bisimulation for the spi-calculus [14], names
stand for undetermined messages which can be synthesized by the observer.
There are two aspects of an observer theory which are relevant to bisimulation
methods for protocol verification (for a more detailed discussion, see, e.g., [2]):
– Message analysis and synthesis: This is often formalised as a deduction sys-
tem with judgments of the form Σ ⊢ M, where Σ is a set of messages and
M is a message. The intuitive meaning is that the observer can derive M
given Σ. The deduction system is given in Figure 1 using sequent calculus.
The usual formulation is based on natural deduction, but there is an easy
correspondence between the two presentations (see [16] for details). One can
derive, for example, that Σ ⊢ M holds if Σ ⊢ {M}K and Σ ⊢ K hold, i.e.,
if the observer can derive {M}K and the key K, then it can derive M. (See
the executable sketch following this list.)
– Indistinguishability of messages: This notion arises when an observer tries to
differentiate two processes based on the messages output by the processes. In
the presence of encryption, indistinguishability does not simply mean syntac-
tic equality. The judgment of interest in this case takes the form Γ ⊢ M ↔ N,
where Γ is a finite set of pairs of messages. It means, intuitively, that the
observer cannot distinguish between M and N, given the indistinguishabil-
ity assumption Γ. We shall not go into detailed discussion on this notion of
indistinguishability; it has been discussed extensively in the literature [2, 6,
8, 14]. Instead we give a proof system for message indistinguishability (or
message equivalence) in Figure 2.
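To make the synthesis judgment concrete, here is a small executable sketch in Haskell (our illustration, not part of the Isabelle development; the Msg datatype anticipates the Isabelle datatype msg of Section 3). It decides Σ ⊢ M by the usual two-phase method: saturate Σ under the left rules (pl) and (el) of Figure 1, then check M with the right rules only.

import qualified Data.Set as S

data Msg = Name Int | Rigid Int | MPair Msg Msg | Enc Msg Msg
  deriving (Eq, Ord, Show)

-- Right rules only: (id), (var), (pr), (er) of Figure 1.
derive :: S.Set Msg -> Msg -> Bool
derive s m | m `S.member` s = True              -- (id)
derive _ (Name _)    = True                     -- (var): flexible names
derive s (MPair a b) = derive s a && derive s b -- (pr)
derive s (Enc p k)   = derive s p && derive s k -- (er)
derive _ _           = False

-- Saturate under the left rules; terminates since only subterms are added.
analyze :: S.Set Msg -> S.Set Msg
analyze s
  | s' == s   = s
  | otherwise = analyze s'
  where
    s' = s `S.union` S.fromList (concatMap step (S.toList s))
    step (MPair a b)            = [a, b]        -- (pl)
    step (Enc p k) | derive s k = [p, k]        -- (el): key must be derivable
    step _                      = []

synth :: S.Set Msg -> Msg -> Bool               -- decides Σ ⊢ M
synth sigma = derive (analyze sigma)

For instance, with Σ = {{M}K, K} encoded as S.fromList [Enc (Rigid 0) (Rigid 1), Rigid 1], synth returns True on Rigid 0, matching the decryption example in the first item above.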
Note that there are some minor differences between the inference rules in Figure 1
and Figure 2 and those given in [14]. That is, the “principal” message pairs for
the rules (pl) and (el) in [14], (⟨Ma, Mb⟩, ⟨Na, Nb⟩) and ({Mp}Mk, {Np}Nk), are
also in the premises. We proved that the alternative system is equivalent and
that, in both systems, weakening on the left of ⊢ is admissible: see Appendix A.1.
We note that, by a cut-admissibility-like result, it is possible to further remove
(Mk, Nk) from the second premise of (el): see Appendix A.2.
Subsequent results in this paper are concerned mainly with the above notion of
indistinguishability. We therefore identify an observer theory with its underlying
indistinguishability assumptions (i.e., Γ in the second item above). Hence, from
now on, an observer theory (or theory) is just a finite set of pairs of messages,
and will be denoted with Γ . Given a theory Γ , we write π1 (Γ ) to denote the set
obtained by projecting on the first components of the pairs in Γ . The set π2 (Γ )
is defined analogously.
Observer theory consistency: An important notion in the theory of environ-
ment sensitive bisimulation is that of consistency of an observer theory. This
amounts to the requirement that any observation (i.e., any “destructive” oper-
ations related to constructors of the messages, e.g., projection, decryption) that
is applicable to the first projection of the theory is also applicable to the second
projection. For example, the theory {({a}b , {c}d ), (b, c)} is not consistent, since
on the first projection (i.e., the set {{a}b , b}), one can decrypt the first message
(var) x ∈ N ⟹ Σ ⊢ x
(id) Σ, M ⊢ M
(pr) if Σ ⊢ M and Σ ⊢ N then Σ ⊢ ⟨M, N⟩
(er) if Σ ⊢ M and Σ ⊢ N then Σ ⊢ {M}N
(pl) if Σ, M, N ⊢ R then Σ, ⟨M, N⟩ ⊢ R
(el) if Σ ⊢ N and Σ, M, N ⊢ R then Σ, {M}N ⊢ R

Fig. 1. A proof system for message synthesis
(var) x ∈ N ⟹ Γ ⊢ x ↔ x
(id) (M, N) ∈ Γ ⟹ Γ ⊢ M ↔ N
(pr) if Γ ⊢ Ma ↔ Na and Γ ⊢ Mb ↔ Nb then Γ ⊢ ⟨Ma, Mb⟩ ↔ ⟨Na, Nb⟩
(er) if Γ ⊢ Mp ↔ Np and Γ ⊢ Mk ↔ Nk then Γ ⊢ {Mp}Mk ↔ {Np}Nk
(pl) if Γ, (Ma, Na), (Mb, Nb) ⊢ M ↔ N then Γ, (⟨Ma, Mb⟩, ⟨Na, Nb⟩) ⊢ M ↔ N
(el) if Γ ⊢ Mk ↔ Nk and Γ, (Mp, Np), (Mk, Nk) ⊢ M ↔ N
     then Γ, ({Mp}Mk, {Np}Nk) ⊢ M ↔ N

Fig. 2. A proof system for deducing message equivalence
{a}b using the second message b, but the same operation cannot be done on
the second projection. The formal definition of consistency involves checking all
message pairs (M, N) such that Γ ⊢ M ↔ N is derivable for certain similarity
of observations. The first part of this paper is about verifying that this infinite
quantification is not necessary. This involves showing that for every theory Γ ,
there is a corresponding reduced theory that is equivalent, but for which consis-
tency checking requires only checking finitely many message pairs.
Symbolic observer theory: The definition of open bisimulation for name-
passing calculi, such as the π-calculus, typically includes closure under a certain
notion of respectful substitutions [13]. In the π-calculus, this notion of respectful-
ness is defined w.r.t. to a notion of distinction among names, i.e., an irreflexive
relation on names which forbids identification of certain names. In the case of
the spi-calculus, things get more complicated because the bisimulation relation
is indexed by an observer theory, not just a simple distinction on names. We
need to define a symbolic representation of observer theories, and an appropri-
ate notion of consistency for the symbolic theories. These are addressed in [14]
via a structure called bi-traces. A bi-trace is essentially a list of pairs of messages.
It can be seen as a pair of symbolic traces, in the sense of [5]. The order of the
message pairs in the list indicates the order of their creation (i.e., by the intruder
or by the processes themselves). Names in a bi-trace indicate undetermined mes-
sages, which are open to instantiations. Therefore the notion of consistency of
bi-traces needs to take into account these possible instantiations. Consider the
following sequence of message pairs: (a, d), ({a}b , {d}k ), ({c}{x}b , {k}l ). Con-
sidered as a theory, it is consistent, since none of the encryption keys are known
to the observer. However, if we allow x to be instantiated to a, then the result-
ing theory {(a, d), ({a}b , {d}k ), ({c}{a}b , {k}l )} is inconsistent, since on the first
projection, {a}b can be used as a key to decrypt {c}{a}b , while in the second
projection, no decryption is possible. Therefore to check consistency of a bi-
trace, one needs to consider potentially infinitely many instances of the bi-trace.
Section 4 shows some key steps to simplify consistency checking for bi-traces.

3 Observer Theory Reduction and Consistency

We now discuss our formalisation of observer theory and its consistency proper-
ties in Isabelle/HOL.
The datatype for messages is represented in Isabelle/HOL as follows.
datatype msg = Name nat | Rigid nat | Mpair msg msg | Enc msg msg
An observer theory, as already noted, is a finite set of pairs of messages. In Isabelle,
we just use a set of pairs, so the finiteness condition appears in the Isabelle
statements of many theorems. The judgment Γ ⊢ M ↔ N is represented by
(Γ, (M, N )), or, equivalently in Isabelle, (Γ, M, N ).
In Isabelle we define, inductively, a set of sequents indist which is the set of
sequents derivable in the proof system for message equivalence (Figure 2). Sub-
sequently we found it helpful to define the corresponding set of rules explicitly,
calling them indpsc. The rules for message synthesis, given in Figure 1, are just
a projection to one component of the rule set indpsc; we call this projection
smpsc. It is straightforward to extend the notion of a projection on rule sets,
so we can define the rules for message synthesis as simply smpsc = π1 (indpsc).
The formal expression in Isabelle is more complex: see Appendix A.3. Likewise,
we write pair (X) to turn each message M into the pair (M, M ) in a theory,
sequent, rule or bi-trace X.
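In the Haskell terms of the sketch in Section 2 (ours), the projection and pairing operations on theories are one-liners:

type Thy = S.Set (Msg, Msg)

pi1, pi2 :: Thy -> S.Set Msg
pi1 = S.map fst                 -- π1(Γ): first components
pi2 = S.map snd                 -- π2(Γ): second components

pairUp :: S.Set Msg -> Thy      -- pair(Σ): turn each M into (M, M)
pairUp = S.map (\m -> (m, m))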
The following lemma relates message synthesis and message equivalence.
Lemma 1(d) depends on theory consistency, to be introduced later.

Lemma 1. (a) (smpsc alt) Rule R ∈ smpsc iff pair(R) ∈ indpsc
(b) (slice derrec smpsc empty) if Γ ⊢ M ↔ N then π1(Γ) ⊢ M
(c) (derrec smpsc eq) Σ ⊢ M if and only if pair(Σ) ⊢ M ↔ M
(d) (smpsc ex indpsc der) if π1(Γ) ⊢ M and Γ is consistent, then there exists
N such that Γ ⊢ M ↔ N

3.1 Decidability of ⊢ and Computability of Theory Reduction

The first step towards deciding theory consistency is to define a notion of theory
reduction. Its purpose is to extract a “kernel” of the theory with no redundancy,
that is, no pairs in the kernel are derivable from the others. We need to establish
the decidability of ⊢, and then termination of the theory reduction. In [14],
Tiu observes that Γ ⊢ M ↔ N is decidable, because the right rules (working
upwards) make the right-hand side messages smaller, and the left rules saturate
the antecedent theory with more pairs of smaller messages. Hence for a given
end sequent, there are only finitely many possible sequents which can appear in
any proof of the sequent. Some results relevant to this argument for decidability
are presented in Appendix A.4. Here we present an alternative proof for the
decidability of ⊢ and termination of theory reduction.
Tiu [14, Definition 4] defines a reduction relation of observer theories:

Γ, (⟨Ma, Mb⟩, ⟨Na, Nb⟩) −→ Γ, (Ma, Na), (Mb, Nb)

Γ, ({Mp}Mk, {Np}Nk) −→ Γ, (Mp, Np), (Mk, Nk)
    if Γ, ({Mp}Mk, {Np}Nk) ⊢ Mk ↔ Nk

We assume that Γ does not contain (⟨Ma, Mb⟩, ⟨Na, Nb⟩) and ({Mp}Mk, {Np}Nk)
respectively (otherwise reduction would not terminate). This reduction relation
is terminating and confluent, and so every theory Γ reduces to a unique normal
form Γ⇓. It also preserves the entailment ⊢.
Lemma 2. (a) [15, Lemma 15] (red nc) If Γ −→ Γ′ then Γ ⊢ M ↔ N if and
only if Γ′ ⊢ M ↔ N
(b) (nf nc) Assuming that Γ⇓ exists, Γ ⊢ M ↔ N if and only if Γ⇓ ⊢ M ↔ N
It is easy to show that −→ is well-founded, since the sum of the sizes of [the
first member of each of] the message pairs reduces each time. Confluence is
reasonably easy to see since the side condition for the second rule is of the
form Γ′ ⊢ Mk ↔ Nk where Γ′ is exactly the theory being reduced, and, from
Lemma 2, this condition (for a particular Mk, Nk) will continue to hold, or not,
when other reductions have changed Γ′. Actually, proving confluence in Isabelle
was not so easy, and we describe the difficulty and our proof in Appendix A.6.
Then it is a standard result, and easy in Isabelle, that confluence and termination
give normal forms.

Theorem 3 (nf oth red). Any theory Γ has a −→-normal form Γ⇓.
A different reduction relation. As a result of Lemma 2, to decide whether
Γ ⊢ M ↔ N one might calculate Γ⇓ and determine whether Γ⇓ ⊢ M ↔ N,
which is easier (see Lemma 5). However to calculate Γ⇓ requires determining
whether Γ ⊢ Mk ↔ Nk, so the decidability of this procedure is not obvious.
We defined an alternative version, −→′, of the reduction relation, by changing
the condition in the second rule, so our new relation is:

Γ, (⟨Ma, Mb⟩, ⟨Na, Nb⟩) −→′ Γ, (Ma, Na), (Mb, Nb)

Γ, ({Mp}Mk, {Np}Nk) −→′ Γ, (Mp, Np), (Mk, Nk) if Γ ⊢ Mk ↔ Nk

This definition does not give the same relation, but we are able to show that
the two relations have the same normal forms. Using this reduction relation,
the procedure to decide whether Γ ⊢ M ↔ N is: calculate Γ⇓′ and determine
whether Γ⇓′ ⊢ M ↔ N. Calculating Γ⇓′ requires deciding questions of the form
Γ′ ⊢ Mk ↔ Nk, where Γ′ is smaller than Γ (because a pair ({Mp}Mk, {Np}Nk)
is omitted). Thus this procedure terminates.
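The following Haskell sketch (ours, reusing Msg and Thy from the earlier sketches; the verified Isabelle function reduce appears later in this section) makes the mutual recursion explicit: reduce' computes a −→′-normal form, testing the side condition only on a strictly smaller theory, and indist then decides Γ ⊢ M ↔ N using the right rules on the normal form (cf. Lemma 5 and Theorem 6 below).

-- Γ ⊢r M ↔ N: rules (var), (id), (pr), (er) of Figure 2 only.
rightOnly :: Thy -> Msg -> Msg -> Bool
rightOnly g m n = (m, n) `S.member` g ||                          -- (id)
  case (m, n) of
    (Name i, Name j)       -> i == j                              -- (var)
    (MPair a b, MPair c d) -> rightOnly g a c && rightOnly g b d  -- (pr)
    (Enc p k, Enc q l)     -> rightOnly g p q && rightOnly g k l  -- (er)
    _                      -> False

-- Decide Γ ⊢ M ↔ N: normalise first, then right rules only.
indist :: Thy -> Msg -> Msg -> Bool
indist g = rightOnly (reduce' g)

-- Apply one −→′ step at a time until no redex remains.
reduce' :: Thy -> Thy
reduce' g = case redexes of
    []             -> g
    (old, new) : _ -> reduce' (S.fromList new `S.union` S.delete old g)
  where
    redexes =
      [ (p, [(a, c), (b, d)])
      | p@(MPair a b, MPair c d) <- S.toList g ]
      ++
      [ (p, [(mp, np), (mk, nk)])
      | p@(Enc mp mk, Enc np nk) <- S.toList g
      , indist (S.delete p g) mk nk ]    -- side condition on the smaller theory

Termination is exactly the measure argument of the text: the side condition recurses only through theories with one pair removed, and each step decreases the sum of the sizes of the first components.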
Note that Lemma 2 also holds for −→′ since −→′ ⊆ −→.
To show the two relations have the same normal forms, we first show (in
Theorem 4(b)) that if Γ is −→-reducible, then it is −→′-reducible, even though
the same reduction may not be available.
Theorem 4. (a) (red alt lem) If Γ ⊢ Mk ↔ Nk then either
Γ \ {({Mp}Mk, {Np}Nk)} ⊢ Mk ↔ Nk or there exists Γ′ such that Γ −→′ Γ′
(b) (oth red alt lem) If Γ −→ Δ then there exists Δ′ such that Γ −→′ Δ′
(c) (rsmin or alt) If Γ is −→′-minimal (i.e., cannot be reduced further) then
Γ is −→-minimal
(d) (nf acc alt) Γ −→′* Γ⇓ (where Γ⇓ is the −→-normal form of Γ)
(e) (nf alt, nf same) Γ⇓ is also the −→′-normal form of Γ
Proof. We show a proof of (a) here. We prove a stronger result, namely: if Γ ⊢
M ↔ N and size M ≤ size Qk then either Γ′ = Γ \ {({Qp}Qk, {Rp}Rk)} ⊢ M ↔
N or there exists Δ such that Γ −→′ Δ.
We prove it by induction on the derivation of Γ ⊢ M ↔ N. If the derivation
is by the (var) rule, i.e., (M, N) = (x, x), then clearly Γ′ ⊢ M ↔ N by the (var)
rule. If the derivation is by the (id) rule, i.e., (M, N) ∈ Γ, then the size condition
shows that (M, N) ∈ Γ′, and so Γ′ ⊢ M ↔ N by the (id) rule.
If the derivation is by either of the right rules (pr) or (er), then we have
Γ ⊢ M′ ↔ N′ and Γ ⊢ M″ ↔ N″, according to the rule used, with M′ and M″
smaller than M. Then, unless Γ −→′ Δ for some Δ, we have by induction Γ′ ⊢
M′ ↔ N′ and Γ′ ⊢ M″ ↔ N″, whence, by the same right rule, Γ′ ⊢ M ↔ N.
If the derivation is by the left rule (pl), then Γ −→′ Δ for some Δ.
If the derivation is by the left rule (el), then we apply the inductive hypothesis
to the first premise of the rule. Let the “principal” message pair for the rule be
({Mp}Mk, {Np}Nk), so the first premise is Γ ⊢ Mk ↔ Nk. Note that we apply
the inductive hypothesis to a possibly different pair of encrypts in Γ, namely
({Mp}Mk, {Np}Nk) instead of ({Qp}Qk, {Rp}Rk).
By induction, either Γ −→′ Δ for some Δ or (since size Mk ≤ size Mk), we
have Γ \ {({Mp}Mk, {Np}Nk)} ⊢ Mk ↔ Nk. Then we have Γ −→′ Δ, as required,
where Δ = Γ \ {({Mp}Mk, {Np}Nk)}, (Mp, Np), (Mk, Nk).
Since the process of reducing a theory essentially replaces pairs of compound
messages with more pairs of simpler messages, this suggests that to show that
Γ ⊢ M ↔ N for a reduced Γ, one need only use the rules which build up pairs
of compound messages on the right. That is, one would use the right rules (pr)
and (er), but not the left rules (pl) and (el). Let us define Γ ⊢r M ↔ N to
mean that Γ ⊢ M ↔ N can be derived using the rules (var), (id), (pr) and (er)
of Figure 2. We call the set of these rules indpsc virt.
We define a function is der virt which shows how to test Γ ⊢r M ↔ N, and,
in Lemma 5(b), prove that it does this. It terminates because at each recursive
call, the size of M gets smaller. When we define a function in this way, Isabelle
requires termination to be proved (usually it can do this automatically). Then
inspection of the function definition shows that, assuming the theory oth is finite,
the function is finitely computable. We discuss this idea further later. We also
define a simpler function is der virt red, as an alternative to is der virt,
which gives the same result when Γ is reduced, see Appendix A.13.
recdef "is_der_virt" "measure (%(oth, M, N). size M)"


"is_der_virt (oth, Name i, Name j) = ((Name i, Name j) : oth | i = j)"
"is_der_virt (oth, Mpair Ma Mb, Mpair Na Nb) =
((Mpair Ma Mb, Mpair Na Nb) : oth |
is_der_virt (oth, Ma, Na) & is_der_virt (oth, Mb, Nb))"
"is_der_virt (oth, Enc Mp Mk, Enc Np Nk) =
((Enc Mp Mk, Enc Np Nk) : oth |
is_der_virt (oth, Mp, Np) & is_der_virt (oth, Mk, Nk))"
"is_der_virt (oth, M, N) = ((M, N) : oth)"

Lemma 5. (a) (nf no left) If Γ is reduced and Γ ⊢ M ↔ N then Γ ⊢r M ↔ N
(b) (virt dec) Γ ⊢r M ↔ N if and only if is der virt (Γ, (M, N))
We can now define a function reduce which computes a −→′-normal form.
recdef (permissive) "reduce" "measure (setsum (size o fst))"


"reduce S = (if infinite S then S else
let P = (%x. x : Mpairs <*> Mpairs & x : S) ;
Q = (%x. (if x : Encs <*> Encs & x : S then
is_der_virt (reduce (S - {x}), keys x) else False))
in if Ex P then reduce (red_pair (Eps P) (S - {Eps P}))
else if Ex Q then reduce (red_enc (Eps Q) (S - {Eps Q}))
else S)"

To explain this: P (M, N) means (M, N) ∈ S and M, N are both pairs;
Q (M, N) means (M, N) ∈ S and M, N are both encrypts, say {Mp}Mk, {Np}Nk,
where S \ {(M, N)} ⊢r Mk ↔ Nk; red pair and red enc do a single step reduc-
tion based on the message pairs or encrypts given as their argument, Ex P means
∃x. P x, and Eps P means some x satisfying P, if such exists. Thus the func-
tion selects arbitrarily a pair of message pairs or encrypts suitable for a single
reduction step, performs that step, and then reduces the result.
The expression measure (setsum (size o fst)) is the termination mea-
sure, the sum of the sizes of the first member of each message pair in a theory.
The function reduce is recursive, and necessarily terminates since at each iter-
ation this measure function, applied to the argument, is smaller. However this
function definition is sufficiently complicated that Isabelle cannot automatically
prove that it terminates — thus the notation (permissive) in the definition.
Isabelle produces a complex definition dependent on conditions that if we
change a theory by applying red pair or red enc, or by deleting a pair, then
we get a theory which is smaller according to the measure function. Since in the
HOL logic of Isabelle all functions are total, we have a function reduce in any
event; we need to prove the conditions to prove that reduce conforms to the
definition given above. We then get Theorem 6(a) and (b), which show how to
test Γ ⊢ M ↔ N as a manifestly finitely computable function. We also prove a
useful characterisation of Γ⇓.
Theorem 6. (a) (reduce nf, reduce nf alt) reduce Γ = Γ⇓
(b) (virt reduce, idvr reduce) Γ ⊢ M ↔ N if and only if is der virt
(Γ⇓, (M, N)), equivalently, if and only if is der virt red (Γ⇓, (M, N))
(c) (reduce alt) For (M, N) ∉ N=, (M, N) ∈ Γ⇓ \ N= iff
(i) Γ ⊢ M ↔ N,
(ii) M and N are not both pairs, and
(iii) if M = {Mp}Mk, N = {Np}Nk, then Γ ⊬ Mk ↔ Nk
As Urban et al. point out in [17], formalising decidability — or computability
— is difficult. It would require formalising the computation process, as distinct
from simply defining the quantity to be computed. However, as is done in [17,
§3.4], it is possible to define a quantity in a certain way which makes it reason-
able to assert that it is computable. This is what we have aimed to do in defining
the function reduce. It specifies the computation to be performed (with a caveat
mentioned later). Isabelle requires us to show that this computation is termi-
nating, and we have shown that it produces the −→′-normal form. To ensure
termination, we needed to base the definition of reduce on −→′, not on −→,
but by Theorem 4(e), −→ and −→′ have the same normal forms.
Certain terminating functions are not obviously computable, for example
f x = (∃y. P y) (even where P is computable). So our definition of reduce
requires inspection to ensure that it contains nothing which makes it not com-
putable. It does contain existential quantifiers, but they are in essence quan-
tification over a finite set. The only problem is the subterms Eps P and Eps Q,
that is, εx. P x and εx. Q x. These mean “some x satisfying P” (similarly Q).
In Isabelle’s logic, this means some x, but we have no knowledge of which one
(and so we cannot perform precisely this computation). But our proofs went
through without any knowledge of which x is denoted by εx. P x. Therefore it
would be safe to implement a computation which makes any choice of εx. P x,
and we can safely assert that our proofs would still hold for that computation.1
That is, in general we assert that if a function involving εx. P x can be proven
to have some property, then a function which replaces εx. P x by some other
choice of x (satisfying P if possible) would also have that property. Based on
this assertion we say that our definition of reduce shows that the −→-normal
form is computable, and so that Γ ⊢ M ↔ N is decidable.
We found that although the definition of reduce gives the function in a com-
putable form, many proofs are much easier using the characterisation as the
normal form. For example Lemma 2(b) is much easier using Lemma 2(a) than
using the definition of reduce. We found this with some other results, such as:
if Γ is finite, then so is Γ⇓, and if Γ consists of identical pairs then so does Γ⇓.
Since also Γ and Γ ⇓ entail the same message pairs, it is reasonable to ask
which theories, other than those with the same normal form as Γ , entail the
same message pairs as Γ . Now it is clear, due to the (var) rule, that deleting
(x, x) from a theory does not change the set of entailed message pairs or the
1 In general a repeated choice must be made consistently; the HOL logic does imply
εx. P x = εx. P x. This point clearly won’t arise for the reduce function.
reductions available. However we find that the condition is that theories entail
the same pairs iff their normal forms are equal, modulo N=.
We could further change −→′ by deleting the (Mk, Nk) from the second rule.
Lemma 2(a) holds for this new relation. For further discussion see Appendix A.7.
Theorem 7. (a) (rsmin names) Γ is reduced if and only if Γ \ N= is reduced
(b) [15, Lemma 8] (name equivd) Γ ⊢ M ↔ N if and only if Γ \ N= ⊢ M ↔ N
(c) (nf equiv der) Theories Γ1 and Γ2 entail the same message pairs if and
only if Γ1⇓ \ N= = Γ2⇓ \ N=
3.2 Theory Consistency

Definition 8. [15, Definition 11] A theory Γ is consistent if for every M and
N, if Γ ⊢ M ↔ N then the following hold:
(a) M and N are of the same type of expressions, i.e., M is a pair (an encrypted
message, a (rigid) name) if and only if N is.
(b) If M = {Mp}Mk and N = {Np}Nk then π1(Γ) ⊢ Mk implies Γ ⊢ Mk ↔ Nk
and π2(Γ) ⊢ Nk implies Γ ⊢ Mk ↔ Nk.
(c) For any R, Γ ⊢ M ↔ R implies R = N and Γ ⊢ R ↔ N implies R = M.
This definition of consistency involves infinite quantification. We want to elim-
inate this quantification by finding a finite characterisation on reduced theories.
But first, let us define another equivalent notion of consistency, which is simpler
for verification, as it does not use the deduction system for message synthesis.

Definition 9. A theory Γ satisfies the predicate thy cons if for every M and
N, if Γ ⊢ M ↔ N then the following hold:
(a) M and N are of the same type of expressions, i.e., as in Definition 8(a)
(b) for every M′, N′, Mp, Np, if Γ ⊢ M′ ↔ N′ or Γ ⊢ {Mp}M′ ↔ {Np}N′, then
M′ = M iff N′ = N
Lemma 10. (a) (thy cons equiv) Γ is consistent iff it satisfies Definition 9
(b) (thy cons equivd) Γ is consistent if and only if Γ \ N= is consistent
(c) [15, Lemma 19] (nf cons) Γ is consistent if and only if Γ⇓ is consistent
(d) (cons der same) If Γ1 and Γ2 entail the same message pairs then Γ1 is
consistent if and only if Γ2 is consistent
Tiu [15, Proposition 20] gives a characterisation of consistency (reproduced be-
low in Proposition 11) which is finitely checkable. In Definition 12 we define
a predicate thy cons red which is somewhat similar. In Theorem 13 we show
that, for a reduced theory, our thy cons red is equivalent to consistency
and to the conditions in Proposition 11. Decidability of consistency then follows
from decidability of ⊢, and termination of normal form computation.
Proposition 11. [15, Proposition 20] A theory Γ is consistent if and only if
Γ⇓ satisfies the following conditions: if (M, N) ∈ Γ⇓ then
(a) M and N are of the same type of expressions, in particular, if M = x, for
some name x, then N = x and vice versa,
(b) if M = {Mp}Mk and N = {Np}Nk then π1(Γ⇓) ⊬ Mk and π2(Γ⇓) ⊬ Nk,
(c) for any (U, V) ∈ Γ⇓, U = M if and only if V = N.
Definition 12. A theory Γ satisfies the predicate thy cons red if
(a) for all (M, N) ∈ Γ, M and N satisfy Proposition 11(a)
(b) for all (M, N) and (M′, N′) ∈ Γ, M′ = M iff N′ = N
(c) for all ({Mp}Mk, {Np}Nk) ∈ Γ, for all M, N such that Γ ⊢ M ↔ N,
M ≠ Mk and N ≠ Nk
Theorem 13. (a) (tc red iff) Γ is consistent iff Γ⇓ satisfies thy cons red
(b) (thy cons red equiv) Γ⇓ satisfies Proposition 11(a) to (c) iff it satisfies
thy cons red, i.e., Definition 12(a) to (c)
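As an executable illustration of Definition 12 (ours, reusing Msg, Thy, and the sketches above; the verified Isabelle predicate is thy cons red), the check below is manifestly finite. For condition (c) it searches for a partner message using only the right rules, which suffices on a reduced theory by Lemma 5; the function partner anticipates match rc1 of Section 4.

import Data.Maybe (isNothing)
import Data.Tuple (swap)

-- Search for some N with Γ ⊢r M ↔ N: theory lookup, then (var)/(pr)/(er).
partner :: Thy -> Msg -> Maybe Msg
partner g m = case [n | (m', n) <- S.toList g, m' == m] of
    n : _ -> Just n                                       -- (id)
    []    -> case m of
      Name i    -> Just (Name i)                          -- (var)
      MPair a b -> MPair <$> partner g a <*> partner g b  -- (pr)
      Enc p k   -> Enc <$> partner g p <*> partner g k    -- (er)
      _         -> Nothing

thyConsRed :: Thy -> Bool
thyConsRed g =
  all shapeOK ps                                                    -- (a)
  && and [ (m == m') == (n == n') | (m, n) <- ps, (m', n') <- ps ]  -- (b)
  && and [ isNothing (partner g mk) && isNothing (partner (S.map swap g) nk)
         | (Enc _ mk, Enc _ nk) <- ps ]                             -- (c)
  where
    ps = S.toList g
    shapeOK (Name i, Name j)       = i == j   -- flexible names must coincide
    shapeOK (Rigid _, Rigid _)     = True
    shapeOK (MPair _ _, MPair _ _) = True
    shapeOK (Enc _ _, Enc _ _)     = True
    shapeOK _                      = False

On the inconsistent example theory {({a}b, {c}d), (b, c)} of Section 2, condition (c) fails: partner finds the derivable pair b ↔ c, whose first component is the key b.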
4 Respectful Substitutions and Bi-trace Consistency

We now consider a symbolic representation of observer theories from [14], given
below. We denote with fn(M) the set of names in M. This notation is extended
straightforwardly to pairs of messages, lists of (pairs of) messages, etc.
Definition 14. A bi-trace is a list of message pairs marked with i (indicating in-
put) or o (output), i.e., elements in a bi-trace have the form (M, N)i or (M, N)o.
Bi-traces are ranged over by h. We denote with π1(h) the list obtained from h
by taking the first component of the pairs in h. The list π2(h) is defined anal-
ogously. Bi-traces are subject to the following restriction: if h = h1.(M, N)o.h2
then fn(M, N) ⊆ fn(h1). We write {h} to denote the set of message pairs ob-
tained from h by forgetting the marking and the order.
Names in a bi-trace represent symbolic values which are input by a process at
some point. This explains the requirement that the free names of an output pair
in a bi-trace must appear before the output pair. We express this restriction on
name occurrences by defining a predicate validbt on lists of marked message
pairs, and we do not mention it in the statement of each result, although it does
appear in their statements in Isabelle. In our Isabelle representation the list is
reversed, so that the latest message pair is the first in the list. The theory {h}
obtained from a bi-trace h is represented by oth of h. Likewise for a list s of
marked messages (which can be seen as a symbolic trace [5]), we can define the
set {s} of messages by forgetting the annotations and ordering.
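Pictured in Haskell (our sketch, reusing Msg and Thy; the constructor names In and Out are ours), the reversed-list encoding and the restriction on output pairs look as follows:

data Mark = In | Out deriving (Eq, Show)

type BiTrace = [((Msg, Msg), Mark)]     -- head of the list = latest message pair

fn :: Msg -> S.Set Int                  -- free (flexible) names of a message
fn (Name i)    = S.singleton i
fn (Rigid _)   = S.empty
fn (MPair a b) = fn a `S.union` fn b
fn (Enc a b)   = fn a `S.union` fn b

othOf :: BiTrace -> Thy                 -- {h}: forget marking and order
othOf = S.fromList . map fst

-- validbt: the names of an output pair must occur in the earlier part.
validbt :: BiTrace -> Bool
validbt []                 = True
validbt (((m, n), mk) : h) =
  validbt h && (mk == In || (fn m `S.union` fn n) `S.isSubsetOf` earlier)
  where earlier = S.unions [fn a `S.union` fn b | ((a, b), _) <- h]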
A substitution pair θ = (θ1, θ2) replaces free names x ∈ N by messages, using
substitutions θ1 (θ2) for the first (second) component of each pair. For a bi-trace
h, θ respects h, or is h-respectful [15, Definition 34], if for every free name x in
an input pair (M, N)i, {h′}θ ⊢ xθ1 ↔ xθ2, where h′ is the part of h preceding
(M, N)i. This is expressed in Isabelle by h ∈ bt resp θ.
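A direct Haskell reading of this definition (ours; substitutions are modelled as total functions on name indices, and indist, othOf and fn come from the earlier sketches):

type Sub = Int -> Msg                   -- substitution on flexible names

appSub :: Sub -> Msg -> Msg
appSub t (Name i)    = t i
appSub _ (Rigid r)   = Rigid r
appSub t (MPair a b) = MPair (appSub t a) (appSub t b)
appSub t (Enc a b)   = Enc (appSub t a) (appSub t b)

-- (θ1, θ2) respects h (cf. bt_resp): every name of an input pair is sent
-- to messages equatable under the preceding, substituted part of the trace.
respects :: (Sub, Sub) -> BiTrace -> Bool
respects _           []                 = True
respects th@(t1, t2) (((m, n), mk) : h) =
  respects th h
  && (mk /= In || all ok (S.toList (fn m `S.union` fn n)))
  where
    gamma = S.map (\(a, b) -> (appSub t1 a, appSub t2 b)) (othOf h)
    ok x  = indist gamma (t1 x) (t2 x)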
Definition 15. [15, Definition 35] The set of consistent bi-traces is defined
inductively (on the length of bi-traces) as follows:
(a) The empty bi-trace is consistent.
(b) If h is a consistent bi-trace then h.(M, N)i is also a consistent bi-trace,
provided that {h} ⊢ M ↔ N.
(c) If h is a consistent bi-trace, then h′ = h.(M, N)o is a consistent bi-trace,
provided that for every h-respectful substitution pair θ,
if hθ is a consistent bi-trace then {h′θ} is a consistent theory.
Given Lemma 16(c) below, it may appear that leaving out the underlined words
of Definition 15 would make no difference. This minor fact can indeed be proved
formally: details are given in Appendix A.16.
The following are significant lemmas from [15] which we proved in Isabelle.
As an illustration of the value of automated theorem proving, we found that the
original proof of (b) in a draft of [15] contained an error (which was easily fixed).
Lemma 16. (a) [15, Lemma 24] (subst indist) Let Γ ⊢ M ↔ N and let
θ = (θ1, θ2) be a substitution pair such that for every free name x in Γ, M
or N, Γθ ⊢ θ1(x) ↔ θ2(x). Then Γθ ⊢ Mθ1 ↔ Nθ2.
(b) [15, Lemma 40] (bt resp comp) Let h be a consistent bi-trace, let θ =
(θ1, θ2) be an h-respectful substitution pair, and let γ = (γ1, γ2) be an hθ-
respectful substitution pair. Then θ ◦ γ is also h-respectful.
(c) [15, Lemma 41] (cons subs bt) If h is a consistent bi-trace and θ = (θ1, θ2)
respects h, then hθ is also a consistent bi-trace.
Respectfulness of a substitution relative to a theory. Testing consis-
tency of bi-traces involves testing whether a theory Γ is consistent after apply-
ing any respectful substitution pair θ to it. We will present some results that
(under certain conditions) if we reduce {h} first, and then apply an h-respectful
substitution, then the result is a reduced theory, to which the simpler test for
consistency, thy cons red, applies.
The complication here is that reduction applies to a theory whereas the
definition of bi-trace consistency crucially involves the ordering of the pairs of
messages. We overcome this by devising the notion, thy strl resp, of a sub-
stitution being respectful with respect to an (unordered) theory and an ordered
list of sets of variable names. Importantly, this property holds for {h} where θ is
h-respectful, and it is preserved by reducing a theory. We use this to prove some
later results involving {h}⇓ and h-respectful substitutions, such as Theorem 17.
Details are in Appendix A.19.
Simplifying testing consistency after substitution. Recall that a theory
Γ is consistent if and only if Γ⇓ is consistent (Lemma 10(c)), and if and only if
Γ \ N= is consistent (Lemma 10(b)). Thus, to determine whether Γ is consistent,
one may calculate Γ⇓ or Γ⇓ \ N= (which is reduced, by Theorem 7(a)), and use
the function thy cons red (by virtue of Theorem 13(a)). Therefore, the naive
approach to testing bi-trace consistency is to apply θ to Γ and then reduce
the result, and delete pairs (x, x) ∈ N=. We can derive results which permit a
simpler approach.
Theorem 17. Let h be a bi-trace, and let Γ = {h}. Let θ be an h-respectful
substitution pair, and denote its action on Γ by θ also.
(a) (nf subst nf Ne) Γθ⇓ \ N= = (Γ⇓ \ N=)θ⇓ \ N=
(b) (subst nf Ne tc) Γθ is consistent if and only if (Γ⇓ \ N=)θ is consistent

Thus, given a bi-trace h and a respectful substitution pair θ, if one wants to
test whether Γθ = {hθ} is consistent, it makes no difference to the consistency
of the resulting theory if one reduces the theory and deletes pairs (x, x) before
substituting. This means that we need only consider substitution in a theory
which is reduced and has pairs (x, x) removed.
If we disallow encryption where keys are themselves pairs or encrypts, then
further simplification is possible. Thus we will require that keys are atomic (free
names or rigid names, Name n or Rigid n), both initially and after substitution.

Theorem 18 (subs not red ka). Let Γ be reduced, consistent and have atomic
keys. Then (Γ \ N=)θ is reduced.
Thus, if keys are atomic, the effect of Theorem 18 is to simplify the consistency
test thus: to test the consistency of the substituted theory Γθ, one reduces Γ to
Γ⇓ and deletes pairs (x, x) to get Γ′ = Γ⇓ \ N=. One then considers substitution
pairs θ of Γ′, knowing that any Γ′θ is reduced and so the simpler criterion for
theory consistency, thy cons red, applies to it. Thus we get:
Theorem 19. Let h be a bi-trace, and let Γ = {h}, where Γ is consistent with
atomic keys. Let θ be an h-respectful substitution pair, and write Γθ = {hθ}.
(a) (nfs comm) Γθ⇓ \ N= = (Γ⇓ \ N=)θ \ N=
(b) (nfs comm tc) Γθ is consistent iff thy cons red holds of (Γ⇓ \ N=)θ \ N=
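Read as a checking function (our sketch, assuming atomic keys and reusing reduce', thyConsRed and appSub from the earlier sketches), Theorem 19(b) becomes:

-- N=-pairs: (x, x) on the same flexible name.
isIdPair :: (Msg, Msg) -> Bool
isIdPair (Name i, Name j) = i == j
isIdPair _                = False

-- Γθ is consistent iff thy_cons_red holds of (Γ⇓ \ N=)θ \ N=.
consAfterSub :: Thy -> (Sub, Sub) -> Bool
consAfterSub g (t1, t2) =
  thyConsRed (dropIds (applySub (dropIds (reduce' g))))
  where
    dropIds  = S.filter (not . isIdPair)
    applySub = S.map (\(m, n) -> (appSub t1 m, appSub t2 n))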
Unique Completion of a Respectful Substitution. A bi-trace can be pro-
jected into the first or second component of each pair, giving lists of marked
messages. We can equally project the definition of a respectful substitution pair,
so that for a list s of marked messages, substitution θi respects s, s ∈ sm resp
θi, iff for every free name x in an input message Mi, {s′}θi ⊢ xθi, where ⊢
is here the message synthesis relation, and {s′} is the set of marked messages
prior to Mi. Given h, whose projections are s1, s2, if θ respects h then clearly
θi respects si (proved as bt sm resp, see Appendix A.20). Conversely, given θi
which respects si (i = 1 or 2), can we complete θi to an h-respectful pair θ?
Theorem 20 (subst exists, subst unique). Given a consistent bi-trace h
whose projections to a single message trace are s1 and s2, and a substitution
θ1 which respects s1, there exists θ2 such that θ = (θ1, θ2) respects h, and θ2 is
“unique” in the sense that any two such θ2 act the same on names in π2(h).
We defined a function which, given θ1 in this situation, returns θ2. First we
defined a function match rc1 which, given a theory Γ and a message M, “at-
tempts” to determine a message N such that Γ ⊢ M ↔ N. By Theorem 20 such
a message is unique if Γ is consistent.
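In the terms of our Haskell sketches, match rc1 behaves like the partner search shown after Definition 12: look the pair up in the theory, then decompose with (var), (pr) and (er). Composing it with normalisation gives the decision procedure of Theorem 21(b) below (our rendering; the actual Isabelle definition is in Appendix A.24):

matchRC1 :: Thy -> Msg -> Maybe Msg
matchRC1 = partner             -- the same structural search as partner above

-- Theorem 21(b): on a consistent Γ, Γ ⊢ M ↔ N iff the (unique) partner
-- of M is found after reduction.
findEquiv :: Thy -> Msg -> Maybe Msg
findEquiv g = matchRC1 (reduce' g)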
The definition of match rc1 (Appendix A.24) follows that of is der virt red
(Appendix A.13), so Theorem 21(a) holds whether or not Γ is actually reduced.
It will be seen that it involves testing for membership of a finite set, and
corresponding uses of the ε operator (as in the case of reduce, as discussed
earlier). Therefore we assert that match rc1 is finitely computable.
The return type of match rc1 is message option, which is Some res if the
result res is successfully found, or None to indicate failure.
Theorem 21. (a) (match rc1 iff idvr) If Γ satisfies thy cons red, then
is der virt red (Γ, M, N) iff match rc1 Γ M = Some N
(b) (match rc1 indist) If Γ is consistent, then
Γ ⊢ M ↔ N iff match rc1 Γ⇓ M = Some N
Then we defined a function second sub which uses match rc1 to find the appro-
priate value of xθ2 for each new x which appears in the bi-trace, and we proved
that second sub does in fact compute the θ2 of Theorem 20. See Appendix A.26
for the definition of second sub and this result. The function second sub tests
membership of a finite set, and uses reduce and match rc1, so we assert that
second sub is also finitely computable.
5 Conclusions and Further Work

We have modelled observer theories and bi-traces in the Isabelle theorem prover,
and have confirmed, by proofs in Isabelle, the results of a considerable part of
[14]. This work constitutes a significant step towards formalising open bisimulation
for the spi-calculus in Isabelle/HOL, and ultimately towards a logical framework
for proving process equivalence.
We discussed the issue of showing finite computability in Isabelle/HOL, using
a mixed formal/informal argument, and building upon the discussion in Urban
et al. [17]. We defined a function reduce in Isabelle, and showed that it computes
Γ⇓. Isabelle required us to show that the function terminates. We asserted, with
relevant discussion, that inspection shows that the definition does not introduce
any infinite aspect into the computation, and therefore that the function is
finitely computable. Similarly, we provided a finitely computable function
is der virt and proved that it tests Γ ⊢ M ↔ N for a reduced theory Γ.
We then considered bi-traces and bi-trace consistency. The problem here is
that, to test bi-trace consistency, it is necessary to test whether Γθ is consistent
for all θ satisfying certain conditions. We proved a number of lemmas which
simplify this task, and appear to lead to a finitely computable algorithm for this.
In particular, our result on the unique completion of respectful substitutions,
which relates symbolic traces and bi-traces, opens up the possibility of using the
symbolic trace refinement algorithm [5] to compute a notion of bi-trace refinement,
which will be useful for bi-trace consistency checking.
Another approach to representing observer theories is to use equational
theories, instead of deduction rules, e.g., as in the applied-pi calculus [1]. In this
setting, the notion of consistency of a theory is replaced by the notion of static
equivalence between knowledge of observers [1]. Baudet has shown that static
equivalence between two symbolic theories is decidable [4], for a class of theories
called subterm-convergent theories (which subsumes the Dolev-Yao model of
intruder). It will be interesting to work out the precise correspondence between
static equivalence and our notion of bi-trace consistency, as such correspondence
may transfer proof techniques from one approach to the other.
Acknowledgment. We thank the anonymous referees for their comments on an
earlier draft. This work is supported by the Australian Research Council through
the Discovery Projects funding scheme (project number DP0880549).
References
1. Abadi, M., Fournet, C.: Mobile values, new names, and secure communication. In:
POPL, pp. 104–115 (2001)
2. Abadi, M., Gordon, A.D.: A bisimulation method for cryptographic protocols.
Nord. J. Comput. 5(4), 267–303 (1998)
3. Abadi, M., Gordon, A.D.: A calculus for cryptographic protocols: The spi calculus.
Information and Computation 148(1), 1–70 (1999)
4. Baudet, M.: Sécurité des protocoles cryptographiques: aspects logiques et calcula-
toires. PhD thesis, École Normale Supérieure de Cachan, France (2007)
5. Boreale, M.: Symbolic trace analysis of cryptographic protocols. In: Orejas, F.,
Spirakis, P.G., van Leeuwen, J. (eds.) ICALP 2001. LNCS, vol. 2076, pp. 667–681.
Springer, Heidelberg (2001)
6. Boreale, M., De Nicola, R., Pugliese, R.: Proof techniques for cryptographic pro-
cesses. SIAM J. Comput. 31(3), 947–986 (2001)
7. Borgström, J., Briais, S., Nestmann, U.: Symbolic bisimulation in the spi calculus.
In: Gardner, P., Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 161–176.
Springer, Heidelberg (2004)
8. Borgström, J., Nestmann, U.: On bisimulations for the spi calculus. Mathematical
Structures in Computer Science 15(3), 487–552 (2005)
9. Dawson, J.E., Goré, R.: Formalising cut-admissibility for provability logic (submit-
ted, 2009)
10. Dolev, D., Yao, A.: On the security of public-key protocols. IEEE Transactions on
Information Theory 2(29) (1983)
11. Kahsai, T., Miculan, M.: Implementing spi calculus using nominal techniques. In:
Beckmann, A., Dimitracopoulos, C., Löwe, B. (eds.) CiE 2008. LNCS, vol. 5028,
pp. 294–305. Springer, Heidelberg (2008)
12. Milner, R., Parrow, J., Walker, D.: A calculus of mobile processes, Part II. Infor-
mation and Computation, 41–77 (1992)
13. Sangiorgi, D.: A theory of bisimulation for the pi-calculus. Acta Inf. 33(1), 69–97
(1996)
14. Tiu, A.: A trace based bisimulation for the spi calculus: An extended abstract. In:
Shao, Z. (ed.) APLAS 2007. LNCS, vol. 4807, pp. 367–382. Springer, Heidelberg
(2007)
15. Tiu, A.: A trace based bisimulation for the spi calculus. Preprint (2009),
http://arxiv.org/pdf/0901.2166v1
16. Tiu, A., Goré, R.: A proof theoretic analysis of intruder theories. In: Proceedings
of RTA 2009 (to appear, 2009)
17. Urban, C., Cheney, J., Berghofer, S.: Mechanizing the metatheory of LF. In: LICS,
pp. 45–56. IEEE Computer Society, Los Alamitos (2008)
Formal Certification of a Resource-Aware
Language Implementation

Javier de Dios and Ricardo Peña

Universidad Complutense de Madrid, Spain
C/ Prof. José García Santesmases s/n. 28040 Madrid
Tel.: 91 394 7627; Fax: 91 394 7529
[email protected], [email protected]
Abstract. The paper presents the development, by using the proof as-
sistant Isabelle/HOL, of a compiler back-end translating from a func-
tional source language to the bytecode language of an abstract machine.
The Haskell code of the compiler is extracted from the Isabelle/HOL
specification and this tool is also used for proving the correctness of the
implementation. The main correctness theorem not only ensures func-
tional semantics preservation but also resource consumption preserva-
tion: the heap and stacks figures predicted by the semantics are confirmed
in the translation to the abstract machine.
The language and the development belong to a wider Proof Carrying
Code framework in which formal compiler-generated certificates about
memory consumption are sought for.
Keywords: compiler verification, functional languages, memory
management.

1 Introduction
The first-order functional language Safe has been developed in the last few years
as a research platform for analysing and formally certifying two properties of
programs related to memory management: absence of dangling pointers and
having an upper bound to memory consumption.
Two features make Safe different from conventional functional languages:
(a) the memory management system does not need a garbage collector; and
(b) the programmer may ask for explicit destruction of memory cells, so that
they could be reused by the program. These characteristics, together with the
above certified properties, make Safe useful for programming small devices where
memory requirements are rather strict and where garbage collectors are a burden
both in space and in service availability.
The Safe compiler is equipped with a battery of static analyses which infer
such properties [15,16,17,22]. These analyses are carried out on an intermediate
language called Core-Safe (explained in Sec. 2.1), obtained after type-checking
and desugaring the source language called Full-Safe. The back-end comprises
two more phases:
1. A translation from Core-Safe to the bytecode language of an imperative
abstract machine of our own, called the Safe Virtual Machine (SVM). We
call this bytecode language Safe-Imp and it is explained in Sec. 2.4.
Work partially funded by the projects TIN2008-06622-C03-01/TIN (STAMP), and
S-0505/TIC/0407 (PROMESAS).
2. A translation from Safe-Imp to the bytecode language of the Java Virtual
Machine (JVM) [13].
We have proved our analyses correct and are currently generating Isabelle/HOL
[20] scripts which, given a Core-Safe program and the annotations produced by
the analyses, will mechanically certify that the program satisfies the properties in-
ferred by the analyses. The idea we are trying to implement, consistently with the
Proof Carrying Code (PCC) paradigm [18], is sending the code generated by the
compiler together with the Isabelle/HOL scripts to a hypothetical code consumer
who, using another Isabelle/HOL system and a database of previously proved the-
orems, will check the property and consequently trust the code. The annotations
consist of special types in the case of the absence of dangling pointers property, and
will consist of some polynomials when the space consumption analysis is finished.
At this point of the development we were confronted with two alternatives:
• Either to translate the properties obtained at Core-Safe level to the level of
the JVM bytecode, by following for instance some of the ideas of [2].
• Or to provide the certificates at the Core-Safe level. Then the consumer
should trust that our back-end does not destroy the Core-Safe properties, or
better, we should provide evidence that these properties are preserved.
The first alternative was not very appealing in our case. In contrast to [2], where
the certificate transformation is carried out at the same intermediate language,
here the distance between our Core-Safe language and the target language is
very large: the first one is functional and the second one is a kind of assembly
language; new structures such as the frame stack, the operand stack, or the
program counter are present in the second but not in the first; we have built a
complete memory management runtime system on top of the JVM in order to
avoid its built-in garbage collector, etc. The translated certificate should provide
invariants and properties for all these structures. Even if all this work were done,
the size of the certificates and the time needed to check them would very probably
be huge. The figures reported in [26] for JVM bytecode-level certificates seem to
confirm this assertion.
The second alternative has other drawbacks. One of them is that the Core-
Safe program must be part of the certificate, because the consumer must be
able to relate the properties stated at source level with the low-level code being
executed. Providing the source code is not allowed in some PCC scenarios. The
second drawback is that the back-end should be formally verified, and both
the translation algorithm and the theorem proving its correctness must be in
the consumer's database. We have chosen this second alternative because smaller
certificates can be expected, but also because we feel that proving the translation
correct once for all programs is more reasonable in our case than checking this
correctness again and again for every translated program.
Machine-assisted compiler certification has been developed by several authors
in the last few years. In Sec. 6 we review some of these works. For the certifi-
cation to be really trustworthy, the code running in the compiler's back-end
should be exactly the code which has been proved correct by the proof assistant.
Fortunately, modern proof assistants such as Coq [4] and Isabelle/HOL provide
code extraction facilities which deliver code written in more widely used languages
such as Caml or Haskell. Of course, one must trust the translation done by the
proof assistant.
In this paper we present the certification of the first pass explained above
(Core-Safe to Safe-Imp). The second pass (Safe-Imp to JVM bytecode) is cur-
rently being completed. The reader can find a preliminary version of it in [21].
The main improvement of this work with respect to previous efforts in com-
piler certification is that we prove, not only the preservation of functional se-
mantics, but also the preservation of the resource consumption properties. As
asserted in [11], this property can be lost as a consequence of some com-
piler optimisations. For instance, some auxiliary variables not present in the
source may appear during the translation. In our framework, it is essential that
memory consumption is preserved during the translation, since we are trying
to certify exactly this property. To this aim, we introduce at Core-Safe level a
resource-aware semantics and then prove that this semantics is preserved in the
translation to the abstract machine.
With the aim of facilitating the understanding of the paper, and also avoiding
descending to many low-level details, we have made available the Isabelle/HOL
scripts at http://dalila.sip.ucm.es/safe/theories. We recommend the reader
to consult this site while reading in order to match the concepts described here
with their definitions in Isabelle/HOL. The paper is structured as follows: after
this introduction, in Sec. 2 we motivate our Safe language and then present the
syntax and semantics of the source and target languages. Then, Sec. 3 explains
the translation and gives a small example of the generated code. Sections 2
and 3 contain large portions of material already published in [14,16]. We felt
that this material was needed in order to understand the certification process.
Sec. 4 is devoted to explaining the main correctness theorem and a number of
auxiliary predicates and relations needed in order to state it. Sec. 5 summarises
the lessons learnt, and finally a Related Work section closes the paper.

2 The Source and Target Languages


2.1 Full-Safe and Core-Safe
Safe is a first-order polymorphic functional language with a syntax similar to
that of (first-order) Haskell, and with some facilities to manage memory. The
memory model is based on heap regions where data structures are built. A
region is a collection of cells and a cell stores exactly one constructor application.
However, in Full-Safe regions are implicit. These are inferred [15] when Full-Safe
is desugared into Core-Safe. The allocation and deallocation of regions are bound
to function invocations: a working region is allocated when entering the call and
deallocated when exiting it. All data structures allocated in this region are lost.
Inside a function, data structures may be built but they can also be destroyed
by using a destructive pattern matching, denoted by the symbol !, which deallo-
cates the cell corresponding to the outermost constructor. Using recursion the
recursive spine of the whole data structure may be deallocated. As an example,
we show an append function destroying the first list’s spine, while keeping its
elements in order to build the result:
appendD []! ys = ys
appendD (x:xs)! ys = x : appendD xs ys

This appending needs constant (in fact, zero) additional heap space, while the
usual version needs linear additional heap space. The fact that the first list is lost
prog → data_i^n ; dec_j^m ; e                  {Core-Safe program}
data → data T α_i^n @ ρ_j^m = C_k t_ks^{n_k} @ ρ_m ^l
                                               {recursive, polymorphic data type}
dec  → f x_i^n @ r_j^l = e                     {recursive, polymorphic function}
e    → a                                       {atom: literal c or variable x}
     | x @ r                                   {copy data structure x into region r}
     | x!                                      {reuse data structure x}
     | a_1 ⊕ a_2                               {primitive operator application}
     | f a_i^n @ r_j^l                         {function application}
     | let x_1 = be in e                       {non-recursive, monomorphic}
     | case x of alt_i^n                       {read-only case}
     | case! x of alt_i^n                      {destructive case}
alt  → C x_i^n → e                             {case alternative}
be   → C a_i^n @ r                             {constructor application}
     | e

Fig. 1. Core-Safe syntax

is reflected, by using the symbol !, in the type inferred for the function appendD ::
∀a ρ1 ρ2. [a]!@ρ1 → [a]@ρ2 → ρ2 → [a]@ρ2, where ρ1 and ρ2 are polymorphic types
denoting the regions where the input and output lists should live. In this case,
due to the sharing between the second list and the result, these latter lists should
live in the same region. Another possibility is to destroy part of a data structure
and to reuse the rest in the result, as in the following destructive split function:
splitD 0 zs! = ([], zs!)
splitD n []! = ([], [])
splitD n (y:ys)! = (y:ys1, ys2) where (ys1, ys2) = splitD (n-1) ys
The right-hand side zs! expresses reusing the remaining list. The inferred type is:
splitD :: ∀aρ1 ρ2 ρ3 . Int → [a]!@ρ2 → ρ1 → ρ2 → ρ3 → ([a]@ρ1 , [a]@ρ2 )@ρ3
Notice that the regions used to build the result appear as additional arguments.
The data structures which are not part of the function’s result are inferred to
be built in the local working region, which we call self, and they die at function
termination. As an example, the tuples produced by the internal calls to splitD
are allocated in their respective self regions and do not consume memory in the
caller regions. The type of these internal calls is Int → [a]!@ρ2 → ρ1 → ρ2 →
ρself → ([a]@ρ1 , [a]@ρ2 )@ρself , which is different from the external type because
we allow polymorphic recursion on region types. More information about Safe
and its type system can be found at [16].
The Safe front-end desugars Full-Safe and produces a bare-bones functional
language called Core-Safe. The transformation starts with region inference and
follows with Hindley-Milner type inference, desugaring pattern matching into
case expressions, where clauses into let expressions, collapsing several function-
defining equations into a single one, and some other transformations.
In Fig. 1 we show Core-Safe’s syntax, which is defined in Isabelle/HOL as a
collection of datatypes. A program prog is a sequence of possibly recursive poly-
morphic data and function definitions followed by a main expression e whose
value is the program result. The abbreviation x_i^n stands for x_1 · · · x_n. Destruc-
tive pattern matching is desugared into case! expressions. Constructor applica-
tions are only allowed in let bindings. Only atoms are used in applications, and
only variables are used in case/case! discriminants, copy and reuse expressions.
Region arguments are explicit in constructor and function applications and in
the copy expression. Function definitions have additional region arguments r_j^l
where the function is allowed to build data structures. In the function's body
only the r_j and its working region self may be used.

2.2 Core-Safe Semantics


In Figure 2 we show the resource-aware big-step semantics of Core-Safe ex-
pressions. A judgement of the form E ⊢ h, k, td, e ⇓ h′, k, v, r means that the
expression e is successfully reduced to normal form v under runtime environment
E and heap h with k + 1 regions, ranging from 0 to k, and that a final heap h′
with k + 1 regions is produced as a side effect. Arguments k can be considered
as attributes of their respective heaps. We highlight them in order to emphasise
that the evaluation starts and ends with the same number of regions, and also to
show when regions are allocated and deallocated. A value v is either a constant
or a heap pointer. The argument td and the result r have to do with resource
consumption and will be explained later. The semantics can be understood dis-
regarding them. Moreover, forgetting about resource consumption produces a
valid value semantics for the language.
A runtime environment E maps program variables to values and region vari-
ables to actual region identifiers which consist of natural numbers. As region
allocation/deallocation are done at function invocation/return time, the live re-
gions are organised in a region stack. A region identifier is just its offset from
the bottom of this stack. We adopt the convention that for all E, if c is a con-

[Lit]     E ⊢ h, k, td, c ⇓ h, k, c, ([]^k, 0, 1)

[Var1]    E[x → v] ⊢ h, k, td, x ⇓ h, k, v, ([]^k, 0, 1)

[Var2]    j ≤ k    (h′, p′) = copy(h, p, j)    m = size(h, p)
          ⟹ E[x → p, r → j] ⊢ h, k, td, x @ r ⇓ h′, k, p′, ([j → m], m, 2)

[Var3]    fresh(q)
          ⟹ E[x → p] ⊢ h ⊎ [p → w], k, td, x! ⇓ h ⊎ [q → w], k, q, ([]^k, 0, 1)

[Primop]  c = c1 ⊕ c2
          ⟹ E[a1 → c1, a2 → c2] ⊢ h, k, td, a1 ⊕ a2 ⇓ h, k, c, ([]^k, 0, 2)

[App]     (f x_i^n @ r_j^l = e) ∈ Σ
          [x_i → E(a_i)^n, r_j → E(r_j′)^l, self → k + 1] ⊢ h, k + 1, n + l, e ⇓ h′, k + 1, v, (δ, m, s)
          ⟹ E ⊢ h, k, td, f a_i^n @ r_j′^l ⇓ h′|_k, k, v, (δ|_k, m, max{n + l, s + n + l − td})

[Let1]    E ⊢ h, k, 0, e1 ⇓ h′, k, v1, (δ1, m1, s1)
          E ∪ [x1 → v1] ⊢ h′, k, td + 1, e2 ⇓ h″, k, v, (δ2, m2, s2)
          ⟹ E ⊢ h, k, td, let x1 = e1 in e2 ⇓ h″, k, v, (δ1 + δ2, max{m1, |δ1| + m2}, max{2 + s1, 1 + s2})

[Let2]    j ≤ k    fresh(p)    E ∪ [x1 → p] ⊢ h ⊎ [p → (j, C v_i^n)], k, td + 1, e2 ⇓ h′, k, v, (δ, m, s)
          ⟹ E[a_i → v_i^n, r → j] ⊢ h, k, td, let x1 = C a_i^n @ r in e2 ⇓ h′, k, v, (δ + [j → 1], m + 1, s + 1)

[Case]    E[x → p]    h[p → (j, C_r v_i^n)]    E ∪ [x_ri → v_i^{n_r}] ⊢ h, k, td + n_r, e_r ⇓ h′, k, v, (δ, m, s)
          ⟹ E ⊢ h, k, td, case x of C_i x_ij^{n_i} → e_i ^n ⇓ h′, k, v, (δ, m, s + n_r)

[Case!]   E[x → p]    h⁺ = h ⊎ [p → (j, C_r v_i^n)]    E ∪ [x_ri → v_i^{n_r}] ⊢ h, k, td + n_r, e_r ⇓ h′, k, v, (δ, m, s)
          ⟹ E ⊢ h⁺, k, td, case! x of C_i x_ij^{n_i} → e_i ^n ⇓ h′, k, v, (δ + [j → −1], max{0, m − 1}, s + n_r)

Fig. 2. Resource-Aware Big-Step Operational Semantics of Core-Safe expressions


constant, then E(c) = c. A heap h is a finite mapping from fresh variables p to
constructor cells w of the form (j, C v_i^n), meaning that the cell resides in region
j. By h[p → w] we denote a heap h where the binding [p → w] is highlighted,
while h ⊎ [p → w] denotes the disjoint union of heap h with the binding [p → w].
By h|_k we denote the heap obtained by deleting from h those bindings living in
regions greater than k, and by dom(h), the set {p | [p → w] ∈ h}.
The semantics of a program is the semantics of the main expression in an
environment Σ, which is the set containing all the function and data declarations.
Rules Lit and Var 1 just say that basic values and heap pointers are normal forms.
Rule Var2 executes a runtime system copy function copying the recursive part
of the data structure pointed to by p, and living in a region j′, into a (possibly
different) region j. In rule Var3, the binding [p → w] in the heap is deleted and
a fresh binding [q → w] to cell w is added. This action may create dangling
pointers in the live heap, as some cells may contain free occurrences of p. Rule
App shows when a new region is allocated. Notice that the body of the function
is executed in a heap with k + 2 regions. The formal identifier self is bound to
the newly created region k + 1 so that the function body may create cells in this
region or pass this region as an argument to other functions. Before returning
from the function, all cells created in region k + 1 are deleted. Rules Let 1 , Let 2 ,
and Case are the usual ones for an eager language, while rule Case! expresses
what happens in a destructive pattern matching: the binding of the discriminant
variable disappears from the heap.
This semantics is defined in Isabelle/HOL as an inductive relation. The en-
vironment E is split into a pair (E1, E2) separating program variables from
region argument bindings. These and the heap are modelled as partial func-
tions. Even though all functions are total in Isabelle/HOL, a partial function,
denoted 'a ⇀ 'b, can be easily defined as the total function 'a ⇒ 'b option,
where f x = None represents that f is not defined at x.

2.3 Resource Consumption


The semantics relates the evaluation of an expression e to a resource vector
r = (δ, m, s) obtained as a side effect. The first component is a partial function
δ : N ⇀ Z giving for each region k in scope the signed difference between the
cells in the final and initial heaps. A positive difference means that new cells
have been created in this region. A negative one means that some cells have
been destroyed. By dom(δ) we denote the subset of N in which δ is defined.
By |δ| we mean the sum Σ_{n ∈ dom(δ)} δ(n), giving the total balance of cells. The
remaining components m and s respectively give the minimum number of fresh
cells in the heap and of words in the stack needed to successfully evaluate e.
When e is the main expression, these figures give us the total memory needs of
the Safe program. The additional argument td is the number of bindings in E
which can be discarded when a normal form is reached or at function invocation.
It coincides with the value returned by the function topDepth of Sec. 3. As we will
see there, the runtime environment E is kept in the evaluation stack and (part
of) this environment is discarded by the abstract machine in those situations.
By []^k we denote the function λn.0 if 0 ≤ n ≤ k, and λn.⊥ otherwise. By δ1 + δ2
we denote the function:

(δ1 + δ2)(x) = ⎧ δ1(x) + δ2(x)   if x ∈ dom(δ1) ∩ dom(δ2)
               ⎨ δi(x)           if x ∈ dom(δi) − dom(δ3−i), i ∈ {1, 2}
               ⎩ ⊥               otherwise
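As an illustration (the formal development itself is in Isabelle/HOL, so the
names and types below are ours, not those of the scripts), the cell-balance
functions δ can be modelled in Haskell as finite maps from region numbers to
signed cell counts:

import qualified Data.Map as M

-- Sketch of the cell-balance component of a resource vector: a finite map
-- from the regions in scope to the signed difference in cells.
type Delta = M.Map Int Integer

-- delta1 + delta2: pointwise sum on the union of both domains, exactly as
-- in the definition above (regions defined on one side only keep their value).
plusDelta :: Delta -> Delta -> Delta
plusDelta = M.unionWith (+)

-- |delta|: the total balance of cells over all regions in scope.
totalDelta :: Delta -> Integer
totalDelta = sum . M.elems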
Function size in rule Var2 gives the size of the recursive spine of a data
structure:

size(h[p → (j, C v_i^n)], p) = 1 + Σ_{i ∈ RecPos(C)} size(h, v_i)

where RecPos returns the recursive argument positions of a given construc-
tor. In rule App, by δ|_k we mean a function like δ but undefined for values
greater than k. The computation of these resource consumption figures takes
into account how the translation will transform, and the abstract machine will
execute, the corresponding expression. For instance, in rule App the number
max{n + l, s + n + l − td } of fresh stack words takes into account that the first
n + l words are needed to store the actual arguments in the stack, then the
current environment of length td is discarded, and then the function body is
evaluated. In rule Let 1 , a continuation (2 words, see Sec. 2.4) is stacked before
evaluating e1 , and this evaluation leaves a value in the stack before evaluating
e2 . Hence, the computation max{2 + s1 , 1 + s2 }.
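To make the size function of this section concrete, here is a small Haskell
model over a toy heap (the representation and names are ours; the Isabelle
scripts define size over their own heap type, and rely on the absence of
cycles in the heap for termination):

import qualified Data.Map as M

type Loc = Int
data Value = Lit Int | Ptr Loc
data Cell = Cell { reg :: Int, con :: String, args :: [Value] }
type Heap = M.Map Loc Cell

-- recPos c: recursive argument positions of constructor c, as given by the
-- data declarations; hard-wired here for lists as an example.
recPos :: String -> [Int]
recPos "Cons" = [1]          -- in Cons x xs, the second argument is recursive
recPos _      = []

-- Length of the recursive spine: one cell plus the spines of the recursive
-- arguments (terminates only on acyclic heaps, as in the paper).
size :: Heap -> Value -> Integer
size _ (Lit _) = 0
size h (Ptr p) = case M.lookup p h of
  Nothing            -> 0
  Just (Cell _ c vs) -> 1 + sum [ size h (vs !! i) | i <- recPos c ]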

2.4 Safe-Imp Syntax and Semantics


Safe-Imp is the bytecode language of the SVM. Its syntax and semantics are de-
picted in Figure 3. A configuration of the SVM consists of the five components
(is, (h, k), k0, S, cs), where is is the current instruction sequence, (h, k) is the cur-
rent heap, k being its topmost region, S is a stack and cs is the code store where
the instruction sequences resulting from the compilation of program fragments
are kept. A code store is a partial function from code labels, denoted p, q, . . .,
to bytecode lists. The component k0 is a low watermark in the heap registering
which one must be the topmost region when a normal form is reached (see the
semantics of the DECREGION instruction). The property k0 ≤ k is an invari-
ant of the execution. By b, bi , . . . we denote heap pointers or any other item
stored in the stack. The stack contains three kinds of objects: values, regions
and continuations.
so → v | j | (k, p) {stack object}
S → so list {stack}

The semantics of the Safe-Imp instructions is shown in terms of configuration
transitions. By C_r^m we denote the data constructor which is the r-th in its data
definition out of a total of m data constructors, and by S!j, the j-th element of
the stack S counting from the top and starting at 0. A more complete view on
how this machine has been derived from the semantics can be found in [14]. For
the purpose of this paper, a short summary of the instructions follows.
Instruction DECREGION deletes from the heap all the regions, if any, be-
tween the current topmost region k and region k0 , excluding the latter. Each
region can be deallocated with a time cost in O(1) due to its implementation as
a linked list (see [21] for details). Instruction POPCONT pops a continuation
from the stack or stops the execution if there is none. Instruction PUSHCONT
pushes a continuation. It will be used in the translation of a let.
Instructions COPY and REUSE just mimic the semantics given to the cor-
responding expressions. Instruction CALL jumps to a new instruction sequence
and creates a new region. Function calls are always tail recursive, so there is no
Initial configuration ⇒ Final configuration                              Condition

(DECREGION : is, (h, k), k0, S, cs)                                      k ≥ k0
   ⇒ (is, (h|_{k0}, k0), k0, S, cs)
([POPCONT], (h, k), k, b : (k0, p) : S, cs[p → is])
   ⇒ (is, (h, k), k0, b : S, cs)
(PUSHCONT p : is, (h, k), k0, S, cs[p → is′])
   ⇒ (is, (h, k), k, (k0, p) : S, cs)
(COPY : is, (h[b → (l, C v_i^n)], k), k0, b : j : S, cs)                 (h′, b′) = copy(h, b, j)
   ⇒ (is, (h′, k), k0, b′ : S, cs)                                       j ≤ k
(REUSE : is, (h ⊎ [b → w], k), k0, b : S, cs)                            fresh(b′, h ⊎ [b → w])
   ⇒ (is, (h ⊎ [b′ → w], k), k0, b′ : S, cs)
([CALL p], (h, k), k0, S, cs[p → is])
   ⇒ (is, (h, k + 1), k0, S, cs)
(PRIMOP ⊕ : is, (h, k), k0, c1 : c2 : S, cs)                             c = c1 ⊕ c2
   ⇒ (is, (h, k), k0, c : S, cs)
([MATCH l p_j^m], (h[S!l → (j, C_r^m v_i^n)], k), k0, S, cs[p_j → is_j^m])
   ⇒ (is_r, (h, k), k0, v_i^n : S, cs)
([MATCH! l p_j^m], (h ⊎ [S!l → (j, C_r^m v_i^n)], k), k0, S, cs[p_j → is_j^m])
   ⇒ (is_r, (h, k), k0, v_i^n : S, cs)
(BUILDENV K_i^n : is, (h, k), k0, S, cs)
   ⇒ (is, (h, k), k0, Item_k(K_i)^n : S, cs)                             (1)
(BUILDCLS C_r^m K_i^n K : is, (h, k), k0, S, cs)                         Item_k(K) ≤ k, fresh(b, h)
   ⇒ (is, (h ⊎ [b → (Item_k(K), C_r^m Item_k(K_i)^n)], k), k0, b : S, cs)   (1)
(SLIDE m n : is, (h, k), k0, b_i^m : b_i′^n : S, cs)
   ⇒ (is, (h, k), k0, b_i^m : S, cs)

(1) Item_k(K) ≝ S!j if K = j ∈ N;  c if K = c;  k if K = self

Fig. 3. The Safe Virtual Machine (SVM)

need for a return instruction. Instruction MATCH does a jump depending on
the constructor of the matched cell. The list of code labels p_j^m corresponds to
the compilation of a set of case alternatives. Instruction MATCH! additionally
destroys the matched cell. The following invariant is ensured by the translation:
for every instruction sequence in the code store cs, instruction i is the last one
if and only if it belongs to the set {POPCONT, CALL, MATCH, MATCH!}.
Instruction BUILDENV creates a portion of the environment on top of the
stack: If a key K is a natural number j, the item S!j is copied and pushed
on the stack; if it is a basic constant c, it is directly pushed on the stack; if it
is the identifier self , then the topmost region number k is pushed. Instruction
BUILDCLS allocates a fresh cell and fills it with a constructor application. It
uses the same conventions as BUILDENV . Finally, instruction SLIDE removes
some parts of the stack and it is used to remove environment fragments.
We have defined this semantics in Isabelle/HOL as the function:
execSVM :: SafeImpProg ⇒ SVMState ⇒ (SVMState, SVMState) Either
where Either is a sum type and SVMState denotes a configuration ((h, k),
k0 , pc, S) with the code store removed and the current instruction sequence
replaced by a program counter pc = (p, i). The code store cs is a read-only
component and has been included in the type SafeImpProg . The current instruc-
tion can be retrieved by accessing the i-th element of the sequence (cs p). If the
result of execSVM P s1 is Left s1 , this means that s1 is a final state. Otherwise,
it returns Right s2 .
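The extracted step function can then be iterated until a final state is reached.
A minimal Haskell sketch of such a driver, abstracting over the extracted
types (the concrete extracted names may differ):

-- 'prog' and 'state' stand for the extracted SafeImpProg and SVMState types.
runSVM :: (prog -> state -> Either state state) -> prog -> state -> state
runSVM step p s = case step p s of
  Left final -> final             -- Left: s was already a final state
  Right s'   -> runSVM step p s'  -- Right: one SVM step was taken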
3 The Translation
The translation splits the runtime environment (E1 , E2 ) of the semantics into
two: a compile-time one ρ mapping program variables to stack offsets, and the
actual runtime environment contained in the stack. As this grows dynamically,
numbers are assigned to the variables from the bottom of the environment.
In this way, if the environment occupies the top m positions of the stack and
ρ[x → 1], then S!(m − 1) will contain the runtime value of x.
An expression let x1 = e1 in e2 will be translated by pushing to the stack
a continuation for e2 , and then executing the translation of e1 . A continuation
consists of a pair (k0 , p) where p points to the translation of e2 and k0 is the
lower watermark associated to e2 . It is saved in the stack because the lower
watermark of e1 is different (see the semantics of PUSHCONT ). As e1 and e2
share most of their runtime environments, the continuation is treated as a barrier
below which the environment must not be deleted while e2 has not reached its
normal form. So, the whole compile-time environment ρ consists of a list of
smaller environments [δ1 , . . . , δn ], mimicking the stack layout. Each individual
block i consists of a triple (δi , li , ni ) with an environment δi mapping variables
to numbers in the range (1 . . . mi ), a block length li = mi + ni , and an indicator
ni = 2 for all the blocks except for the first one, whose value is n1 = 0. We
are assuming that a continuation needs two words in the stack and that the
remaining items need one word.
The offset with respect to the top of the stack of a variable x defined in the
block k, denoted ρ x, is computed as follows: ρ x ≝ (Σ_{i=1}^{k} l_i) − δ_k x. Only the
top environment may be extended with new bindings. There are three operations
on compile-time environments:
1. ((δ, m, 0) : ρ) + {x_i → j_i^n} ≝ (δ ∪ {x_i → m + j_i^n}, m + n, 0) : ρ.
2. ((δ, m, 0) : ρ)++ ≝ ({}, 0, 0) : (δ, m + 2, 2) : ρ.
3. topDepth ((δ, m, 0) : ρ) ≝ m. Undefined otherwise.
The first one extends the top environment with n new bindings, while the second
closes the top environment with a 2-indicator and then opens a new one; a small
Haskell-style sketch of these environments follows.
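The sketch below models compile-time environments and the offset computation
(a toy rendering with names of our own choosing; the development defines these
as Isabelle/HOL functions):

import qualified Data.Map as M

-- A block (delta, l, n): delta maps variables to positions 1..m inside the
-- block, l = m + n is the block length, and n is the indicator (0 for the
-- open top block, 2 for closed ones, accounting for the stacked continuation).
type Block = (M.Map String Int, Int, Int)
type CEnv  = [Block]

-- rho x: sum the lengths of the blocks down to the one defining x, minus
-- x's position within that block, as in the formula above.
offset :: CEnv -> String -> Int
offset rho x = go 0 rho
  where
    go acc ((d, l, _) : rest) = case M.lookup x d of
      Just j  -> acc + l - j
      Nothing -> go (acc + l) rest
    go _ [] = error "unbound variable"

-- Operation 2 (rho++): close the top block and open a fresh one.
pushBlock :: CEnv -> CEnv
pushBlock ((d, m, 0) : rho) = (M.empty, 0, 0) : (d, m + 2, 2) : rho
pushBlock _                 = error "top block must be open"

-- Operation 3: topDepth, defined only when the top block is open.
topDepth :: CEnv -> Int
topDepth ((_, m, 0) : _) = m
topDepth _               = error "undefined"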
Using these conventions, in Figure 4 we show an idealised version of the trans-
lation function trE taking a Core-Safe expression and a compile-time environ-
ment, and giving as a result a list of SVM instructions and a code store. There,
NormalForm ρ is the following list:
NormalForm ρ ≝ [SLIDE 1 (topDepth ρ), DECREGION, POPCONT]

The whole program translation is done by the Isabelle/HOL function trProg, which


first translates each function definition by using function trF , and then the main
expression by using trE . The source file is guaranteed to define a function be-
fore its use. The translation accumulates an environment funm mapping every
function name to the initial bytecode sequence of its definition. The main part
of trProg is:
trProg (datas, defs, e) = (
let . . .
((p, funm, contm), codes ) = mapAccumL trF (1, empty , [ ]) defs;
cs = concat codes
in . . . cs . . .)
trE c ρ         = (BUILDENV [c] : NormalForm ρ, {})
trE x ρ         = (BUILDENV [ρ x] : NormalForm ρ, {})
trE (x@r) ρ     = (BUILDENV [ρ x, ρ r] : COPY : NormalForm ρ, {})
trE (x!) ρ      = (BUILDENV [ρ x] : REUSE : NormalForm ρ, {})
trE (a1 ⊕ a2) ρ = (BUILDENV [ρ a1, ρ a2] : PRIMOP : NormalForm ρ, {})
trE (f a_i^n @ s_j^m) ρ = ([BUILDENV [ρ a_i^n, ρ s_j^m], SLIDE (n + m) (topDepth ρ), CALL p], cs′)
    where (f x_i^n @ r_j^m = e) ∈ defs
          cs′ = {p → is} ∪ cs
          (is, cs) = trE e [({r_j → m − j + 1^m, x_i → n − i + m + 1^n}, n + m, 0)]
trE (let x1 = C_l^m a_i^n @ s in e) ρ = (BUILDCLS C_l^m [(ρ a_i)^n] (ρ s) : is, cs)
    where (is, cs) = trE e (ρ + {x1 → 1})
trE (let x1 = e1 in e2) ρ = (PUSHCONT p : is1, cs1 ∪ cs2 ∪ {p → is2})
    where (is1, cs1) = trE e1 ρ++
          (is2, cs2) = trE e2 (ρ + {x1 → 1})
trE (case x of alt_i^n) ρ = ([MATCH (ρ x) p_i^n], {p_i → is_i^n} ∪ (∪_{i=1}^n cs_i))
    where (is_i, cs_i) = trA alt_i ρ,  1 ≤ i ≤ n
trE (case! x of alt_i^n) ρ = ([MATCH! (ρ x) p_i^n], {p_i → is_i^n} ∪ (∪_{i=1}^n cs_i))
    where (is_i, cs_i) = trA alt_i ρ,  1 ≤ i ≤ n
trA (C x_i^n → e) ρ = trE e (ρ + {x_i → n − i + 1^n})

Fig. 4. Translation from Core-Safe expressions to Safe-Imp bytecode instructions

P1 → [BUILDCLS Nil 20 [] self, BUILDENV [0, 0, self], SLIDE 3 1, CALL P2]
P2 → [MATCH! 0 [P3, P4]]
P3 → [BUILDENV [1], SLIDE 1 3, DECREGION, POPCONT]
P4 → [PUSHCONT P5, BUILDENV [3, 5, 6], SLIDE 3 0, CALL P2]
P5 → [BUILDCLS Cons 21 [1, 0] 5, BUILDENV [0], SLIDE 1 6, DECREGION, POPCONT]

Fig. 5. Imperative code for the Core-Safe appendD program

where cs is the code store resulting from the compilation, and mapAccumL is
a higher-order function, combining map and foldl, defined in Isabelle/HOL by
copying its definition from the Haskell library (see
http://dalila.sip.ucm.es/safe/theories for more details).
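For reference, the standard Haskell library definition that was transcribed into
Isabelle/HOL reads as follows (this is the usual Data.List.mapAccumL):

mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
mapAccumL _ acc []       = (acc, [])
mapAccumL f acc (x : xs) = let (acc',  y)  = f acc x
                               (acc'', ys) = mapAccumL f acc' xs
                           in  (acc'', y : ys)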
In Figure 5 we show the code store generated for the following Core-Safe
program with the appendD function of Sec. 2.1:
appendD xs ys @ r = case! xs of
  []     → ys
  x : xx → let yy = appendD xx ys @ r in
           let zz = x : yy @ r in zz;
let l = [] @ self in appendD l l @ self

4 Formal Verification
The above infrastructure allows us to state and prove the main theorem express-
ing that the pair translation-abstract machine is sound and complete with respect
to the resource-aware semantics. First, note that both the semantics
and the SVM machine rules are syntax-driven, and that their computations are
deterministic (up to fresh name generation for the heap). So, we only need to
prove that everything done by the semantics can be emulated by the machine,
and that termination of the machine implies termination of the semantics (for
the corresponding expression).
First we define in Isabelle/HOL the following equivalence relation between


runtime environments in the semantics and in the machine:

Definition 1. We say that the environment E = (E1, E2) and the pair (ρ, S)
are equivalent, denoted (E1, E2) ≈1 (ρ, S), if dom E − {self} = dom ρ, and
∀x ∈ dom E1. E1(x) = S!(ρ x), and ∀r ∈ dom E2 − {self}. E2(r) = S!(ρ r).

Then we define an inductive relation expressing the evolution of the SVM ma-
chine up to some intermediate points corresponding to the end of the evaluation
of sub-expressions:
inductive
  execSVMBalanced :: [SafeImpProg, SVMState, nat list, SVMState list, nat list] ⇒ bool
  ("_ ⊢ _ , _ -svm→ _ , _")
where
  init: P ⊢ s, n#ns -svm→ [s], n#ns
| step: [[ P ⊢ s, n#ns -svm→ s'#ss, m#ms;
           execSVM P s' = Right s'';
           m' = nat (diffStack s'' s' m);
           m' ≥ 0;
           ms' = (if pushcont (instrSVM P s') then 0#m#ms
                  else if popcont (instrSVM P s') ∧ ms = m''#ms'' then (Suc m'')#ms''
                  else m'#ms) ]] =⇒
        P ⊢ s, n#ns -svm→ s''#s'#ss, ms'

P ⊢ s, n#ns -svm→ ss, 1#ns represents a 'balanced' execution of the SVM
corresponding to the evaluation of a source expression. Its meaning is that the
Safe-Imp program P evolves by starting at state s and passing through all the
states in the list ss (s is the last state of the list ss, and the sequence pro-
gresses towards the head of the list), with the stack decreasing at most by n
positions. Should the top instruction of the current state create a smaller stack,
then the machine stops at that state. The symbol # in Isabelle/HOL is the cons
constructor for lists.
Next, we define what resource consumption means at the machine level. Given
a forwards state sequence ss = s0 · · · sr starting at s0 with heap h0 and stack
S0, maxFreshCells ss gives the highest non-negative difference in cells between
the heaps in ss and the heap h0. Likewise, maxFreshWords ss gives the maximum
number of fresh words created in the stack during the sequence ss with respect
to S0. Finally, diff k h′ h gives for each region j, 0 ≤ j ≤ k, the signed difference
in cells between h′ and h.
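As an executable reading of these measures (the accessor name is ours; the
Isabelle functions are defined over their own state type):

-- Highest non-negative growth of a measure along a forwards state sequence
-- s0 .. sr, relative to the initial state s0.
maxFresh :: (state -> Int) -> [state] -> Int
maxFresh measure (s0 : ss) = maximum (0 : [ measure s - measure s0 | s <- ss ])
maxFresh _ []              = 0

-- maxFreshCells instantiates the measure with the number of heap cells,
-- maxFreshWords with the number of stack words.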
From the input list ds of Core-Safe definitions, we define the set definedFuns ds
of the function names defined there. Also, given an expression e, closureCalled e ds
is an inductive set giving the names of the functions reached from e by direct
or indirect invocation. By cs ⊑ cs′ we mean that the code store cs′ extends the
code store cs with new bindings.
Finally, we show the correctness lemma of the semantics with respect to the
machine, as it has been stated and proved in Isabelle/HOL:
lemma correctness:
  E ⊢ h, k, td, e ⇓ h', k, v, r −→
  (closureCalled e defs ⊆ definedFuns defs
   ∧ ((p, funm, contm), codes) = mapAccumL trF (1, empty, []) defs
   ∧ cs = concat codes
   ∧ P = ((cs, contm), p, ct, st)
   ∧ finite (dom h)
   −→ (∀ rho S S' k0 s0 p' q ls is is' cs1 j.
         (q, ls, is, cs1) = trE p' funm fname rho e
         ∧ (append cs1 [(q, is', fname)]) ⊑ cs
         ∧ drop j is' = is
         ∧ E ≈1 (rho, S)
         ∧ td = topDepth rho
         ∧ k0 ≤ k
         ∧ S' = drop td S
         ∧ s0 = ((h, k), k0, (q, j), S)
         −→ (∃ s ss q' i δ m w.
               P ⊢ s0, td#tds -svm→ s#ss, 1#tds
               ∧ s = ((h', k) ↓ k0, k0, (q', i), Val v # S')
               ∧ fst (the (map_of cs q'))!i = POPCONT
               ∧ r = (δ, m, w)
               ∧ δ = diff k (h, k) (h', k)
               ∧ m = maxFreshCells (rev (s#ss))
               ∧ w = maxFreshWords (rev (s#ss)))))
The premises state that the arbitrary expression e is evaluated to a value
v according to the Core-Safe semantics, that it is translated in the context of
a closed Core-Safe program defs having a definition for every function reached
from e, and that the instruction sequence is and the partial code store cs1 are
the result of the translation. Then, the execution of this sequence by the SVM
starting at an appropriate state s0 in the context of the translated program P ,
will reach a stopping state s having the same heap (h′, k) as the one obtained in
the semantics, and the same value v on top of the stack. Moreover, the memory
(δ, m, w) consumed by the machine, both in the heap and in the stack, is as
predicted by the semantics.
The proof is done by induction on the ⇓ relation, and with the help of a
number of auxiliary lemmas, some of them stating properties of the translation
and some others stating properties of the evaluation. We classify them into the
following groups:

Lemmas on the evolution of the SVM. This group takes care of the first three
conclusions, i.e. P ⊢ s0, td#tds -svm→ s#ss, 1#tds and the next two ones, and
there is one or more lemmas for every syntactic form of e.

Lemmas on the equivalence of runtime environments. They are devoted to prov-
ing that the relation (E1, E2) ≈1 (ρ, S) is preserved across evaluation. For in-
stance, if e ≡ f a_i^n @ r_j^l, f being defined by the equation f x_i^n @ r_j^l = e_f, we
prove that the equivalence of the environments local to f still holds. Formally:

(E1, E2) ≈1 (ρ, S)
∧ ρ′ = [({x_i → n − i + l + 1^n, r_j → l − j + 1^l}, n + l, 0)]
∧ (E1′, E2′) = ([x_i → E(a_i)^n], [r_j → E(r_j)^l, self → k + 1])
∧ S′ = S!(ρ a_i)^n @ S!(ρ r_j)^l @ drop td S
⟹ (E1′, E2′) ≈1 (ρ′, S′)

Lemmas on cells charged to the heap. This group takes care of the last but
two conclusion δ = diff k (h,k) (h’,k), and there is one or more lemmas for every
syntactic form of e. For instance, if e ≡ let x1 = e1 in e2, then the main lemma
has essentially this form:

δ1 = diff k (h, k) (h′, k)
∧ δ2 = diff k (h′, k) (h″, k)
⟹ δ1 + δ2 = diff k (h, k) (h″, k)

where (h, k), (h′, k), and (h″, k) are respectively the initial heap, and the heaps
after the evaluation of e1 and e2.
Lemmas on fresh cells needed in the heap. This group takes care of the last but
one conclusion m = maxFreshCells (rev (s#ss)). If e ≡ let x1 = e1 in e2, then the
main lemma has essentially this form:

δ1 = diff k (h, k) (h′, k)
∧ m1 = maxFreshCells (rev (s1#ss1))
∧ m2 = maxFreshCells (rev (s2#ss2))
⟹ max m1 (m2 + |δ1|) = maxFreshCells (rev (s2#ss2 @ s1#ss1 @ [s0]))
where s0 , s1 , and s2 are respectively the initial state of the SVM, and the states
after the evaluation of e1 and e2 .
Lemmas on fresh words needed in the stack. This group takes care of the last
conclusion w = maxFreshWords (rev (s#ss)). If e ≡ f a_i^n @ r_j^l, then the main
lemma has essentially this form:

w = maxFreshWords (rev (s#ss))
⟹ max (n + l) (w + n + l − td) = maxFreshWords (rev (s#ss @ [s2, s1, s0]))
where s0 , s1 , s2 are respectively the initial state of the application, and the states
after the execution of BUILDENV and SLIDE , and s#ss is the state sequence
of the body of f .
That termination of the SVM implies the existence of a derivation in the
semantics for the corresponding expression has not been proved yet.

5 Discussion
On the use of Isabelle/HOL. The complete specification in Isabelle/HOL of
the syntax and semantics of our languages, of the translation functions, the
theorems and the proofs, represent almost one person-year of effort. Including
comments, about 7000 lines of Isabelle/HOL scripts have been written, and
about 200 lemmas proved.
Isabelle/HOL gives enough facilities for defining recursive and higher-order
functions. These are written in much the same way as a programmer would do
in ML or Haskell. We have not found special restrictions in this respect. The
only ‘difficulty’ is that it is not possible to write potentially non-terminating
functions. One must provide a termination proof when Isabelle/HOL cannot find
one. Providing such a proof is not always easy because the argument depends
on some other properties such as ‘there are no cycles in the heap’, which are not
so easy to prove. Fortunately in these cases we have expressed the same ideas
using inductive relations.
Isabelle/HOL also provides inductive n-ary relations and transitive closures, as well as
ordinary first-order logic. This has made it easy to express our properties with
Formal Certification of a Resource-Aware Language Implementation 209

almost the same concepts one would use in hand-written proofs. Partial functions
have also been very useful in modelling programming language structures such
as environments, heaps, and the like. Being able to quantify these objects in
Higher-Order Logic has been essential for stating and proving the theorems.
Assessing how ‘easy’ it has been to conduct the proofs is another question. Part
of the difficulties were related to our lack of experience in using Isabelle/HOL.
The learning process was rather slow at the beginning. A second inconvenience
is that proof assistants (as it must be) do not take anything for granted. Trivial
facts that nobody cares to formalise in a hand-written proof, must be painfully
stated and proved before they can be used. We have sparingly used the auto-
matic proving commands such as simp_all, auto, etc., in part because they do
'too many' things, and frequently one does not recognise a lemma after using
them. Also, we wanted to understand the proof and to relate it to our hand-written
version. As a consequence, it is very possible that our scripts are longer than
needed. Finally, having programs and predicates ‘living’ together in a theorem
has been an experience not always easy to deal with.
On the quality of the extracted code. The Haskell code extracted from the Is-
abelle/HOL definitions reaches 700 lines, and has undergone some changes before
becoming operative in the compiler. One of these changes has been a trivial co-
ercion between the Isabelle/HOL types nat and int and the Haskell type Int.
The most important one has been the replacement of the Isabelle/HOL type
⇀, representing a partial function and heavily used for specifying our compile-time
environments, by a trusted table type of the Haskell library. The code gen-
erated for ⇀ was just a λ-abstraction needing linear time in order to find the
value associated to a key. This would lead to a quadratic compile time. Our table
is implemented as a balanced tree and has also been used in other phases of the
compiler. With this, the efficiency of the code generation phase is in O(n log n)
for a single Core-Safe function of size n, and about linear with the number of
functions of the input.
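In Haskell terms, the replacement amounts to trading a lookup closure for a
balanced-tree map (an illustration with names of our own, not the compiler's
actual modules):

import qualified Data.Map as M

type FunEnv k v   = k -> Maybe v   -- extracted representation: O(n) lookup chains
type TableEnv k v = M.Map k v      -- library table: O(log n) lookup

-- Extending the extracted representation builds a chain of closures.
extendF :: Eq k => FunEnv k v -> k -> v -> FunEnv k v
extendF env k v = \k' -> if k' == k then Just v else env k'

-- Extending the table keeps lookups logarithmic.
extendT :: Ord k => TableEnv k v -> k -> v -> TableEnv k v
extendT env k v = M.insert k v env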

6 Related Work
Using some form of formal verification to ensure the correctness of compilers has
been a hot topic for many years. An annotated bibliography covering up to 2003
can be found at [6]. Most of the papers reflected there propose techniques whose
validity is established by formal proofs made and read by humans.
Using machine-assisted proofs for compilers starts around the seventies, with
an intensification at the end of the nineties. For instance, [19] uses a constraint
solver to assess the validity of the GNU C compiler translations. They do not try
to prove the compiler correct but instead to validate its output by comparing
it with the corresponding input. This technique was originally proposed in [23].
A more recent experiment in compiler validation is [12]. In this case the source
is the term language of HOL and the target is assembly language of the ARM
processor. The compiler generates, for each source, the object file and a proof
showing that the semantics of the source is preserved. The last two stages of the
compilation are in fact formally verified, while validation of the output is used
in the previous phases.
More closely related to our work are [1] which certifies the translation of a Lisp
subset to a stack language by using PVS, and [25] which uses Isabelle/HOL to
formalise the translation from a small subset of Java (called μ-Java) to a stripped
version of the Java Virtual Machine (17 bytecode instructions). Both specify the
translation functions, and prove correctness theorems similar to ours. The latter
work can be considered as a first attempt on Java, and it was considerably
extended by Klein, Nipkow, Berghofer, and Strecker himself in [8,9,3]. Only [3]
claims that the extraction facilities of Isabelle/HOL have been used to produce
an actually running Java compiler. The main emphasis is on formalisation of
Java and JVM features and on creating an infrastructure on which other authors
could verify properties of Java or Java bytecode programs.
A realistic C compiler for programming embedded systems has been built and
verified in [5,10,11]. The source is a small C subset called Cminor to which C is
informally translated, and the target is Power PC assembly language. The com-
piler runs through six intermediate languages for which the semantics are defined
and the translation pass verified. The authors use the Coq proof-assistant and
its extraction facilities to produce Caml code. They provide figures witnessing
that the compile times obtained are competitive with those of gcc running with
level-2 optimisations activated. This is perhaps the biggest project on machine-
assisted compiler verification done up to now.
Less closely related are [7] and the MRG project [24], where certificates in Is-
abelle/HOL about heap consumption, based on special types inferred by the com-
piler, are produced. Two EU projects, EmBounded (http://www.embounded.org)
and Mobius (http://mobius.inria.fr), have continued this work on certification
and proof carrying code, the first one for the functional language Hume, and the
second one for Java and the JVM.
As we have said in Sec. 1, the motivation for verifying the Safe back-end arises
in a different context. We have approached this development because we found
it shorter than translating the Core-Safe properties to certificates at the level
of the JVM. Also, we expected the size of our certificates to be considerably
smaller than the ones obtained with the other approach. We have improved on
previous work by complementing functional correctness with a proof of resource
consumption preservation.

References
1. Dold, A., Vialard, V.: A Mechanically Verified Compiling Specification for a Lisp
Compiler. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2001. LNCS,
vol. 2245, pp. 144–155. Springer, Heidelberg (2001)
2. Barthe, G., Grégoire, B., Kunz, C., Rezk, T.: Certificate Translation for Optimizing
Compilers. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 301–317. Springer,
Heidelberg (2006)
3. Berghofer, S., Strecker, M.: Extracting a formally verified, fully executable compiler
from a proof assistant. In: Proc. Compiler Optimization Meets Compiler Verifica-
tion, COCV 2003. ENTCS, pp. 33–50 (2003)
4. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
Coq'Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer
Science. An EATCS Series. Springer, Heidelberg (2004)
5. Blazy, S., Dargaye, Z., Leroy, X.: Formal verification of a C compiler front-end.
In: Misra, J., Nipkow, T., Sekerinski, E. (eds.) FM 2006. LNCS, vol. 4085, pp.
460–475. Springer, Heidelberg (2006)
6. Dave, M.A.: Compiler verification: a bibliography. SIGSOFT Software Engineering
Notes 28(6), 2 (2003)
7. Hofmann, M., Jost, S.: Static prediction of heap space usage for first-order func-
tional programs. In: Proc. 30th ACM Symp. on Principles of Programming Lan-
guages, POPL 2003, pp. 185–197. ACM Press, New York (2003)
Formal Certification of a Resource-Aware Language Implementation 211

8. Klein, G., Nipkow, T.: Verified Bytecode Verifiers. Theoretical Computer Sci-
ence 298, 583–626 (2003)
9. Klein, G., Nipkow, T.: A Machine-Checked Model for a Java-Like Language, Vir-
tual Machine and Compiler. ACM Transactions on Programming Languages and
Systems 28(4), 619–695 (2006)
10. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler
with a proof assistant. In: Principles of Programming Languages, POPL 2006, pp.
42–54. ACM Press, New York (2006)
11. Leroy, X.: A formally verified compiler back-end, July 2008, p. 79 (submitted, 2008)
12. Li, G., Owens, S., Slind, K.: Structure of a Proof-Producing Compiler for a Subset
of Higher Order Logic. In: De Nicola, R. (ed.) ESOP 2007. LNCS, vol. 4421, pp.
205–219. Springer, Heidelberg (2007)
13. Lindholm, T., Yellin, F.: The Java Virtual Machine Specification, 2nd edn. The
Java Series. Addison-Wesley, Reading (1999)
14. Montenegro, M., Peña, R., Segura, C.: A Resource-Aware Semantics and Abstract
Machine for a Functional Language with Explicit Deallocation. In: Workshop on
Functional and (Constraint) Logic Programming, WFLP 2008, Siena, Italy, July
2008, pp. 47–61 (2008) (to appear in ENTCS)
15. Montenegro, M., Peña, R., Segura, C.: A Simple Region Inference Algorithm for
a First-Order Functional Language. In: Trends in Functional Programming, TFP
2008, Nijmegen (The Netherlands), May 2008, pp. 194–208 (2008)
16. Montenegro, M., Peña, R., Segura, C.: A Type System for Safe Memory Man-
agement and its Proof of Correctness. In: Nadathur, G. (ed.) PPDP 1999. LNCS,
vol. 1702, pp. 152–162. Springer, Heidelberg (1999)
17. Montenegro, M., Peña, R., Segura, C.: An Inference Algorithm for Guaranteeing
Safe Destruction. In: LOPSTR 2008. LNCS, vol. 5438, pp. 135–151. Springer, Hei-
delberg (2009)
18. Necula, G.C.: Proof-Carrying Code. In: ACM SIGPLAN-SIGACT Principles of
Programming Languages, POPL 1997, pp. 106–119. ACM Press, New York (1997)
19. Necula, G.C.: Translation validation for an optimizing compiler. SIGPLAN No-
tices 35(5), 83–94 (2000)
20. Nipkow, T., Paulson, L., Wenzel, M.: Isabelle/HOL. A Proof Assistant for Higher-
Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002)
21. Peña, R., Rupérez, D.: A Certified Implementation of a Functional Virtual Machine
on top of the Java Virtual Machine. In: Jornadas sobre Programación y Lenguajes,
PROLE 2008, Gijón, Spain, October 2008, pp. 131–140 (2008)
22. Peña, R., Segura, C., Montenegro, M.: A Sharing Analysis for SAFE. In: Selected
Papers of the 7th Symp. on Trends in Functional Programming, TFP 2006, pp.
109–128 (2007) (Intellect)
23. Pnueli, A., Siegel, M., Singerman, E.: Translation Validation. In: Steffen, B. (ed.)
TACAS 1998. LNCS, vol. 1384, pp. 151–166. Springer, Heidelberg (1998)
24. Sannella, D., Hofmann, M.: Mobile Resource Guarantees. EU Open FET project,
IST 2001-33149, 2001-2005, http://www.dcs.ed.ac.uk/home/mrg
25. Strecker, M.: Formal Verification of a Java Compiler in Isabelle. In: Voronkov, A.
(ed.) CADE 2002. LNCS, vol. 2392, pp. 63–77. Springer, Heidelberg (2002)
26. Wildmoser, M.: Verified Proof Carrying Code. Ph.D. thesis, Institut für Informatik,
Technische Universität München (2005)
A Certified Data Race Analysis for a Java-like
Language

Frédéric Dabrowski and David Pichardie

INRIA, Centre Rennes - Bretagne Atlantique, Rennes, France

Abstract. A fundamental issue in multithreaded programming is detecting data


races. A program is said to be well synchronised if it does not contain data races
w.r.t. an interleaving semantics. Formally ensuring this property is central, be-
cause the JAVA Memory Model then guarantees that one can safely reason on the
interleaved semantics of the program. In this work we formalise in the COQ proof
assistant a JAVA bytecode data race analyser based on the conditional must-not
alias analysis of Naik and Aiken. The formalisation includes a context-sensitive
points-to analysis and an instrumented semantics that counts method calls and
loop iterations. Our JAVA-like language handles objects, virtual method calls,
thread spawning, and lock and unlock operations for thread synchronisation.

1 Introduction
A fundamental issue in multithreaded programming is data races, i.e., the situation
where two threads access a memory location, and at least one of them changes its value,
without proper synchronisation. Such situations can lead to unexpected behaviours,
sometimes with damaging consequences [14, 20]. The semantics of programs with mul-
tiple threads of control is described by architecture-dependent memory models [1, 10]
which define admissible executions, taking into account optimisations such as caching
and code reordering. Unfortunately, these models are generally not sequentially consis-
tent, i.e., it might not be possible to describe every execution of a program as the se-
rialization, or interleaving, of the actions performed by its threads. Although common
memory models impose restrictions on admissible executions, these are still beyond in-
tuition: writes can be seen out of order and reads can be speculative and return values
from the future.
Reasoning directly on memory models is possible but hard, counter-intuitive and
probably infeasible for the average programmer. As a matter of fact, the interleaving
semantics is generally assumed in most formal developments in compilation, static
analysis and so on. Fortunately, under certain conditions, the interleaving semantics can
be turned into a correct approximation of admissible behaviors. Here, we focus on
programs expressed in JAVA, which comes with its own memory model, relieved of
architecture-specific details. Although the JAVA memory model [15, 21] does not guar-
antee sequential consistency for all programs, race free programs are guaranteed to be
sequentially consistent. Moreover, it enjoys a major property, the so-called data race
free guarantee. This property states that a program all of whose sequentially consistent

⋆ Work partially supported by EU project MOBIUS, and by the ANR-SETI-06-010 grant.

executions are race free, only admit sequentially consistent executions. In other words,
proving that a program is race free can be done on a simple interleaving semantics; and
doing so guarantees the correctness of the interleaving semantics for that program. It
is worth noticing that data race freedom is important, not only because it guarantees
semantic correctness, but also because it is at the basis of a higher level property called
atomicity. The possibility to reason sequentially about atomic sections is a key feature
in analysing multithreaded programs. Designing tools, either static or dynamic, aiming
at proving data race freeness is thus a fundamental matter.
This paper takes root in the European MOBIUS project¹, where several program verifi-
cation techniques have been machine checked with respect to a formal semantics of the
sequential JAVA bytecode language. The project has also investigated several verifica-
tion techniques for multithreaded JAVA but we need a formal guarantee that reasoning
on interleaving semantics is safe. While a JAVA memory model’s formalisation has been
done in COQ [9] and a machine-checked proof of the data race free guarantee has been
given in [2] we try to complete the picture formally proving data race freeness. We study
how such a machine-checked formalisation can be done for the race detection analysis
recently proposed by Naik and Aiken [16–18].
The general architecture of our development is sketched in Figure 1. We formalise
four static analyses: a context-sensitive points-to analysis, a must-lock analysis, a con-
ditional must-not alias analysis based on disjoint reachability and a must-not thread
escape analysis. In order to ensure the data-race freeness of the program, these analyses
are used to refine, in several stages, an initial over-approximation of the set of poten-
tial races of a program, with the objective of obtaining an empty set at the very last stage.
Each analysis is mechanically proved correct with respect to an operational semantics.
However, we consider three variants of semantics. While the first one is a standard
small-step semantics, the second one attaches context information to each reference
and frame. This instrumentation makes the soundness proof of the points-to analysis
easier. The last semantics handles more instrumentation in order to count method calls
and loop iterations. Each instrumentation is proved correct with respect to the seman-
tics just above it. The notion of safe instrumentation is formalised through a standard
simulation diagram.
The main contributions of our work are as follows.

– Naik and Aiken have proposed one of the most powerful data race analysis of the
area. Their analyser relies on several stages that remove pairs of potential races.
Most of these layers have been described informally. The most technical one has
been partially proved correct with pencil and paper for a sequential While lan-
guage [17]. We formalise their work in COQ for a realistic bytecode language with
unstructured control flow, operand stack, objects, virtual method calls, and lock and
unlock operations for thread synchronization.
– Our formalisation is an open framework with three layers of semantics. We for-
malise and prove correct four static analyses on top of these semantics. We expect
our framework to be sufficiently flexible to allow easy integration of new certified
blocks for potential race pruning.
¹ http://mobius.inria.fr

Fig. 1. Architecture of the development

2 A Challenging Example Program

Figure 2 presents an example of a source program for which it is challenging to formally


prove race freeness. This example is adapted from the running example given by Naik
and Aiken [17] in a While language syntax. The program starts in method main by
creating, in a first loop, a simple linked list l, and then launches a bunch of threads of
class T that all share the list l in their field data. Each thread, launched in this way,
non-deterministically chooses a cell m of the list and then updates m.val.f, using a
lock on m.
Figure 3 presents the potential races computed for this example. A data race is de-
scribed by a triplet (i, f, j) where i and j denote the program points in conflict and
f denotes the accessed field. The first over-approximation, the set of Original Pairs, is
simply obtained by typing: thanks to JAVA's strong typing, a pair of accesses may be
involved in a race only if both access the same field and at least one access is a write.
For each other approximation, a mark indicates a potential race. The program is in
fact data race free, but the size of the set of Original Pairs (13 pairs here) illustrates the
difficulty of statically demonstrating it.
Following [18], a first approximation of races computes the field accesses that are
reachable from the entry point of the program and also removes pairs where both
accesses are performed by the main thread (Reachable pairs). Some triplets may also be
removed with an alias analysis that shows that two potentially conflicting accesses never
alias (Aliasing pairs). Both sets rely on the points-to analysis presented in Section 4.
Among the remaining potential races, several triplets can be disabled by a must-not
thread escape analysis that predicts that a memory access only concerns a reference which
class Main {
  void main() {
    List l = null;
    while (*) {
      List temp = new List;
1:    temp.val = new T;
2:    temp.val.f = new A;
3:    temp.next = l;
      l = temp; };
    while (*) {
      T t = new T;
4:    t.data = l;
      t.start();
5:    t.f = null; }}};

class A {};

class List { T val; List next; };

class T extends java.lang.Thread {
  A f;
  List data;
  void run() {
    while (*) {
6:    List m = this.data;
7:    while (*) { m = m.next; }
      synchronized(m)
8:    { m.val.f = new A; }}}};

Fig. 2. A challenging example program

Columns: Original, Reachable, Aliasing, Unlocked, Escaping; rows:
(1, val, 1), (1, val, 2), (2, f, 2), (3, next, 3), (4, data, 4);
(2, f, 5); (5, f, 5);
(4, data, 6), (3, next, 7), (1, val, 8), (2, f, 8);
(5, f, 8); (8, f, 8).

Fig. 3. Potential race pairs in the example program

is local to a thread at the current point (Escaping pairs). The last potential race (8, f, 8)
requires the most attention since several threads of class T are updating fields f in par-
allel. These writes are safe because they are guarded by a synchronization on an object
which is the only ancestor of the write target in the heap. Such reasoning relies on the
fact that if locks guarding two accesses are different then so are the targeted memory lo-
cations. The main difficulty comes when several objects allocated at the same program
point, e.g. within a loop, may point to the same object. This last triplet is removed by
the conditional must-not alias analysis presented in Section 5.

3 Standard Semantics

The previous example can be compiled into a bytecode language whose syntax is
given below. The instruction set allows one to manipulate objects, call virtual methods, start
threads, and lock (or unlock) objects for thread synchronization.
Compared to real JAVA, we discard all numerical manipulations because they are
not relevant to our purpose. Static fields, static methods and arrays are not managed
216 F. Dabrowski and D. Pichardie

here, but they are nevertheless a source of several potential data races in JAVA programs.
Naik's approach [18] for these layers is similar to the technique developed for objects.
We estimate that adding these language features would not bring new difficulties beyond
those already covered in the current work. Lastly, as Naik and Aiken did before us, we
only cover synchronization by locks, without join, wait and interruption mechanisms.
Our approach is sound in the presence of such statements, but does not take into account
the potential races they could prevent. The last missing feature is the JAVA’s exception
mechanism. Exceptions complicate the control flow of a JAVA program. We expect that
handling this mechanism would increase the amount of formal proof but would not require
new proof techniques. This is left for future work.

Program syntax. A program is a set of classes, coming with a Lookup function match-
ing signatures and program points (allocation sites denoting class names) to methods.
Cid ∋ cid, …        F ∋ f, g, h, …        Mid ∋ mid, …
V ∋ x, y, z, …      Msig = Mid × Cid^n × (Cid ∪ {void})

M ∋ m ::= {sig ∈ Msig; body ∈ N ⇀ inst}
C ∋ c ::= {name ∈ Cid; fields ⊆ F; methods ⊆ M}

inst ::= aconstnull | new cid | aload x | astore x | getfield f | putfield f
       | areturn | return | invokevirtual mid : (cid^n) rtype    (n ≥ 0)
       | monitorenter | monitorexit | start | ifnd j | goto j
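To make the shape of this syntax concrete, here is a hedged COQ sketch of the
instruction set; the carrier and constructor names below are ours and need not match
those of the actual development:

Section Syntax.
  (* Abstract carriers for identifiers; kept as section variables. *)
  Variables classId fieldId methodId var : Set.

  Inductive inst : Set :=
  | Aconstnull
  | New (cid : classId)
  | Aload (x : var)
  | Astore (x : var)
  | Getfield (f : fieldId)
  | Putfield (f : fieldId)
  | Areturn
  | Return
  (* A virtual call carries a method id, argument classes and a return type
     (None codes void). *)
  | Invokevirtual (mid : methodId) (args : list classId) (ret : option classId)
  | Monitorenter
  | Monitorexit
  | Start
  | Ifnd (target : nat)
  | Goto (target : nat).
End Syntax.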

Semantics Domain. The dynamic semantics of our language is defined over states as a
labelled transition system. States and labels, or events, are defined in Figure 4, where
→ stands for total functions and ⇀ stands for partial functions. We distinguish location
and memory location sets. The set of locations is kept abstract in this presentation. In
this section, a memory location is itself a location (L = O). This redundancy will be
useful when defining new instrumented semantics where memory locations will carry
more information (Sections 4 and 5). In a state (L, σ, μ), L maps memory locations (that
identify threads) to call stacks, and σ denotes the heap that associates memory locations
to objects (cid, map), with cid a class name and map a map from fields to values. We write
class(σ, ℓ) for fst(σ(ℓ)) when ℓ ∈ dom(σ). A locking state μ associates with every
location ℓ a pair (ℓ′, n) if ℓ is locked n times by the thread ℓ′, and the constant free if ℓ
is not held by any thread. An event (ℓ, ?f^ppt, ℓ′) (resp. (ℓ, !f^ppt, ℓ′)) denotes a read
(resp. a write) of a field f, performed by the thread ℓ over the memory location ℓ′, at a
program point ppt. An event τ denotes a silent action.
Transition system. Labelled transitions have the form st −e→ st′ (when e is τ we simply
omit it). They rely on the usual interleaving semantics, as expressed in the rule below.

    L ℓ = cs        L; ℓ ⊢ (cs, σ, μ) −e→ (L′, σ′, μ′)
    ─────────────────────────────────────────
              (L, σ, μ) −e→ (L′, σ′, μ′)

Reductions of the shape L; ℓ ⊢ (cs, σ, μ) −e→ (L′, σ′, μ′) are defined in Figure 5.
Intuitively, such a reduction expresses that in state (L, σ, μ), reducing the thread defined
L ∋ ℓ                                          (location)
O = L ∋ ℓ                                      (memory location)
O⊥ ∋ v ::= ℓ | Null                            (value)
s ::= v :: s | ε                               (operand stack)
ρ ∈ Var → O⊥                                   (local variables)
σ ∈ O ⇀ Cid × (F → O⊥)                         (heap)
PPT = M × N ∋ ppt ::= (m, i)                   (program point)
CS ∋ cs ::= (m, i, s, ρ) :: cs | ε             (call stack)
L ∈ O ⇀ CS                                     (thread call stacks)
μ ∈ O → ((O × N∗) ∪ {free})                    (locking state)
st ::= (L, σ, μ)                               (state)
e ::= τ | (ℓ, ?f^ppt, ℓ′) | (ℓ, !f^ppt, ℓ′)    (event)

Fig. 4. States and actions

(a) Notations

    σ[ℓ.f ← v] ℓ f = v
    σ[ℓ.f ← v] ℓ′ f = σ ℓ′ f          if ℓ′ ≠ ℓ
    σ[ℓ.f ← v] ℓ′ f′ = σ ℓ′ f′        if f′ ≠ f

    (acquire ℓ ℓ′ μ) ℓ′ = (ℓ, 1)       if μ(ℓ′) = free
    (acquire ℓ ℓ′ μ) ℓ′ = (ℓ, n + 1)   if μ(ℓ′) = (ℓ, n)
    (acquire ℓ ℓ′ μ) ℓ″ = μ ℓ″         if ℓ″ ≠ ℓ′

(b) Reduction rules

    getfield f; ℓ; ppt ⊢ (i, ℓ′ :: s, ρ, σ) −(ℓ,?f^ppt,ℓ′)→₁ (i + 1, (σ ℓ′ f) :: s, ρ, σ)    if ℓ′ ∈ dom(σ)

    putfield f; ℓ; ppt ⊢ (i, v :: ℓ′ :: s, ρ, σ) −(ℓ,!f^ppt,ℓ′)→₁ (i + 1, s, ρ, σ[ℓ′.f ← v])    if ℓ′ ∈ dom(σ)

    new cid; ℓ; ppt ⊢ (i, s, ρ, σ) −→₁ (i + 1, ℓ′ :: s, ρ, σ[ℓ′ ← new(cid)])    where ℓ′ ∉ dom(σ)

    (m.body) i; ℓ; (m, i) ⊢ (i, s, ρ, σ) −e→₁ (i′, s′, ρ′, σ′)
    L′ = L[ℓ → (m, i′, s′, ρ′) :: cs]
    ─────────────────────────────────────────  (1)
    L; ℓ ⊢ ((m, i, s, ρ) :: cs, σ, μ) −e→ (L′, σ′, μ)

    (m.body) i = invokevirtual mid : (cid^n) rtype
    s = vₙ :: … :: v₁ :: ℓ′ :: s′        Lookup (mid : (cid^n) rtype) class(σ, ℓ′) = m₁
    ρ₁ = [0 → ℓ′, 1 → v₁, …, n → vₙ]
    L′ = L[ℓ → (m₁, 0, ε, ρ₁) :: (m, i + 1, s′, ρ) :: cs]
    ─────────────────────────────────────────
    L; ℓ ⊢ ((m, i, s, ρ) :: cs, σ, μ) → (L′, σ, μ)

    (m.body) i = start        ¬(ℓ′ ∈ dom(L))
    Lookup (run : ()void) class(σ, ℓ′) = m₁        ρ₁ = [0 → ℓ′]
    L′ = L[ℓ → (m, i + 1, s′, ρ) :: cs, ℓ′ → (m₁, 0, ε, ρ₁) :: ε]
    ─────────────────────────────────────────
    L; ℓ ⊢ ((m, i, ℓ′ :: s′, ρ) :: cs, σ, μ) → (L′, σ, μ)

    (m.body) i = monitorenter        μ ℓ′ ∈ {free, (ℓ, n)}
    μ′ = acquire ℓ ℓ′ μ        L′ = L[ℓ → (m, i + 1, s, ρ) :: cs]
    ─────────────────────────────────────────
    L; ℓ ⊢ ((m, i, ℓ′ :: s, ρ) :: cs, σ, μ) → (L′, σ, μ′)

    (m.body) i = monitorexit
    μ′ = acquire ℓ ℓ′ μ        L′ = L[ℓ → (m, i + 1, s, ρ) :: cs]
    ─────────────────────────────────────────
    L; ℓ ⊢ ((m, i, ℓ′ :: s, ρ) :: cs, σ, μ) → (L′, σ, μ′)

Fig. 5. Standard Dynamic Semantics


by the memory location ℓ and the call stack cs, chosen non-deterministically, produces
the new state (L′, σ′, μ′). For the sake of readability, we rely on an auxiliary relation of
the shape instr; ℓ; ppt ⊢ (i, s, ρ, σ) −e→₁ (i′, s′, ρ′, σ′) for the reduction of intra-procedural
instructions. In Figure 5, we consider only putfield, getfield and new. Reductions
for the other instructions are standard and produce a τ event. The notation σ[ℓ.f ← v] for
field update, where ℓ ∈ dom(σ), is defined in Figure 5(a). It does not change the class of
an object. The reduction of a new instruction pushes a fresh address onto the operand
stack and allocates a new object in the heap. The notation σ[ℓ ← new(cid)], where
ℓ ∉ dom(σ), denotes the heap σ extended with a new object, at location ℓ, of class cid and
with all fields equal to Null. The auxiliary relation is embedded into the semantics
by rule (1). Method invocation relies on the Lookup function for method resolution and
generates a new frame. Thread spawning is similar to method invocation; however, the
new frame is put on top of an empty call stack. We omit the reduction rules for return
and areturn; these rules are standard and produce a τ event. For monitorenter and
monitorexit we use a partial function acquire defined in Figure 5(a). Intuitively,
acquire ℓ ℓ′ μ results from thread ℓ locking object ℓ′ in μ.
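The following COQ fragment is a rough sketch (ours, not the development's) of such an
acquire function, modelling free as None; in particular, how re-entrancy is restricted to
the current owner is an assumption of the sketch:

Section Acquire.
  Variable loc : Set.
  Variable loc_eqb : loc -> loc -> bool.

  (* A locking state: None models "free", Some (owner, n) a lock held n times. *)
  Definition lockstate := loc -> option (loc * nat).

  (* acquire l l' mu: thread l takes (or re-enters) the lock on l'. *)
  Definition acquire (l l' : loc) (mu : lockstate) : lockstate :=
    fun l'' =>
      if loc_eqb l'' l' then
        match mu l' with
        | None => Some (l, 1)
        | Some (o, n) => Some (o, S n)  (* intended to be applied only when o = l *)
        end
      else mu l''.
End Acquire.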
We write RState(P) for the set of states that contains the initial state of a program P,
which we do not describe here for conciseness, and that is closed under reduction.
A data race is a tuple (ppt₁, f, ppt₂) such that Race(P, ppt₁, f, ppt₂) holds.

    st ∈ RState(P)    st −(ℓ₁,!f^ppt₁,ℓ)→ st₁    st −(ℓ₂,R,ℓ)→ st₂    R ∈ {?f^ppt₂, !f^ppt₂}    ℓ₁ ≠ ℓ₂
    ───────────────────────────────────────────────
    Race(P, ppt₁, f, ppt₂)
The ultimate goal of our certified analyser is to guarantee data race freeness, i.e.
for all ppt₁, ppt₂ ∈ PPT and f ∈ F, ¬Race(P, ppt₁, f, ppt₂).
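As a minimal COQ sketch (with the relevant types kept abstract; the names are ours),
this goal reads:

Section RaceFreedom.
  Variables (program PPT field : Type).
  Variable Race : program -> PPT -> field -> PPT -> Prop.

  (* Data race freeness: no triple (ppt1, f, ppt2) is a race of P. *)
  Definition data_race_free (P : program) : Prop :=
    forall (ppt1 ppt2 : PPT) (f : field), ~ Race P ppt1 f ppt2.
End RaceFreedom.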

4 Points-to Semantics
Naik and Aiken make intensive use of points-to analysis in their work. Points-to
analysis computes a finite abstraction of the memory where locations are abstracted by
their allocation site. The analysis can be made context-sensitive if allocation sites are
distinguished w.r.t. the calling context of the method where the allocation occurs.
Many static analyses use this kind of information to obtain a conservative approximation
of the call graph and the heap of a program. Such analyses implicitly reason
on an instrumented semantics that directly manipulates information about allocation sites,
while a standard semantics only keeps track of the class given to a reference during its
allocation. In this section we formalise such an intermediate semantics.
This points-to semantics takes the form of a COQ module functor
Module PointsToSem (C:CONTEXT). ... End PointsToSem.

parameterised by an abstract notion of context which captures a large variety of points-


to contexts. Figure 6 presents this notion.
A context is given by two abstract types, pcontext and mcontext (the latter is
noted Context in Section 5), for pointer contexts and method contexts, respectively.
The function make_new_context is used to create a new
Module Type CONTEXT.

Parameter pcontext : Set. (* pointer context *)
Parameter mcontext : Set. (* method context *)

Parameter make_new_context :
method → line → classId → mcontext → pcontext.
Parameter make_call_context :
method → line → mcontext → pcontext → mcontext.
Parameter get_class : program → pcontext → option classId.

Parameter class_make_new_context : ∀ p m i cid c,


body m i = Some (New cid) →
get_class p (make_new_context m i cid c) = Some cid.

Parameter init_mcontext : mcontext.


Parameter init_pcontext : pcontext.

Parameter eq_pcontext : ∀ c1 c2:pcontext, {c1=c2}+{c1<>c2}.


Parameter eq_mcontext : ∀ c1 c2:mcontext, {c1=c2}+{c1<>c2}.

End CONTEXT.

Fig. 6. The Module Type of Points-to Contexts

pointer context (make_new_context m i cid c) when an allocation of an object
of class cid is performed at line i of a method m, called in a context c. We create a new
method context (make_call_context m i c p) when building the calling context
of a method called on an object of context p, at line i of a method m, itself called in
a context c. Finally, (get_class prog p) retrieves the class given to an
object allocated in a context p. The hypothesis class_make_new_context ensures
consistency between get_class and make_new_context.
The simplest instantiation of this semantics takes class names as pointer contexts and
uses a singleton type for method contexts. A more interesting instantiation is k-object
sensitivity: contexts are sequences of at most k allocation sites (m, i) ∈ M × N. When
creating an object at site (m, i) in a context c, we attach to this object a pointer context
(m, i) ⊕k c, defined by (m, i) · c′ if |c| = k and c = c′ · (m′, i′), and by (m, i) · c if |c| < k,
without any change to the current method context. When calling a method on an object
we build a new frame with the same method context as the pointer context of the object,
as sketched below.
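The following COQ sketch (ours) captures this ⊕k operator on contexts represented
as lists of allocation sites:

Require Import List.
Import ListNotations.

(* An allocation site (m, i), coded here as a pair of naturals. *)
Definition site : Set := (nat * nat)%type.

(* A k-object-sensitive pointer context: a sequence of at most k sites. *)
Definition pcontext : Set := list site.

(* (m, i) ⊕k c: push the new site and keep the k most recent ones. *)
Definition push_k (k : nat) (s : site) (c : pcontext) : pcontext :=
  firstn k (s :: c).

For instance, push_k 2 (1, 5) [(2, 3); (4, 1)] returns [(1, 5); (2, 3)], forgetting the
oldest site.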
The definition of this semantics is similar to the standard semantics described in
the previous section, except that memory locations are now pairs of the form (ℓ, p),
with ℓ a location and p a pointer context. We found it convenient for our formalisation
to deeply instrument the heap, which is now a partial function from memory locations of
the form (ℓ, p) to objects. This allows us to state a property about the context of a
location without mentioning the current heap (in contrast to the class of a location in
the previous standard semantics). The second change concerns frames, which are now of
the form (m, i, c, s, ρ), with c the method context of the current frame.
In order to reason on this semantics and its different instantiations, we give to this
module a module type POINTSTO_SEM such that for all modules C of type CONTEXT,
PointsToSem(C) : POINTSTO_SEM.
Several invariants are proved on this semantics, for example that if two memory
locations (ℓ, p1) and (ℓ, p2) are in the domain of a heap reachable from an initial state,
then p1 = p2.

Module PointsToSemInv (S:POINTSTO_SEM). ...


Lemma reachable_wf_heap : ∀ p st,
reachable p st →
match st with (L,sigma,mu) ⇒
∀ l p1 p2, sigma (l,p1)<>None → sigma (l,p2)<>None → p1=p2
end.
Proof. ... Qed.
End PointsToSemInv.

Safe Instrumentation. The analyses we formalise on top of this points-to semantics are
meant for proving the absence of races. To transfer such a semantic statement in terms of
the standard semantics, we prove simulation diagrams between the transition systems
of the standard and the points-to semantics. Such a diagram then allows us to prove that
each standard race corresponds to a points-to race.

Module SemEquivProp (S:POINTSTO_SEM).


Lemma race_equiv :
∀ p ppt ppt’, Standard.race p ppt ppt’ → S.race p ppt ppt’.
Proof. ... Qed.
End SemEquivProp.

Points-to Analysis. A generic context-sensitive analysis is specified as a set of
constraints attached to a program. The analysis is flow-insensitive for the heap and
flow-sensitive for local variables and operand stacks. Its result is given by four functions
PtL: mcontext → method → line → var → (pcontext → Prop).
PtS: mcontext → method → line → list (pcontext → Prop).
PtR: mcontext → method → (pcontext → Prop).
PtF: pcontext → field → (pcontext → Prop).
that attach pointer-context properties to local variables (PtL), operand stacks (PtS) and
method returns (PtR). PtF is the flow-insensitive abstraction of the heap.
This analysis is parameterised by a notion of context and proved correct with respect
to the corresponding points-to semantics. The final theorem says that if (PtL,PtS,PtR,PtF) is
a solution of the constraint system then it correctly approximates any reachable state.
The notion of correct approximation expresses, for example, that every memory location
(ℓ, p) in the local variables ρ or the operand stack s of a reachable frame (m, i, c, s, ρ) is
such that p is in the points-to set attached by PtL or PtS to the corresponding flow
position (m, i, c).
Must-Lock Analysis. Fine-grained lock analysis requires statically knowing which locks
are definitely held when a given program point is reached. For this purpose we specify
and prove correct a flow-sensitive must-lock analysis that computes the following
information:
Locks: method → line → mcontext → (var → Prop).
Symbolic: method → line → list expr.

At each flow position (m, i, c), Locks computes an under-approximation of the local
variables that are currently held (as locks) by the thread reaching this position. The
specification of Locks depends on the points-to information PtL computed before. This
information is useful for the monitorexit instruction, because unlocking a variable
x can only cancel the lock information of the variables that may alias x.
Symbolic is a flow-sensitive abstraction of the operand stack that manipulates symbolic
expressions. Such expressions are path expressions of the form x, x.f, etc. The lock
analysis only requires variable expressions, but more complex expressions are useful for the
conditional must-lock analysis given in Section 5.
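A possible COQ rendering of these path expressions is the following sketch (names
are ours):

Section PathExpr.
  Variables (var fieldId : Set).

  (* Path expressions: a variable x, or a field access e.f. *)
  Inductive expr : Set :=
  | EVar   (x : var)
  | EField (e : expr) (f : fieldId).
End PathExpr.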

Removing False Potential Races. The previous points-to analysis supports the first two
stages of the race analyser of Naik et al. [18]. The first stage computes the so-called
ReachablePairs: it keeps from the OriginalPairs only the accesses that may be reachable from a
start() call site that is itself reachable from the main method, according to the
points-to information. Moreover, it discards pairs where both accesses are performed
by the main thread, because there is only one thread of this kind.
The next stage keeps only the so-called AliasingPairs, using the fact that a conflicting
access can only occur on references that may alias. In the example of Figure 2, the
potential race (5, f, 8) is cancelled because the points-to information of t and m.val
are disjoint.
For each stage we formally prove that these sets over-approximate the set of real
races w.r.t. the points-to semantics.

5 Counting Semantics

The next two stages of our analysis require a deeper instrumentation. We introduce a
new semantics instrumented for counting method calls and loop iterations. This
semantics builds on top of the points-to semantics and uses k-contexts. All developments
of this section were formalised in COQ; however, for the sake of conciseness we
present them in a paper style. In addition to the allocation site (m, i) and the calling
context c of an allocation, this semantics captures counting information. More precisely,
it records that the allocation occurred after the nth iteration of the flow edge L(m, i) in the
kth call to m in context c. Given a program P, the function L ∈ M × N → Flow, with
Flow = N × N, must satisfy Safe_P(L), as defined below:

Safe_P(L) ≡ ∀m, i, cid. (m.body) i = new cid ⇒
            ∀n > 0, j₁, …, jₙ. leadsTo(m, i, i, j₁ · … · jₙ) ⇒ ∃k < n. (jₖ, jₖ₊₁) = L(m, i)
where leadsTo(m, i, j, j₁ · … · jₙ) states that j₁ · … · jₙ is a path from i to j in
the control flow graph of m. Intuitively, the semantics counts, and records, the iterations
of all flow edges, while L maps every allocation site to a flow edge, typically a loop entry.
Obviously, the function defined by L(m, i) = (i, i + 1) is safe. However, for the purpose
of static analysis we need to observe that two allocations occurred within the same loop
and, thus, we need a less strict definition (but still strict enough to discriminate between
different allocations occurring at the same site). The function L might be provided by
the compiler or computed afterwards with standard techniques. For example, in the
bytecode version of our running example, mapping the three allocation sites of the first
loop to the control flow edge of the first one is safe. We update the semantic domain
as follows:
mVect = M × Context → N ∋ ω           (method vector)
lVect = M × Context × Flow → N ∋ π    (iteration vector)
CP ∋ cp ::= ⟨m, i, c, ω, π⟩           (code pointer)
O = L × CP ∋ ℓ                        (memory location)
CS ∋ cs ::= (cp, s, ρ) :: cs | ε      (call stack)
st ::= (L, σ, μ, ω_g)                 (state)

A frame holding the code pointer ⟨m, i, c, ω, π⟩ belongs to the ω(m, c)th call to method m in
context c (a k-context) since the execution began and, so far, it has performed π(m, c, φ)
steps through the edge φ of its control flow graph. In a state (L, σ, μ, ω_g), ω_g is a global
method vector used as a shared call counter by all threads.
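In COQ these two counting domains could be transcribed as follows (a sketch under our
naming):

Section CountingDomains.
  Variables (method context : Set).

  (* A control-flow edge between two lines of a method. *)
  Definition flow : Set := (nat * nat)%type.

  (* ω: how many times each (method, context) has been called so far. *)
  Definition mVect : Type := method * context -> nat.

  (* π: how many times each flow edge has been crossed per (method, context). *)
  Definition lVect : Type := method * context * flow -> nat.
End CountingDomains.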
Below, we sketch the extended transition system by giving rules for allocation and
method invocation.
    (m.body) i = new cid        ∀cp. ¬((a, cp) ∈ dom(σ))
    π′ = π[(m, c, (i, i + 1)) → π(m, c, (i, i + 1)) + 1]        ℓ′ = (a, ⟨m, i, c, ω, π⟩)
    L′ = L[ℓ → (⟨m, i + 1, c, ω, π′⟩, ℓ′ :: s, ρ) :: cs]
    ─────────────────────────────────────────────
    L; ℓ ⊢ ((⟨m, i, c, ω, π⟩, s, ρ) :: cs, σ, μ, ω_g) → (L′, σ[ℓ′ → new(cid)], μ, ω_g)

    (m.body) i = invokevirtual mid : (cid^n) rtype
    s = vₙ :: … :: v₁ :: ℓ′ :: s′        ℓ′ = (a₀, ⟨m₀, i₀, c₀, ω₀, π₀⟩)
    Lookup (mid : (cid^n) rtype) class(σ, ℓ′) = m₁
    c₁ = (m₀, i₀) ⊕k c₀
    π′ = π[(m, c, (i, i + 1)) → π(m, c, (i, i + 1)) + 1]
    ω₁ = ω[(m₁, c₁) → ω_g(m₁, c₁) + 1]        π₁ = π′[(m₁, c₁, ·, ·) → 0]
    ρ₁ = [0 → ℓ′, 1 → v₁, …, n → vₙ]        ω_g′ = ω_g[(m₁, c₁) → ω_g(m₁, c₁) + 1]
    L′ = L[ℓ → (⟨m₁, 0, c₁, ω₁, π₁⟩, ε, ρ₁) :: (⟨m, i + 1, c, ω, π′⟩, s′, ρ) :: cs]
    ─────────────────────────────────────────────
    L; ℓ ⊢ ((⟨m, i, c, ω, π⟩, s, ρ) :: cs, σ, μ, ω_g) → (L′, σ, μ, ω_g′)

For allocation, we simply annotate the new memory location with the current code
pointer and record the current move. For method invocation, the caller records the
current move. The new frame receives a copy of the caller's vectors (after the call),
in which the current call is recorded and the iteration vector corresponding to this call
is reset. Except for thread spawning, the omitted rules simply record the current move.
Thread spawning is similar to method invocation, except that the new frame receives
fresh vectors rather than copies of the caller's vectors.

Safe Instrumentation. As we did between the standard and the points-to semantics, we
prove a simulation diagram between the points-to semantics and the counting semantics.
It ensures that every points-to race corresponds to a counting race. However, in order
to use the soundness theorem of the must-lock analysis we also need to prove a
bisimulation diagram. It ensures that every state that is reachable in the counting semantics
corresponds to a reachable state in the points-to semantics. This allows us to transfer the
soundness result of the must-lock analysis in terms of the counting semantics.

Semantics invariants. Proposition 1 states that our instrumentation discriminates
between memory locations allocated at the same program point. As expected, to
discriminate between memory locations allocated at program point (m, i) in context c, it is
sufficient to check the values of ω(m, c) and π(m, c, L(m, i)).
Proposition 1. Given a program P, if (L, σ, μ, ω_g) is a reachable state of P and if
Safe_P(L) holds then, for all ℓ₁, ℓ₂ ∈ dom(σ), we have

    ℓ₁ = (a₁, ⟨m, i, c, ω₁, π₁⟩) ∧ ℓ₂ = (a₂, ⟨m, i, c, ω₂, π₂⟩) ∧
    ⟨ω₁(m, c), π₁(m, c, L(m, i))⟩ = ⟨ω₂(m, c), π₂(m, c, L(m, i))⟩
        ⇒ ℓ₁ = ℓ₂

Proving Proposition 1 requires stronger semantic invariants. Intuitively, when reaching
an allocation site, no memory location in the heap domain should claim to have been
allocated at the current iteration. More formally, we have proved that any reachable
state is well-formed. A state (L, σ, μ, ω_g) is said to be well-formed if for any frame
(⟨m, i, c, ω, π⟩, s, ρ) in L and for any memory location (a, ⟨m₀, i₀, c₀, ω₀, π₀⟩) in the
heap domain we have

    ω(m, c) ≤ ω_g(m, c)        ω₀(m₀, c₀) ≤ ω_g(m₀, c₀)
    localCoherency(L, (a, ⟨m₀, i₀, c₀, ω₀, π₀⟩), m, i, c, ω, π)
    ω(m, c) ≠ ω′(m, c) for every distinct frame (⟨m, i′, c, ω′, π′⟩, s′, ρ′) in L

where localCoherency(L, (a, ⟨m₀, i₀, c₀, ω₀, π₀⟩), m, i, c, ω, π) stands for

    (m, c) = (m₀, c₀) ⇒ ω(m, c) = ω₀(m, c) ⇒
      π₀(m, c, L(m, i₀)) ≤ π(m, c, L(m, i₀)) ∧
      ( π₀(m, c, L(m, i₀)) = π(m, c, L(m, i₀)) ⇒
          ( i₀ ≠ i ∧ (leadsTo(m, i, i₀, j₁ · … · jₙ) ⇒ ∃k < n. (jₖ, jₖ₊₁) = L(m, i₀)) ) )

Type And Effect System. We have formalised a type and effect system which captures
the fact that some components of the vectors of a memory location are equal to the same
components of the vectors of: (1) the current frame, when the memory location is in the local
variables or on the stack of the frame, or (2) another memory location pointing to
it in the heap. For lack of space, we cannot describe the type and effect system here.
Intuitively, we perform a points-to analysis where allocation sites are decorated with
masks which tell us which components of the vectors of the abstracted memory location
match the same components in the vectors of a given code pointer (depending on whether
we consider (1) or (2)). Formally, an abstract location τ ∈ T in our extended points-to
analysis is a pair (A, F) where A is a set of allocation sites and F maps every element
of A to a pair ⟨Ω, Π⟩ of abstract vectors. Abstract vectors are defined by Ω ∈ MVect =
M × Context → {1, ⊤} and Π ∈ LVect = M × Context × Flow → {1, ⊤}.
Our analysis computes a pair (A, Σ) where A provides flow-sensitive points-to
information with respect to local variables and Σ provides flow-insensitive points-to
information with respect to the heap. The decoration of a points-to information acts as
a mask. For a memory location held by a local variable, it tells us which components
of its vectors (those set to 1) match those of the current frame. When a memory location
points to another one in the heap, it tells us which components of their respective
vectors are equal.
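A possible COQ reading of these two-valued abstract vectors is the following sketch
(ours; One plays the role of 1 and Top of ⊤):

Section AbstractVectors.
  Variables (method context : Set).
  Definition flow : Set := (nat * nat)%type.

  (* Two-valued mask: One marks a vector component known to coincide with
     the reference code pointer; Top means no information. *)
  Inductive mask : Set := One | Top.

  Definition MVect : Type := method * context -> mask.
  Definition LVect : Type := method * context * flow -> mask.
End AbstractVectors.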
Below we present the last stages we use for potential race pruning. For each stage,
the result stated by Proposition 1 is crucial. Indeed, given an abstract location with
allocation site (m, i, c), they rely on a property stating that whenever the decoration states
that Ω(m, c) = Π(m, c, L(m, i)) = 1, the abstraction describes a unique concrete
location. This property results from the combination of the abstraction relation defined in
our type system and of Proposition 1.

Must-Not Escape Analysis. We use the flow-sensitive element A of the previous type
and effect system to check that, at some program point, an object allocated by a thread
is still local to that thread (or has not escaped yet, i.e. it is not reachable from other
threads). More precisely, the type and effect system is used to guarantee that, at some
program point, the last object allocated by a thread at a given allocation site is still local
to that thread. In particular, our analysis proves that an access performed at point 4 in our
running example is on the last object of type T allocated by the main thread (which
is still local, even though, at each loop iteration, the new object eventually escapes the
main thread). In contrast, the pair (5, f, 8) cannot be removed by this analysis, since the
location has already escaped the main thread at point 5. This pair is removed by the
aliasing analysis. Our escape analysis improves on that of Naik and Aiken, which does
not distinguish among several allocations performed at the same site.

Conditional Must-Not-Alias Analysis. The flow-insensitive element Σ of the previous
type and effect system is used to define an under-approximation DR_Σ of the notion of
disjoint reachability. Given a finite set of heaps {σ₁, …, σₙ} and a set of allocation
sites H, the disjoint reachability set DR_{σ₁,…,σₙ}(H) is the set of allocation sites h
such that whenever an object o allocated at site h is reachable, by one or more
field dereferences in some heap of {σ₁, …, σₙ}, from objects o₁ and o₂ allocated at
any sites in H, then o₁ = o₂. It allows us to remove the last potential race of our running
example. For each potential conflict between two program points i₁ and i₂, we first
compute the sets May₁ and May₂ of sites that the corresponding targeted objects may
point to, using the previous points-to analysis. Then we use the must-lock analysis
to compute the sets Must₁ and Must₂ of allocation sites such that for any h in Must₁
(resp. Must₂), there must exist a lock l currently held at point i₁ (resp. i₂) and allocated
at h. The targeted object must furthermore be reachable from l with respect to
the heap history that leads to the current point. This last property is ensured by the
path expressions that are computed with the symbolic operand stack during the must-lock
analysis. Finally, we remove the potential race if and only if Must₁ ≠ ∅, Must₂ ≠ ∅
and

    May₁ ∩ May₂ ⊆ DR_Σ(Must₁ ∪ Must₂)

We formally prove that any potential race that passes this last check is not a real race.
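To make the check concrete, here is an executable COQ sketch (ours; sites are coded as
pairs of naturals, sets as lists, and the disjoint-reachability set DR_Σ(Must₁ ∪ Must₂) is
assumed to be precomputed and passed as the list DR):

Require Import List Bool Arith.
Import ListNotations.

Definition site : Set := (nat * nat)%type.

Definition site_eqb (a b : site) : bool :=
  (fst a =? fst b) && (snd a =? snd b).

(* s ∈ l *)
Definition memb (s : site) (l : list site) : bool :=
  existsb (site_eqb s) l.

(* l1 ⊆ l2 *)
Definition subsetb (l1 l2 : list site) : bool :=
  forallb (fun s => memb s l2) l1.

(* l1 ∩ l2 *)
Definition interb (l1 l2 : list site) : list site :=
  filter (fun s => memb s l2) l1.

Definition nonempty (l : list site) : bool :=
  match l with [] => false | _ :: _ => true end.

(* Discard the potential race iff Must1 ≠ ∅, Must2 ≠ ∅ and
   May1 ∩ May2 ⊆ DR. *)
Definition prune (Must1 Must2 May1 May2 DR : list site) : bool :=
  nonempty Must1 && nonempty Must2 && subsetb (interb May1 May2) DR.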

6 Related Work

Static race detection. Most work on static race detection follows the lock-based
approach, as opposed to event-ordering-based approaches. This approach imposes that
every pair of concurrent accesses to the same memory location is guarded by a common
lock, and it is usually enforced by means of a type and effect discipline.
Early work [5] proposes an analysis for a λ-calculus extended with support for shared
memory and multiple threads. Each allocation comes with an annotation in the program
text specifying which lock protects the new memory location, and the type and effect
system checks that this lock is held whenever the location is accessed. More precisely, the
annotation refers to a lexically scoped lock definition, thus ensuring uniqueness. To overcome
the limitation imposed by the lexical scope of locks, existential types are proposed as
a solution to encapsulate an expression with the locks required for its evaluation. This
approach was limited in that it could only handle programs where all accesses
are guarded, even when no concurrent access is possible. Moreover, it imposed the use
of specific constructions to manage existential types.
A step toward the treatment of realistic languages was made in [7], which considers the
JAVA language and supports various common synchronization patterns: classes with
internal synchronization, classes that require client-side synchronization and thread-local
classes. Aside from the additional synchronization patterns, the approach is similar to
the previous one and requires annotations on fields (the lock protecting the field) and
method declarations (locks that must be held at invocation time). However, the
object-oriented nature of the JAVA language is used as a more natural means for encapsulation.
Fields of an object must be protected by a lock (an object in JAVA) accessible from
this object. For example, x.f may be protected by x.g.h where g and h are final fields
(otherwise, two concurrent accesses to x.f guarded by x.g.h could use different locks).
Client-side synchronization and thread-local classes are respectively handled by classes
parameterized by locks and by a simple form of escape analysis. A similar approach, using
ownership types to ensure encapsulation, was taken in [3, 4].
The analysis we consider here is that of [17, 18]. Thanks to the disjoint reachability
property and to a heavy use of points-to analysis, it is more precise and captures more
idioms than those above. Points-to analysis also makes it more costly, but it has been
shown that such analyses are tractable thanks to BDD-based resolution techniques [22].
Machine-checked formalisation for multithreaded JAVA. There is a growing interest in
machine-checked semantics proofs. Leroy [13] develops in COQ a certified compiler from
Cminor (a C-like imperative language) to PowerPC assembly code, but only in a
sequential setting. Hobor et al. [8] define a modular operational semantics for Concurrent
C minor and prove the soundness of a concurrent separation logic with respect to it, in
COQ. Several formalisations of the sequential JVM and its type system have been
performed (notably the work of Klein and Nipkow [11]), but few have investigated its
multithreaded extension. Petri and Huisman [19] propose BICOLANO MT, a realistic
formalisation of multithreaded JAVA bytecode in COQ that extends the sequential
semantics considered in the MOBIUS project. Lochbihler extends the JAVA source model
of Klein and Nipkow [11] with an interleaving semantics and proves type safety.
Aspinall and Ševčík [2] formalise the JAVA data race free guarantee theorem, which ensures
that data race free programs can only have sequentially consistent behaviours. The work of
Petri and Huisman [9] follows a similar approach. The only machine-checked proof of a
data race analyser we are aware of is the work of Lammich and Müller-Olm [12]. Their
formalisation is done at the level of an abstract semantics of a flowgraph-based program
model. They formalise a locking analysis with an alias analysis technique simpler than
the one used by Naik and Aiken.

7 Conclusions and Future Work


In this paper, we have presented a formalisation of a JAVA bytecode data race
analysis based on four advanced static analyses: a context-sensitive points-to analysis, a
must-lock analysis, a must-not thread escape analysis and a conditional must-not-alias
analysis. Our soundness proofs for these analyses rely on three layers of semantics
which have been formally linked together with simulation (and sometimes bisimulation)
proofs. The corresponding COQ development required a little more than 15,000 lines
of code. It is available online at http://www.irisa.fr/lande/datarace.
This is already a significant achievement and, as far as we know, one of the first attempts
to formally prove data race freeness. However, the current specification is not
executable. Our analyses are only specified as sets of constraints on logical domains such as
(pcontext → Prop). We are currently working on the implementation part, starting
with an OCaml prototype to mechanically check the example given in Section 2. Then
we will have to implement in COQ the abstract domains and the transfer functions of
each analysis, following the methodology proposed in our previous work [6]. Thanks
to the work presented in this paper, these transfer functions will not have to be
proved sound with respect to an operational semantics. It is sufficient (and far easier)
to prove that they correctly refine the logical specification we have developed here. We
plan to formalise only a result checker and use it to check the result given by the
untrusted analyser written in OCaml. Extracting an efficient checker is a challenging task
here because state-of-the-art points-to analysis implementations rely on complex
symbolic techniques such as BDDs [22].

Acknowledgment. We thank Thomas Jensen and the anonymous TPHOLs reviewers for
their helpful comments.
References
1. AMD: AMD64 Architecture Programmer's Manual, Volume 2: System Programming. Techni-
cal Report 24593 (2007)
2. Aspinall, D., Ševčík, J.: Formalising Java's data race free guarantee. In: Schneider, K.,
Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 22–37. Springer, Heidelberg (2007)
3. Boyapati, C., Lee, R., Rinard, M.: Ownership types for safe programming: preventing data
races and deadlocks. In: ACM Press (ed.) Proc. of OOPSLA 2002, New York, NY, USA, pp.
211–230 (2002)
4. Boyapati, C., Rinard, M.: A parameterized type system for race-free Java programs. In: ACM
Press (ed.) Proc. of OOPSLA 2001, New York, NY, USA, pp. 56–69 (2001)
5. Flanagan, C., Abadi, M.: Types for safe locking. In: Swierstra, S.D. (ed.) ESOP 1999. LNCS,
vol. 1576, pp. 91–108. Springer, Heidelberg (1999)
6. Cachera, D., Jensen, T., Pichardie, D., Rusu, V.: Extracting a Data Flow Analyser in Con-
structive Logic. Theoretical Computer Science 342(1), 56–78 (2005)
7. Flanagan, C., Freund, S.N.: Type-based race detection for Java. In: Proc. of PLDI 2000, pp.
219–232. ACM Press, New York (2000)
8. Hobor, A., Appel, A.W., Zappa Nardelli, F.: Oracle semantics for concurrent separation logic.
In: Drossopoulou, S. (ed.) ESOP 2008. LNCS, vol. 4960, pp. 353–367. Springer, Heidelberg
(2008)
9. Huisman, M., Petri, G.: The Java memory model: a formal explanation. In: Verification and
Analysis of Multi-threaded Java-like Programs, VAMP (2007) (to appear)
10. Intel. Intel 64 architecture memory ordering white paper. Technical Report SKU 318147-001
(2007)
11. Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual machine
and compiler. ACM Transactions on Programming Languages and Systems 28(4), 619–695
(2006)
12. Lammich, P., Müller-Olm, M.: Formalization of conflict analysis of programs with proce-
dures, thread creation, and monitors. In: The Archive of Formal Proofs (2007)
13. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler with a
proof assistant. In: Proc. of POPL 2006, pp. 42–54. ACM Press, New York (2006)
14. Leveson, N.G.: Safeware: system safety and computers. ACM, NY (1995)
15. Manson, J., Pugh, W., Adve, S.V.: The Java Memory Model. In: Proc. of POPL 2005, pp.
378–391. ACM Press, New York (2005)
16. Naik, M.: Effective Static Data Race Detection For Java. PhD thesis, Stanford University
(2008)
17. Naik, M., Aiken, A.: Conditional must not aliasing for static race detection. In: Proc. of
POPL 2007, pp. 327–338. ACM Press, New York (2007)
18. Naik, M., Aiken, A., Whaley, J.: Effective static race detection for Java. In: Proc. of PLDI
2006, pp. 308–319. ACM Press, New York (2006)
19. Petri, G., Huisman, M.: BicolanoMT: a formalization of multi-threaded Java at bytecode
level. In: Bytecode 2008. Electronic Notes in Theoretical Computer Science (2008)
20. Poulsen, K.: Tracking the blackout bug (2004)
21. Sun Microsystems, Inc. JSR 133 Expert Group, Java Memory Model and Thread Specifica-
tion Revision (2004)
22. Whaley, J., Lam, M.S.: Cloning-based context-sensitive pointer alias analysis using binary
decision diagrams. In: Proc. of PLDI 2004, pp. 131–144. ACM, New York (2004)
Formal Analysis of Optical Waveguides in HOL

Osman Hasan, Sanaz Khan Afshar, and Sofiène Tahar

Dept. of Electrical & Computer Engineering, Concordia University,


1455 de Maisonneuve W., Montreal, Quebec, H3G 1M8, Canada
{o_hasan,s_khanaf,tahar}@ece.concordia.ca

Abstract. Optical systems are becoming increasingly important as they
resolve many bottlenecks in present-day communications and electronics.
Some common examples include their usage to meet high-capacity
link demands in communication systems and to overcome the
performance limitations of metal interconnect in silicon chips. However,
the inability to efficiently analyze optical systems using traditional
analysis approaches, due to the continuous nature of optics, somewhat
limits their application, especially in safety-critical domains. In order to
overcome this limitation, we propose to formally analyze optical systems
using a higher-order-logic theorem prover (HOL). As a first step in this
endeavor, we formally analyze eigenvalues for planar optical waveguides,
which are among the most fundamental components of optical devices.
For the formalization, we have utilized the mathematical concepts of
differentiation of piecewise functions and one-sided limits of functions. In
order to illustrate the practical effectiveness of our results, we present
the formal analysis of a planar asymmetric waveguide.

1 Introduction
Optical systems are increasingly being used these days, mainly because of their
ability to provide high-capacity communication links, in applications ranging
from ubiquitous internet and mobile communications to less common but
more advanced scientific domains, such as optical integrated circuits,
biophotonics and laser material processing. The correct operation of these
optical systems is usually very important due to the financial or safety-critical
nature of their applications. Therefore, quite a significant portion of the design
time of an optical system is spent on analyzing the designs so that functional
errors can be caught prior to the production of the actual devices. Calculus plays
a significant role in such analysis. Nonlinear differential equations with
transcendental components are used to model the electric and magnetic field components
of electromagnetic light waves. The optical components are characterized by
their refractive indices, and the effects of passing electromagnetic waves of
visible and infrared frequencies through these media are analyzed to ensure
that the desired reflection and refraction patterns are obtained.
The analysis of optical systems has so far been mainly conducted using
paper-and-pencil proof methods [18]. Such traditional techniques are usually
very tedious and always carry some risk of an erroneous analysis due to the


complex nature of present-day optical systems coupled with the human-error
factor. The advent of fast and inexpensive computational power in the last two
decades opened up avenues for using computers in the domain of optical
system analysis. Nowadays, computer-based simulation approaches and computer
algebra systems are quite frequently used to validate the optical system analysis
results obtained earlier via paper-and-pencil proof methods. In computer
simulation, complex electromagnetic wave models can be constructed and their
behaviors in an optical medium of known refractive index can then be analyzed. But
computer simulation cannot provide 100% precise results, since the fundamental
idea in this approach is to approximately answer a query by analyzing a large
number of samples. Similarly, computer algebra systems, even though they are
considered semi-formal and are very efficient in mathematical computations,
also fail to guarantee correct results because they are built on extremely
complicated algorithms that are quite likely to contain bugs.
Thus, these traditional techniques should not be relied upon for the analysis of
optical systems, especially when they are used in safety-critical areas, such as
medicine, transportation and the military, where inaccuracies in the analysis may
even result in the loss of human lives.
In the past couple of decades, formal methods have been successfully used for
the precise analysis of a variety of hardware and software systems. The rigorous
exercise of developing a mathematical model for the given system and
analyzing this model using mathematical reasoning usually increases the chances of
catching subtle but critical design errors that are often missed by traditional
techniques like simulation. Given the sophistication of present-day optical
systems and their extensive usage in safety-critical applications, there is a dire
need for using formal methods in this domain. However, due to the continuous
nature of the analysis and the involvement of transcendental functions, automatic
state-based approaches, like model checking, cannot be used here.
On the other hand, we believe that higher-order-logic theorem proving offers a
promising solution for conducting formal analysis of optical systems, the main
reason being the highly expressive nature of higher-order logic, which can be
leveraged to model essentially any system that can be expressed in a closed
mathematical form. In fact, most of the classical mathematical theories behind
elementary calculus, such as differentiation, limits, etc., and transcendental
functions, which are the most fundamental tools for analyzing optical systems, have
been formalized in higher-order logic [6]. However, to the best of our knowledge,
formal analysis of optical devices is a novelty that has not been reported in the
open literature so far using any technique, including theorem proving.
In this paper, as a first step towards using a higher-order-logic theorem prover
for analyzing optical systems, we present the formal analysis of planar optical
waveguides operating in the transverse electric (TE) mode, i.e., the mode in which
the electric field is transverse to the plane of incidence. A waveguide can be defined
as an optical structure that allows the confinement of electromagnetic light waves
within its boundaries by total internal reflection (TIR). It is considered to be one
of the most fundamental components of any optical system. Some of the optical
systems that heavily rely on optical waveguides include fiber-optic communication
links, fiber lasers and amplifiers for high-power applications, as well as all-optical
integrated circuits. A planar waveguide, which we mainly analyze in this
paper, is a relatively simple but widely used structure for light confinement. It
is well accepted in the optics literature that the one-dimensional analysis of this
simple planar waveguide is directly applicable to many real problems, and the
whole concept forms a foundation for more complex optical structures [18].
In order to formally describe the behavior of the planar waveguide, we model
in higher-order logic the electric and magnetic field equations which govern the
passage of light waves through a planar waveguide. The formalization is relatively
simple because in the TE mode there is no y-axis dependence, which allows
us to describe the electromagnetic fields as a small subset of the Maxwell equations.
Based on these formal definitions, we present the verification of the eigenvalue
equation for a planar waveguide in the TE mode. This equation plays a vital
role in designing planar waveguides, as it provides the relationship between the
wavelength of light waves that need to be transmitted through a planar
waveguide and the planar waveguide's physical parameters, such as refractive indices
and dimensions. In this formalization and verification, we required the
mathematical concepts of differentiation of piecewise functions and one-sided limits.
We built upon Harrison's real analysis theories [6] for this purpose, which
include the higher-order-logic formalization of differentiation and limits. We also
present some new definitions that allow us to reason about the differentiation of
piecewise functions and one-sided limits with minimal reasoning effort. Finally,
in order to illustrate the effectiveness of the formally verified eigenvalue
equation in designing real-world optical systems, we present the analysis of a planar
dielectric structure [18]. All the work described in this paper is done using the
HOL theorem prover [4]. The main motivations behind this choice include
past familiarity with HOL along with the availability of Harrison's real analysis
theories [6], which form the fundamental core of our work.
The remainder of this paper is organized as follows. Section 2 gives a review of
related work. In Section 3, we provide a brief introduction to planar
waveguides along with their corresponding electromagnetic field equations and
eigenvalues. In Section 4, we present the formalization of the electromagnetic fields
for a planar waveguide. We utilize this formalization to verify the eigenvalue
equation in Section 5. The analysis of a planar dielectric structure is presented
in Section 6. Finally, Section 7 concludes the paper.

2 Related Work

The continuous advancement of optical devices towards increased functionality
and performance comes with the challenge of developing analysis tools that are
able to keep up with the growing level of sophistication. Even though there is
a significant amount of research going on in this important area of analyzing
optical systems, to the best of our knowledge none of the available optical
analysis tools are based on formal methods, and the work presented in this paper
is the first of its kind. In this section, we present a brief overview of the
state-of-the-art informal techniques used for optical system analysis.
The most commonly used computer-based techniques for optical system
analysis are based on simulation and numerical methods. Some examples include
the analysis of integrated optical devices [20], optical switches [16] and
biosensors [23]. Optical systems are continuous systems, and thus the first step in
their simulation-based analysis is to construct a discrete model of the given
system [5]. Once the system is discretized, the electromagnetic wave equations are
solved by numerical methods. Finite difference methods are the most commonly
used numerical approaches applied to wave equations. Finite difference
methods applied to the time-domain discretized wave equations are referred to as
Finite Difference Time Domain (FDTD) methods [21], and those applied to the
frequency-domain discretized wave equations as Finite Difference Frequency Domain
(FDFD) methods [19]. Solving equations with numerical methods itself imposes
an additional form of error on the solutions of the problem. Besides inaccuracies,
another major disadvantage associated with numerical methods and
simulation-based approaches is the tremendous amount of CPU time and memory
required for attaining reasonable analysis results [10]. In [9,13], the authors
proposed different methodologies to break the structure into smaller components
to improve the memory consumption and speed of FDTD methods.
Similarly, some enhancements to the FDFD method are proposed in [22,12]. There
is extensive effort on this subject and, although there have been some improvements,
the inherent nature of numerical and simulation-based methods prevents all these
efforts from achieving 100% accuracy in the analysis, which can be attained by the
proposed higher-order-logic theorem proving based approach.
Computer algebra systems incorporate a wide variety of symbolic techniques
for the manipulation of calculus problems. Based on these capabilities, they have
also been tried in the area of optical system analysis. For example, the analysis
of planar waveguides using Mathematica [14], a widely used computer
algebra system, is presented in [3]. With the growing interest in optical system
analysis, a dedicated optical analysis package, Optica [17], has very recently been
released for Mathematica. Optica performs symbolic modeling of optical systems,
diffraction, interference and Gaussian beam propagation calculations, and is
general enough to handle many complex optical systems in a semi-formal manner.
Computer algebra systems have also been found to be very useful for evaluating
eigenvalues of transcendental equations. This feature has been used extensively
alongside paper-and-pencil analytical approaches. The idea here is to
derive the eigenvalue equation by hand and then feed that equation to a computer
algebra system to get the desired eigenvalues [18]. Despite all these advantages,
the analysis results of computer algebra systems cannot be termed 100%
precise due to the many approximations and heuristics used for automation and
for reducing memory constraints. Another source of inaccuracy is the presence of
unverified, huge symbolic manipulation algorithms in their core, which are quite
likely to contain bugs. The proposed theorem proving based approach overcomes
these limitations, but at the cost of significant user interaction.
3 Planar Waveguides

Planar waveguides are basically optical structures in which optical radiation
propagates in a single dimension. They have become key elements
in modern high-speed optical networks and have been shown to
provide a very promising solution to overcome the performance limitations of metal
interconnect in silicon chips.

Fig. 1. Planar Waveguide Structure

The planar waveguide, shown in Figure 1, is considered to be infinite in extent
in two dimensions, say the yz-plane, but finite in the x direction. It consists of
a thin dielectric film surrounded by materials of different refractive indices. The
refractive index of a medium is usually defined as the ratio between the phase
velocity of the light wave in a reference medium and the phase velocity in the
medium itself, and it is a widely used characteristic of optical devices. In Figure 1,
n_c, n_s, and n_f represent the refractive indices of the cover region, the substrate
region, and the film, which is assumed to be of thickness h, respectively. The
refractive index profile of a planar waveguide can be summarized as follows:

        ⎧ n_c    x > 0
n(x) =  ⎨ n_f    −h < x < 0        (1)
        ⎩ n_s    x < −h

The most important concept in optical waveguides is that of total internal
reflection (TIR). When a wave crosses a boundary between materials with different
refractive indices, it is usually partially refracted at the boundary surface and
partially reflected. TIR happens when there is no refraction. Since the objective
of waveguides is to guide waves with minimum loss, ideally we want to ensure TIR
for the waves that the waveguide is meant to guide. TIR is ensured only when the
following two conditions are satisfied. Firstly, the refractive index of the
transmitting medium must be greater than that of its surroundings, n_medium > n_surrounding,
and secondly, the angle of incidence of the wave at the boundary must be greater than
a particular angle, usually referred to as the critical angle. The value of
the critical angle depends on the relative refractive index of the two materials
of the boundary. Thus, the distribution of refractive indices of a waveguide
characterizes its behavior and restricts the type of waves which
the waveguide can guide.
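For a boundary between media of refractive indices n₁ > n₂, Snell's law gives the
critical angle as

    θ_c = sin⁻¹(n₂ / n₁),

so that incidence angles greater than θ_c undergo total internal reflection; for
instance, at a glass–air boundary (n₁ ≈ 1.5, n₂ ≈ 1.0) we get θ_c ≈ 41.8°.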
Like all other waveguides, the planar waveguide also needs to provide the
TIR conditions for the waves that are required to be transmitted through
it. The first condition is satisfied by choosing n_f to be greater than both n_s
and n_c. The second condition, on the other hand, depends on the angle of
incidence of the wave on the boundary of the waveguide and thus involves the
characteristics of the wave itself, which makes it more challenging to ensure.
Basically, light is an electromagnetic disturbance propagated through the field
according to electromagnetic laws. Thus, the propagation of light waves through a
medium can be characterized by their electromagnetic fields. Based on the Maxwell
equations [11], which completely describe the behavior of light waves, it is not
necessary to solve electromagnetic problems for each and every field component.
It is well known that for a planar waveguide it suffices to consider two possible
electric field polarizations, transverse electric (TE) or transverse magnetic
(TM) [18]. In the TE mode, the electric field is transverse to the direction of
propagation and has no longitudinal component along the z-axis. Thus, the
y-axis component of the electric field, E_y, is sufficient to completely
characterize the planar waveguide. Similarly, in the TM mode, the magnetic field has no
longitudinal component along the z-axis, and solving the system only for the
y-axis component of the magnetic field, H_y, will provide us with the remaining
electric field components. In this paper, we focus on the TE mode, though the
TM mode can also be analyzed in a similar way.
Based on the above discussion, the electric and magnetic field amplitudes in
the TE mode for the three regions of the planar waveguide, with their different
refractive indices, are given as follows [18]:
         ⎧ A e^(−γ_c x)                      x > 0
E_y(x) = ⎨ B cos(κ_f x) + C sin(κ_f x)      −h < x < 0        (2)
         ⎩ D e^(γ_s (x+h))                   x < −h

H_z = (j / (ω μ₀)) ∂E_y/∂x        (3)
where A, B, C, and D are amplitude coefficients, γ_c and γ_s are the attenuation
coefficients of the cover and substrate, respectively, κ_f is the transverse component
of the wavevector k = 2π/λ in the guiding film, ω is the angular frequency of the
light and μ₀ is the permeability of the medium. Some of these parameters can be
further defined as follows:

γ_c = √(β² − k₀² n_c²)        (4)

γ_s = √(β² − k₀² n_s²)        (5)

κ_f = √(k₀² n_f² − β²)        (6)
Fig. 2. Longitudinal (β) and Transverse (κ) Components of Wavevector k

where k₀ is the vacuum wavevector, such that k = nk₀ with n being the refractive
index of the medium, and β and κ are the longitudinal and transverse
components of the wavevector k, respectively, inside the film, as depicted in Figure 2.
The angle θ is the required angle of incidence of the wave.
This completes the mathematical model of the light wave in a planar
waveguide, which leads us back to the original question of finding the angle of incidence
θ of the wave that ensures TIR. β is the most interesting quantity in this regard. It
summarizes two very important characteristics of a wave in a medium.
Firstly, because it is the longitudinal component of the wavevector, β contains
the information about the wavelength of the wave. Secondly, it contains the
propagation direction of the wave within the medium, which consequently gives
us the angle of incidence θ. Now, in order to ensure the second condition for TIR,
we need to find the corresponding values of β. These specific values of β are
designated the eigenvalues of the waveguide, since they contain all the information
required to describe the behavior of the wave and the waveguide.
The electric and magnetic field equations (2) and (3) can be utilized, along
with their well-known continuity [18], to verify the following useful
relationship, usually termed the eigenvalue equation for β:

tan(hκ_f) = (γ_c + γ_s) / (κ_f (1 − γ_c γ_s / κ_f²))        (7)

The value of this relationship is that it relates β to all the
physical characteristics of the planar waveguide, such as refractive indices and
height. Thus, it can be used to evaluate the value of β in terms of the planar
waveguide parameters. This way, we can tune these parameters so
that an appropriate value of β is attained that satisfies the second condition for
TIR, i.e., sin⁻¹(λβ/2π) < critical angle. All the values of β that satisfy the above
conditions are usually termed the TE modes of the planar waveguide.
In this paper, we present the higher-order-logic formalization of the electric
and magnetic field equations for the planar waveguide, given in Equations (2)
and (3), respectively. Then, based on these formal definitions, we present the
formal verification of the eigenvalue equation, given in Equation (7). As outlined
above, it is one of the most important relationships used for the analysis
of planar waveguides, which makes its formal verification in a higher-order-logic
theorem prover a significant step towards conducting formal optical
system analysis.
4 Formalization of Electromagnetic Fields

In this section, we present the higher-order-logic formalization of the electric
and magnetic fields for a planar waveguide in the TE mode. We also verify an
expression for the magnetic field by differentiating the electric field expression.
The electric field, given in Equation (2), is a piecewise function, i.e., a function
whose values are defined differently on disjoint subsets of its domain. Reasoning
about the derivatives of piecewise functions in a theorem prover is a tedious task,
as their domain-dependent values require rewriting based on the classical
definitions of differentiation and limits. In order to facilitate such reasoning,
we propose to formally define piecewise functions in terms of the Heaviside
step function [1], which is sometimes also referred to as the unit step function.
The Heaviside step function is a discontinuous, relatively simple, piecewise
real-valued function that returns 1 for all strictly positive arguments, 0 for strictly
negative arguments, and whose value at the point 0 is usually taken to be 1/2.

       ⎧ 0      x < 0
H(x) = ⎨ 1/2    x = 0        (8)
       ⎩ 1      0 < x
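Note that a three-piece function g with pieces a, b and c on x > 0, −h < x < 0
and x < −h, respectively, can be written (up to its values at the two boundary
points) as

    g(x) = a(x) H(x) + b(x) H(−x) + (c(x) − b(x)) H(−x − h),

since for x > 0 only the first step is 1, for −h < x < 0 only the second is, and for
x < −h the last two terms combine to c(x); this is precisely the shape used in
Definition 2 below.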

By defining piecewise functions based on the Heaviside step function, we make
their domain dependence implicit. This allows us to reason about the derivatives
of piecewise functions simply by using the differentiation properties of sums and
products of functions. This way, we need to reason from scratch about the
derivative of only one piecewise function, namely the Heaviside step function, and can
build upon these results to reason about the derivatives of all kinds of piecewise
functions without resorting to the classical definitions. In this paper, we apply this
approach to reason about the derivative of the electric field expression given in
Equation (2). The first step in this regard is to formalize the Heaviside step function
as the following higher-order-logic function.
Definition 1: Heaviside Step Function
⊢ ∀ x. h_step x = if x = 0 then 1/2 else (if x < 0 then 0 else 1)
Next, we formally verify that the derivative of the h_step function, for all
values of its argument x except 0, is equal to 0.

Theorem 1: Derivative of the Heaviside Step Function
⊢ ∀ x. ¬(x = 0) ⇒ (deriv h_step x = 0)
where the HOL function deriv represents the derivative function [6] that accepts
a real-valued function f and a differentiating variable x and returns df /dx. The
proof of the above theorem is based on the classical definitions of differentiation
and limit along with some simple arithmetic reasoning.
Now, the electric field of a planar waveguide, given in Equation (2), can be
expressed in higher-order logic as the following function.
Definition 2: Electric Field for the Planar Waveguide in TE Mode
⊢ ∀ b k n. gamma b k n = √(b² − k²n²)
⊢ ∀ b k n. kappa b k n = √(k²n² − b²)
⊢ ∀ A B C D n_c n_s n_f k_0 b h x.
   E_field A B C D n_c n_s n_f k_0 b h x =
     A e^(−(gamma b k_0 n_c) x) (h_step x) +
     (B cos((kappa b k_0 n_f) x) + C sin((kappa b k_0 n_f) x))
       (h_step (−x)) +
     (D e^((gamma b k_0 n_s)(x + h)) −
      (B cos((kappa b k_0 n_f) x) + C sin((kappa b k_0 n_f) x)))
       (h_step (−x − h))
The function E_field accepts the four amplitude coefficients A, B, C and D;
the three refractive indices of the planar waveguide n_c, n_s and n_f,
corresponding to the cover, the substrate and the film regions, respectively;
the vacuum wave vector k_0; the longitudinal component of the wave vector b;
the height of the waveguide h; and the variable x for the x-axis. It uses the
function gamma to obtain the two attenuation coefficients in the cover and the
substrate as (gamma b k_0 n_c) and (gamma b k_0 n_s), respectively, and the
function kappa to model the transverse component of k in the guiding film as
(kappa b k_0 n_f). It also utilizes the Heaviside step function h_step three
times, with appropriate arguments, to model the three subdomains of the
piecewise electric field for the planar waveguide, described by the above
parameters, according to Equation (2). It is important to note that, rather
than leaving the values at the boundaries x = 0 and x = −h undefined, as is
the case in Equation (2), our formal definition assigns fixed values to these
points. However, since we will be analyzing the amplitude coefficients under
the continuity of the electric and magnetic fields, these point values do not
alter our results, as will be seen in the next section.
Next, we formalize the magnetic field expression for the planar waveguide,
given in Equation (3), using the functional definition of the derivative, deriv,
given in [6], as follows.
Definition 3: Magnetic Field for the Planar Waveguide
⊢ ∀ omega mu A B C D n_c n_s n_f k_0 b h x.
   H_field omega mu A B C D n_c n_s n_f k_0 b h x =
     1/(omega mu) (deriv (λx. E_field A B C D n_c n_s n_f k_0 b h x) x)
The function H_field accepts the frequency omega and the permeability of the
medium mu, besides the same parameters that were used for defining the electric
field of the planar waveguide in Definition 2. We have removed the imaginary
unit from the original definition, given in Equation (3), for simplicity, as
our analysis is based on the amplitudes or absolute values of the electric and
magnetic fields and thus requires only the real portion of the corresponding
complex numbers. However, if the need arises, the imaginary part can be
included in the analysis as well, by utilizing the higher-order-logic
formalization of complex numbers [7].
Definitions 2 and 3 can now be used to formally verify a relation for the
magnetic field in a planar waveguide as follows:
Theorem 2: Expression for the Magnetic Field
⊢ ∀ omega mu A B C D n_c n_s n_f k_0 b h x.
   ¬(x = 0) ∧ ¬(x = −h) ⇒
   (H_field omega mu A B C D n_c n_s n_f k_0 b h x = 1/(omega mu) (
      (−(gamma b k_0 n_c)) A e^(−(gamma b k_0 n_c) x) (h_step x) +
      (kappa b k_0 n_f)
        (−B sin((kappa b k_0 n_f) x) + C cos((kappa b k_0 n_f) x))
        (h_step (−x)) +
      ((gamma b k_0 n_s) D e^((gamma b k_0 n_s)(x + h)) −
       (kappa b k_0 n_f)
         (−B sin((kappa b k_0 n_f) x) + C cos((kappa b k_0 n_f) x)))
        (h_step (−x − h))))
This theorem can be verified by proving the derivatives of the three
expressions found in the definition of the electric field, using the derivative
of the Heaviside step function, given in Theorem 1, along with the basic
differentiation properties of products and sums of functions, formally verified
in [6].
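As a quick sanity check outside the logic, the sketch below (ours; the
parameter values are arbitrary, chosen only to keep gamma and kappa real)
compares the closed form of Theorem 2 against a finite-difference derivative
of the e_field transcription given after Definition 2:

    def h_field_closed(omega, mu, A, B, C, D, nc, ns, nf, k0, b, h, x):
        # closed form of Theorem 2 (valid away from x = 0 and x = -h)
        gc, gs, kf = gamma(b, k0, nc), gamma(b, k0, ns), kappa(b, k0, nf)
        film_d = kf * (-B * math.sin(kf * x) + C * math.cos(kf * x))
        return (1.0 / (omega * mu)) * (
            -gc * A * math.exp(-gc * x) * h_step(x)
            + film_d * h_step(-x)
            + (gs * D * math.exp(gs * (x + h)) - film_d) * h_step(-x - h))

    omega, mu = 3.0, 1.2
    args = (1.0, 2.0, 0.5, 0.7, 1.40, 1.45, 1.50, 10.0, 14.6, 2.0)
    x, dx = -0.5, 1e-6   # a point inside the film region
    # central finite difference of E, scaled by 1/(omega*mu) as in Definition 3
    fd = (e_field(*args, x + dx) - e_field(*args, x - dx)) / (2 * dx * omega * mu)
    print(fd, h_field_closed(omega, mu, *args, x))   # the two agree closely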
5 Verification of the Eigenvalue Equation
In this section, we build upon the formal definitions of electromagnetic field
relations, formalized in the previous section, to formally verify the eigenvalue
equation for the planar waveguide in the TE mode, given in Equation (7).
The main idea behind our analysis is to leverage the continuous nature of the
electric and magnetic field functions. Like all other continuous functions, a
continuous piecewise function f also approaches the value f(x0) at any point
x = x0 in its domain. This condition, when applied to the boundary points
x = 0 and x = −h of our piecewise functions for the electric and magnetic
fields, E_field and H_field, respectively, yields very interesting results
that allow us to express the amplitude coefficients B, C and D in terms of the
amplitude coefficient A, and then finally to utilize these relationships to
verify Equation (7).
Due to the piecewise nature of our electric and magnetic field functions, the
above reasoning is based on the mathematical concept of right- and left-hand
limits, sometimes referred to as one-sided limits, which are the limits of a
real-valued function taken as a point in its domain is approached from the
right or from the left along the real axis, respectively [2]. Therefore, the
first step towards the formal verification of Equation (7) is the
formalization of the right-hand limit in higher-order logic, using its
classical definition, as follows:
Definition 4: Limit from the Right
⊢ ∀ f y0 x0. right_lim f y0 x0 =
   ∀e. 0 < e ⇒ ∃d. 0 < d ∧
     ∀x. 0 < x − x0 ∧ x − x0 < d ⇒ abs(f x − y0) < e
The abs function is the HOL function for the absolute value of a real number.
According to the above definition, the limit of a real valued function f (x), as x
tends to x0 from the right is y0, if for all strictly positive values e, there exists a
number d such that for all x satisfying x0 < x < x0 + d, we have |f (x) − y0| < e.
Similarly, the left-hand limit can be formalized as follows:

Definition 5: Limit from the Left
⊢ ∀ f y0 x0. left_lim f y0 x0 =
   ∀e. 0 < e ⇒ ∃d. 0 < d ∧
     ∀x. −d < x − x0 ∧ x − x0 < 0 ⇒ abs(f x − y0) < e
If the normal limit of a function exists at a point and is equal to y0 then both
the right and left limits for that function are also well-defined for the same point
and are both equal to y0. This is an important result for our analysis and thus
we formally verify it in the HOL theorem prover as the following theorem.
Theorem 4: Limit Implies Limit from the Right and Left
⊢ ∀ f y0 x0. (f → y0) x0 ⇒ right_lim f y0 x0 ∧ left_lim f y0 x0
The assumption of the above theorem (f → y0)x0 represents the formalization of
the normal limit of a function [6] and is True only if the function f approaches
y0 at the point x = x0. The proof of Theorem 4 is basically a rewriting of the
definitions involved along with the properties of the absolute function. We also
verified the uniqueness of both right and left hand limits as follows.
Theorem 5: Limit from the Right is Unique
⊢ ∀ f y1 y2 x0. right_lim f y1 x0 ∧ right_lim f y2 x0 ⇒ (y1 = y2)

Theorem 6: Limit from the Left is Unique
⊢ ∀ f y1 y2 x0. left_lim f y1 x0 ∧ left_lim f y2 x0 ⇒ (y1 = y2)
The proof of Theorem 5 is by contradiction, as it is not possible for a
real-valued function to get arbitrarily near to two unequal points in its
range for the same argument. We proceed by first assuming that ¬(y1 = y2) and
then rewriting the statement of Theorem 5 with the definition of the function
right_lim. Next, the two assumptions are specialized for the case
e = |y1 − y2|/2. Now, the same x is chosen for both assumptions in such a way
that the condition on x, i.e., x0 < x < x0 + d, is satisfied for both. One
such x is min(d1, d2)/2 + x0, where d1 and d2 are the d's for the two
assumptions, respectively, and the function min returns the minimum of its two
real number arguments. Thus, for such an x, the two assumptions imply that
|f x − y1| < |y1 − y2|/2 and |f x − y2| < |y1 − y2|/2, which leads to a
contradiction in both of the cases y1 < y2 and y2 < y1. Hence, our assumption
¬(y1 = y2) cannot be True and y1 must be equal to y2, which concludes the
proof of Theorem 5. Theorem 6 is verified using similar reasoning.
The above infrastructure can now be utilized to formally verify the mathemat-
ical relationships between the amplitude coefficients. The relationship between
the amplitude coefficients B and A can be formally stated as follows:
Theorem 7: B = A
⊢ ∀ A B C D n_c n_s n_f k_0 b h x. 0 < h ∧
   (∀x. (λx. E_field A B C D n_c n_s n_f k_0 b h x) contl x)
   ⇒ (B = A)
The first assumption ensures that h is always greater than 0, which is valid
since h represents the height of the waveguide. The HOL predicate
(f contl x) [6], used in the above theorem, represents the relational form of
the definition of a continuous function; it is True when the limit of the
real-valued function f at the point x exists and is equal to f(x). Thus, the
corresponding assumption in the above theorem ensures that the function
E_field is continuous on the x-axis and that its limit at the boundary points
x = 0 and x = −h is equal to the value of the function E_field at x = 0 and
x = −h.
In order to verify Theorem 7, consider the boundary point x = 0, at which the
value of the function E_field is (A + B)/2, according to Definition 2. Now,
based on Theorem 4, the limit from the right at x = 0 for the function E_field
is also going to be (A + B)/2. Next, we verified, using Definition 4 along
with the properties of the exponential function [6], that the limit from the
right for the function E_field at the point x = 0 is in fact equal to A. The
uniqueness of the right limit, verified in Theorem 5, can now be used to show
that A must be equal to (A + B)/2, as both represent the limit from the right
for the same function at the same point. This result can be easily used to
discharge our proof goal B = A, which concludes the proof of Theorem 7.
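For intuition, the continuity argument at x = 0 can be written out in two
lines, directly from Definition 2 (note that h > 0 gives h_step(−h) = 0):

    E_field(0) = A · (1/2) + (B cos 0 + C sin 0) · (1/2) = (A + B)/2
    lim_{x→0⁺} E_field(x) = A · 1 + (B cos 0 + C sin 0) · 0 = A

Since the two must coincide for a continuous function, A = (A + B)/2, i.e.,
B = A.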
Next, we apply similar reasoning as above with the magnetic field relation
for the planar waveguide, verified in Theorem 2, at point x = 0 to verify the
following relationship between the amplitude coefficients C and A.
Theorem 8: C = −A γc/κf
⊢ ∀ omega mu A B C D n_c n_s n_f k_0 b h x. (0 < h) ∧ (0 < mu) ∧
   (0 < omega) ∧ (b < k_0 n_f) ∧ (k_0 n_s < b) ∧ (0 < n_s) ∧ (0 < k_0) ∧
   (∀x. (λx. H_field omega mu A B C D n_c n_s n_f k_0 b h x) contl x)
   ⇒ (C = −A (gamma b k_0 n_c)/(kappa b k_0 n_f))
The additional assumptions, besides 0 < h, used in the above theorem ensure
that the values of the functions gamma and kappa are positive real numbers and
do not attain imaginary complex values, according to their definitions, given
in Section 3. Again, based on the assumed continuity of the magnetic field
H_field, we know that its limit at the point 0 is equal to the value of
H_field at x = 0, say H0. It is important to note that the value of H0 cannot
be obtained from the expression for H_field, given in Theorem 2. Therefore, we
cannot reason about its precise value, but based on the continuity of H_field
we do know that it exists. This implies that the limits from the right and the
left for this function are also equal to H0, according to Theorem 4. Next, we
verified that the limits from the right and the left for the magnetic field
function H_field, given in Theorem 2, at the point x = 0 are
−A (gamma b k_0 n_c)/(omega mu) and C (kappa b k_0 n_f)/(omega mu), using
Definitions 4 and 5, respectively. This leads to the verification of
Theorem 8, since we already
know that these two limit values are equal to H0, using the uniqueness of the
limits from the right and the left, verified in Theorems 5 and 6.
Now, using similar reasoning as above and applying the continuity of E_field
and H_field at x = −h, we verified the following two relations, which express
the amplitude coefficient D in terms of the amplitude coefficients B and C.
Theorem 9: D = B cos(κf h) − C sin(κf h)
⊢ ∀ A B C D n_c n_s n_f k_0 b h x. 0 < h ∧
   (∀x. (λx. E_field A B C D n_c n_s n_f k_0 b h x) contl x)
   ⇒ (D = B (cos((kappa b k_0 n_f) h)) − C (sin((kappa b k_0 n_f) h)))

Theorem 10: D = (κf/γs) (B sin(κf h) + C cos(κf h))
⊢ ∀ omega mu A B C D n_c n_s n_f k_0 b h x. (0 < h) ∧
   (0 < mu) ∧ (0 < omega) ∧ (k_0 n_s < b) ∧ (0 < n_s) ∧ (0 < k_0) ∧
   (∀x. (λx. H_field omega mu A B C D n_c n_s n_f k_0 b h x) contl x)
   ⇒ (D = (kappa b k_0 n_f)
          (B (sin((kappa b k_0 n_f) h)) + C (cos((kappa b k_0 n_f) h)))
          / (gamma b k_0 n_s))
The above theorems allow us to reach an alternate expression for E_field in
terms of only A, which is the amplitude of the electric field at x = 0. This
relationship is very useful for plotting the mode profiles of guided
modes [18]. The right-hand sides of the conclusions of Theorems 9 and 10 can
now be equated, since both are equal to D, and the amplitude coefficients B
and C can be expressed in terms of A, using Theorems 7 and 8, respectively, to
formally verify the desired relationship for evaluating the eigenvalues of the
planar waveguide, given in Equation (7), as the following theorem.
Theorem 11: Eigenvalue Equation
⊢ ∀ omega mu A B C D n_c n_s n_f k_0 b h x. (0 < A) ∧
   (0 < h) ∧ (0 < mu) ∧ (0 < omega) ∧
   (b < k_0 n_f) ∧ (k_0 n_s < b) ∧ (0 < n_s) ∧ (0 < k_0) ∧
   ¬((kappa b k_0 n_f)² = (gamma b k_0 n_c) (gamma b k_0 n_s)) ∧
   (∀x. (λx. E_field A B C D n_c n_s n_f k_0 b h x) contl x) ∧
   (∀x. (λx. H_field omega mu A B C D n_c n_s n_f k_0 b h x) contl x)
   ⇒ (tan((kappa b k_0 n_f) h) =
        ((gamma b k_0 n_c) + (gamma b k_0 n_s)) /
        ((kappa b k_0 n_f)
           (1 − (gamma b k_0 n_c) (gamma b k_0 n_s) / (kappa b k_0 n_f)²)))
Due to the inherent soundness of the theorem proving approach, our
verification results exactly matched their paper-and-pencil counterparts for
the eigenvalue equation, as conducted in [18], and thus can be termed 100%
precise. Interestingly, the assumption ¬((kappa b k_0 n_f)² =
(gamma b k_0 n_c) (gamma b k_0 n_s)), without which the eigenvalues are
undefined, was found to be missing in [18]. This fact clearly demonstrates the
strength of formal methods based analysis, as it allowed us to highlight this
corner case, which, if ignored, could lead to the invalidation of the whole
eigenvalue analysis.
The verification results, given in this section, heavily relied upon real analysis
and thus the useful theorems available in the HOL real analysis theories [6]
proved to be a great asset in this exercise. The verification task took around
2500 lines of HOL code and approximately 100 man-hours.
6 Application: Planar Asymmetric Waveguide
In this section, we demonstrate the effectiveness of Theorem 11 in analyzing
the eigenvalues of a planar asymmetric waveguide [18]. The waveguide is
characterized by a guiding index n_f of 1.50, a substrate index n_s of 1.45
and a cover index n_c of 1.40. The thickness of the guiding layer h is 5 μm.
The goal is to determine the allowable values of β for this structure,
assuming that a wavelength λ of 1 μm is used to excite the waveguide.
In order to obtain the allowable values of β from Theorem 11, we rewrite it
with the definition of the function gamma, replace the term (kappa b k_0 n_f)
with k_f, and express the variable b, which represents β in our theorems, in
terms of k_f as √((k_0)²(n_f)² − (k_f)²), using the definition of kappa, to
obtain the following alternate relationship.
Theorem 12: Alternate Form of the Eigenvalue Equation
⊢ ∀ omega mu A B C D n_c n_s n_f k_0 b h x. (0 < A) ∧
   (0 < h) ∧ (0 < mu) ∧ (0 < omega) ∧
   (b < k_0 n_f) ∧ (k_0 n_s < b) ∧ (0 < n_s) ∧ (0 < k_0) ∧
   ¬((kappa b k_0 n_f)² = (gamma b k_0 n_c) (gamma b k_0 n_s)) ∧
   (∀x. (λx. E_field A B C D n_c n_s n_f k_0 b h x) contl x) ∧
   (∀x. (λx. H_field omega mu A B C D n_c n_s n_f k_0 b h x) contl x)
   ⇒ (tan(k_f h) =
        (√((k_0)²(n_f)² − (k_f)² − (k_0)²(n_c)²) +
         √((k_0)²(n_f)² − (k_f)² − (k_0)²(n_s)²)) /
        (k_f (1 − √((k_0)²(n_f)² − (k_f)² − (k_0)²(n_c)²)
                  √((k_0)²(n_f)² − (k_f)² − (k_0)²(n_s)²) / (k_f)²)))
All the quantities in the conclusion of the above theorem are known except
k_f, since k_0 can be expressed in terms of the wavelength that is used to
excite the waveguide, as outlined in Section 3. However, a closed-form
solution for k_f cannot be obtained from the above equation. Therefore, we
propose to use a computer algebra system to solve for the value of k_f. Using
Mathematica, the first four eigenvalues of k_f were found to be 5497.16,
10963.2, 16351 and 21545 cm⁻¹. These values can then be used to calculate the
desired eigenvalues for b, according to the relationship
b = √((k_0)²(n_f)² − (k_f)²), and were found to be 94087, 93608, 92819 and
91752 cm⁻¹.
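Although the computation itself lies outside HOL, it is easy to cross-check
with standard numerical tools. The sketch below (ours, illustrative only; it
assumes NumPy and SciPy are available, and the sign-change filter is a simple
heuristic for skipping the singularities of tan and of the right-hand side)
solves the equation of Theorem 12 for the parameters above:

    import numpy as np
    from scipy.optimize import brentq

    lam = 1e-4                 # wavelength: 1 um, in cm
    k0 = 2 * np.pi / lam       # vacuum wave number (cm^-1)
    nf, ns, nc = 1.50, 1.45, 1.40
    h = 5e-4                   # guiding-layer thickness: 5 um, in cm

    def residual(kf):
        # tan(kf h) minus the right-hand side of Theorem 12
        gc = np.sqrt(k0**2 * nf**2 - kf**2 - k0**2 * nc**2)
        gs = np.sqrt(k0**2 * nf**2 - kf**2 - k0**2 * ns**2)
        return np.tan(kf * h) - (gc + gs) / (kf * (1 - gc * gs / kf**2))

    kf_max = np.sqrt(k0**2 * (nf**2 - ns**2))   # guidance requires kf < kf_max
    grid = np.linspace(100.0, kf_max - 1.0, 20000)
    vals = [residual(k) for k in grid]
    roots = []
    for a, b, fa, fb in zip(grid, grid[1:], vals, vals[1:]):
        if fa * fb < 0 and abs(fa) + abs(fb) < 10:  # skip singular jumps
            roots.append(brentq(residual, a, b))

    for kf in roots:
        beta = np.sqrt(k0**2 * nf**2 - kf**2)
        print(f"kf = {kf:8.1f} cm^-1,  beta = {beta:7.0f} cm^-1")

Running this should reproduce the four values of k_f and b quoted above, to
the precision shown.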
Conceptually, the above analysis can be divided into two parts. The first
part covers the analysis starting from the electromagnetic wave equations, with
the given parameters, up to the point where we obtain the alternate form of
eigenvalue equation, given in Theorem 12. The second part is concerned with
the actual computation of eigenvalues from Theorem 12. The first part of the
above analysis was completely formal and thus 100% precise, since it was done
using the HOL theorem prover. The proof script for this theorem was less than
100 lines long, which clearly demonstrates the effectiveness of our work, as it was
mainly due to the availability of Theorem 11 that we were able to tackle this
kind of verification problem with such minimal effort. The second part of the
analysis cannot be handled in HOL, because it involves a transcendental
equation for which a closed-form solution for k_f does not exist. For this
part, we utilized Mathematica and obtained the desired eigenvalues. To the
best of our knowledge, no other approach based on simulation, numerical
methods or computer algebra systems can provide 100% precision and soundness
in the results like the proposed approach for the first part of the analysis.
For the second part, we have used a computer algebra system, which is the best
option available, in terms of precision, for this kind of analysis. Other
approaches for the second part include graphical or numerical methods, which
cannot compete with computer algebra systems in precision. Thus, as far as the
whole analysis is concerned, the proposed method offers the most precise
solution.
7 Conclusions
This paper presents the formal analysis of planar optical waveguides using a
higher-order-logic theorem prover. Planar optical waveguides are simple, yet
widely used optical structures and not only find their applications in wave guid-
ing, but also in coupling, switching, splitting, multiplexing and de-multiplexing
of optical signals. Hence, their formal analysis paves the way to the formal analy-
sis of many other optical systems as well. Since the analysis is done in a theorem
prover, the results can be termed as 100% precise, which is a novelty that cannot
be achieved by any other computer based optical analysis framework.
We mainly present the formalization of the electromagnetic field equations
for a planar waveguide in the TE mode. These definitions are then utilized to
formally reason about the eigenvalue equation, which plays a vital role in the
design of planar waveguides for various engineering and other scientific domains.
To illustrate the effectiveness and utilization of the formally verified eigenvalue
equation, we used it to reason about the eigenvalues of a planar asymmetric
waveguide. To the best of our knowledge, this is the first time that a formal
approach
has been proposed for the analysis of optical systems.
The successful handling of the planar waveguide analysis clearly demonstrates
the effectiveness and applicability of higher-order-logic theorem proving for an-
alyzing optical systems. Some of the interesting future directions in this novel
domain include the verification of the eigenvalue equation for the planar waveg-
uide in the TM mode, which is very similar to the analysis presented in this
paper, and the analysis of couplers that represent two or more optical devices
linked together with an optical coupling relation, which can be done by building
on top of the results presented in this paper along with formalizing the
coupled-mode theory [8] in higher-order logic. Besides these, many
safety-critical planar waveguide applications can be formally analyzed,
including biosensors [23]
or medical imaging [15] by building on top of our results.
References
1. Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formu-
las, Graphs, and Mathematical Tables. Dover, New York (1972)
2. Anderson, J.A.: Real Analysis. Gordon and Breach Science Publishers, Reading
(1969)
3. Costa, J., Pereira, D., Giarola, A.J.: Analysis of Optical Waveguides using Math-
ematica. In: Microwave and Optoelectronics Conference, pp. 91–95 (1997)
4. Gordon, M.J.C., Melham, T.F.: Introduction to HOL: A Theorem Proving Envi-
ronment for Higher-Order Logic. Cambridge Press, Cambridge (1993)
5. Hafner, C.: The Generalized Multipole Technique for Computational Electromag-
netics. Artech House, Boston (1990)
6. Harrison, J.: Theorem Proving with the Real Numbers. Springer, Heidelberg (1998)
7. Harrison, J.: Formalizing Basic Complex Analysis. In: From Insight to Proof:
Festschrift in Honour of Andrzej Trybulec. Studies in Logic, Grammar and
Rhetoric, vol. 10, pp. 151–165. University of Bialystok (2007)
8. Haus, H., Huang, W., Kawakami, S., Whitaker, N.: Coupled-mode Theory of Op-
tical Waveguides. Lightwave Technology 5(1), 16–23 (1987)
9. Hayes, P.R., O’Keefe, M.T., Woodward, P.R., Gopinath, A.: Higher-order-compact
Time Domain Numerical Simulation of Optical Waveguides. Optical and Quantum
Electronics 31(9-10), 813–826 (1999)
10. Heinbockel, J.H.: Numerical Methods For Scientific Computing. Trafford (2004)
11. Jackson, J.D.: Classical Electrodynamics. John Wiley & Sons, Inc., Chichester
(1998)
12. Johnson, S.G., Joannopoulos, J.D.: Block-iterative Frequency Domain Methods for
Maxwell’s Equations in a Planewave Basis. Optics Express 8(3), 173–190 (2001)
13. Liu, Y., Sarris, C.D.: Fast Time-Domain Simulation of Optical Waveguide Struc-
tures with a Multilevel Dynamically Adaptive Mesh Refinement FDTD Approach.
Journal of Lightwave Technology 24(8), 3235–3247 (2006)
14. Mathematica (2009), http://www.wolfram.com
15. Moore, E.D., Sullivan, A.C., McLeod, R.: Three-dimensional Waveguide Arrays
via Projection Lithography into a Moving Photopolymer. Organic 3D Photonics
Materials and Devices II 7053, 309–316 (2008)
16. Ntogari, G., Tsipouridou, D., Kriezis, E.E.: A Numerical Study of Optical Switches
and Modulators based on Ferroelectric Liquid Crystals. Journal of Optics A: Pure
and Applied Optics 7(1), 82–87 (2005)
17. Optica (2009), http://www.opticasoftware.com/
18. Pollock, C.R.: Fundamentals of Optoelectronics. Tom Casson (1995)
19. Rumpf, R.C.: Design and Optimization of Nano-Optical Elements by Coupling
Fabrication to Optical Behavior. PhD thesis, University of Central Florida, Or-
lando, Florida (2006)
20. Schmidt, F., Zschiedrich, L.: Adaptive Numerical Methods for Problems of Inte-
grated Optics. In: Integrated Optics: Devices, Materials, and Technologies VII,
vol. 4987, pp. 83–94 (2003)
21. Yee, K.: Numerical Solution of Initial Boundary Value Problems involving Maxwell
Equations in Isotropic Media. IEEE Transactions on Antennas and Propaga-
tion 14(3), 302–307 (1966)
22. Yin, L., Hong, W.: Domain Decomposition Method: A Direct Solution of Maxwell
Equations. In: Antennas and Propagation, pp. 1290–1293 (1999)
23. Zhian, L., Wang, Y., Allbritton, N., Li, G.P., Bachman, M.: Labelfree Biosensor
by Protein Grating Coupler on Planar Optical Waveguides. Optics Letters 33(15),
1735–1737 (2008)
The HOL-Omega Logic

Peter V. Homeier

U. S. Department of Defense
[email protected]
http://www.trustworthytools.com
Abstract. A new logic is posited for the widely used HOL theorem
prover, as an extension of the existing higher order logic of the HOL4
system. The logic is extended to three levels, adding kinds to the existing
levels of types and terms. New types include type operator variables and
universal types as in System F . Impredicativity is avoided through the
stratification of types by ranks according to the depth of universal types.
The new system, called HOL-Omega or HOLω, is a merging of HOL4,
HOL2P [11], and major aspects of System Fω from chapter 30 of [10].
This document presents the abstract syntax and semantics for the kinds,
types, and terms of the logic, as well as the new fundamental axioms
and rules of inference. As the new logic is constructed according to the
design principles of the LCF approach, the soundness of the entire system
depends critically and solely on the soundness of this core.
1 Introduction
The HOL theorem prover [3] has had a wide influence in the field of mechanical
theorem proving. Despite appearing in 1988 as one of the first tools in the field,
HOL has enjoyed wide acceptance around the world, and continues to be used
for many substantial projects, for example Anthony Fox’s model of the ARM
processor. HOL’s influence is seen in that three other major theorem provers,
HOL Light, ProofPower, and Isabelle/HOL, have used essentially the same logic.
One of the main reasons for HOL’s influence has been that the actual logic
implemented in the tool, higher order logic based on Church’s simple theory of
types, turns out to be both easy to work with and expressive enough to be able
to support most models of hardware and software that people have wished to
investigate. There are theorem provers with more powerful logics, and ones with
less powerful logics, but it seems that classical higher order logic fortuitously
found a “sweet-spot,” balancing strong expressivity with nimble ease of use.
However, despite HOL’s value, it has been recognized that there are some
useful concepts beyond the power of higher order logic to state. An example is
the practical device of monads. Monads are particularly useful in modelling, for
example, realistic computations involving state or exceptions, as a shallow em-
bedding in a logic which itself is strictly functional, without state or exceptions.
Individual monads can and have been expressed in HOL, and used to reduce
the complexity of proofs about such real-world computations.
However, stating the general properties of all monads, and proving results
about the class of all monads, has not been possible. The following shows why.
Let M be a postfix unary type operator that maps a type α to a type α M ,
unit a prefix unary term operator of type α → α M , and #= an infix binary term
operator of type α M → (α → β M ) → β M , where k a #= h is (k a) #= h.
Then M together with unit and #= is a monad iff the following properties hold:
left unit: unit a #= k = k a
right unit: m #= unit = m
associativity: m #= (λa. k a #= h) = (m #= k) #= h
There are two problems with this definition in higher order logic. First, while
higher order logic includes type operator constants like list and option, it does
not support type operator variables like M above.
But even if it did, consider the associativity property above. There are four
occurrences of #= in that property. Among these four instances are three dis-
tinct types. Unfortunately, in higher order logic, within a single expression a
variable may only have a single type. So this property would not type-check.
This is annoying because if #= were a constant instead of a variable, these
different instances of its basic type would be supported. What we need is a way
to give #= a single type which can then be specialized for each of #=’s four
instances to produce the three distinct types required.
One way is to introduce universal types, as in System F [10]. A universal type
is written ∀α.σ, where α is a type variable and σ is a type expression, possibly
including α. Such occurrences of α are bound by the universal quantification.
In addition, System F introduces abstractions of types over terms, written as
λ:α.t, where α is a type variable and t is a term. This yields a term, whose type
is a universal type. Specifically, if t has type σ, then λ:α.t has type ∀α.σ.
Given such an abstraction t, it is specialized for a particular type by t[:σ:].
This gives rise to a new form of beta-reduction on term-type applications, where
(λ:α.t)[:σ:] reduces to t[σ/α]. For convenience, we write t[:α, β:] for (t[:α:])[:β:].
Given these new forms, we can express the types of unit and #= as
unit : ∀α. α → α M
#= : ∀α β. α M → (α → β M ) → β M
and the three monad properties as
unit [:α:] a (#=[:α, β:]) k = k a
m (#=[:α, α:]) (unit[:α:]) = m
m (#=[:α, γ:]) (λa. k a (#=[:β, γ:]) h) = (m (#=[:α, β:]) k) (#=[:β, γ:]) h
What we have done here is take manual control of the typing. Since the normal
HOL parametric polymorphism was inadequate, we have added facilities for type
abstraction and instantiation of terms. This allows the single type of a variable
to be specialized for different occurrences within the same expression.
Given the existing polymorphism in HOL, in practice universal types are
needed only rarely; but when they are needed, they are absolutely essential.
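To make the laws concrete outside the logic, here is a small Python sketch
(ours, purely illustrative; the names unit, bind, k, h and m are our own)
instantiating them for the familiar list monad, where unit builds a singleton
and #= is flattened mapping:

    unit = lambda a: [a]
    bind = lambda m, k: [b for a in m for b in k(a)]   # the list monad's #=

    k = lambda a: [a, a + 1]
    h = lambda b: [b * 2]
    m = [1, 2, 3]

    print(bind(unit(7), k) == k(7))                                 # left unit
    print(bind(m, unit) == m)                                       # right unit
    print(bind(bind(m, k), h) == bind(m, lambda a: bind(k(a), h)))  # associativity

All three checks print True; what HOL-Omega adds is the ability to state and
prove the laws once, for every monad at once.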
In related work, as early as 1993 Tom Melham advocated adding quantification
over type variables [8]. HOL-Omega includes such quantification, defining it
using abstraction over type variables. Norbert Völker’s HOL2P [11], a direct
ancestor of this work, supports universal types quantifying over types of rank 0.
HOL2P is approximately the same as HOL-Omega, but without kinds, curried
type operators, or ranks > 1. Benjamin C. Pierce [10] describes a variety of
programming languages with advanced type systems. HOL-Omega is similar to
his system Fω of chapter 30, but avoids Fω ’s impredicativity. HOL-Omega does
not include dependent types, such as found in the calculus of constructions.
In the remainder of this paper, we describe the core logic of the HOL-Omega
system, and some additions to the core. In Section 2, we present the abstract
syntax of HOL-Omega. Section 3 describes the set-theoretic semantics for the
logic. Section 4 gives the new core rules of inference and axioms. Section 5
covers additional type and term definitions on top of the core. Section 6 presents
a number of examples using the expanded logic, and in Section 7 we conclude.

2 Syntax of the HOL-Omega Logic

For reasons of space, we assume the reader is familiar with the types, terms,
axioms, and rules of inference of the HOL logic, as described in [3,4,5,6]. This
section presents the abstract syntax of the new HOL-Omega logic.
In HOL-Omega, the syntax consists of ranks, kinds, types, and terms.

2.1 Ranks

Ranks are natural numbers indicating the depth of universal type quantification
present or permitted in a type. We use the variable r to range over ranks.

rank ::= natural

The purpose of ranks is to avoid impredicativity, which is inconsistent with
HOL [2]. However, a naïve interpretation has been found to be too constrictive.
For example, the HOL identity operator I has type α → α, where α has rank
0. However, it is entirely natural to expect to apply I to values of higher ranks,
and to expect I to function as the identity function on those higher-rank values.
To have an infinite set of identity functions, one for each rank, would be absurd.
Inspired by new set theory, John Matthews suggested the idea of considering
all ranks as being formed as a sum of a variable and a natural, where there is
only one rank variable, z, ranging over naturals. This reflects the intuition that
if a mathematical development was properly constructed at one rank, it could as
easily have been constructed at the rank one higher, consistently at each step of
the development. Only one rank variable is necessary to capture this intuition,
representing the finite number of ranks that the entire development is promoted.
If there is only one rank variable, it may be understood to be always present
without being explicitly modeled. Thus rank 2 signifies z + 2. A rank
substitution θr indicates the single mapping z → z + θr, so applying θr to
z + n yields z + (n + θr). A rank r′ is an instance of r if r′ = r[θr] for
some θr, i.e., if r′ ≥ r.

2.2 Kinds
HOL-Omega introduces kinds as a new level in the logic, not present in HOL.
Kinds control the proper formation of types just as types do for terms.
There are three varieties of kinds, namely the base kind (the kind of proper
types), kind variables, and arrow kinds (the kinds of type operators).
kind ::= ty              (base kind)
      |  κ               (kind variable)
      |  k1 ⇒ k2         (arrow kind)
We use the variable k to range over kinds, and κ to range over kind variables.
The arrow kind k1 ⇒ k2 has domain k1 and range k2 . Arrow kinds are also called
higher kinds, meaning higher than the base kind. A kind k′ is an instance of k
if k′ = k[θk] for some substitution θk, a mapping from kind variables to kinds.

2.3 Types
Replacing HOL’s two varieties of types, HOL-Omega has five: type variables,
type constants, type applications, type abstractions, and universal types.
type-variable ::= name × kind × rank
type-constant ::= name × kind × rank    (instance of the kind in the environment)

type ::= α               (type variable)
      |  τ               (type constant)
      |  σarg σopr       (type application, postfix syntax)
      |  λα. σ           (type abstraction)
      |  ∀α. σ           (universal type)
We will use α to range over type variables, τ to range over type constants,
and σ to range over types. Type constants must have kinds which are instances
of the environment’s kind for that type constant name.
Kinding:
    α : kind of α        τ : kind of τ
    from σopr : k1 ⇒ k2 and σarg : k1, infer σarg σopr : k2
    from α : k1 and σ : k2, infer λα.σ : k1 ⇒ k2
    from α : k and σ : ty, infer ∀α.σ : ty

Ranking:
    α :≤ rank of α       τ :≤ rank of τ
    from σopr :≤ r2 and σarg :≤ r1, infer σarg σopr :≤ max(r1, r2)
    from α :≤ r1 and σ :≤ r2, infer λα.σ :≤ max(r1, r2)
    from σ :≤ r and r ≤ r′, infer σ :≤ r′
    from α :≤ r1 and σ :≤ r2, infer ∀α.σ :≤ max(r1 + 1, r2)

Typing:
    x : type of x        c : type of c
    from topr : σ1 → σ2 and targ : σ1, infer topr targ : σ2
    from x : σ1 and t : σ2, infer λx.t : σ1 → σ2
    from t : ∀α:k:≤r. σ′ and σ : k :≤ r, infer t[:σ:] : σ′[σ/α]
    from α : k :≤ r and t : σ, infer λ:α.t : ∀α.σ
Existing types of HOL are fully supported in HOL-Omega. HOL type variables
are represented as HOL-Omega type variables of kind ty and rank 0. HOL type
applications of a type constant to a list of type arguments are represented in
HOL-Omega as a curried type constant applied to the arguments in sequence,
as (α1 , ..., αn )τ = αn (... (α1 τ )...).
We write σ : k :≤ r to say that type σ has kind k and rank r.
Proper types are types of kind ty; only these types can be the type of a term.
In a type application of a type operator to an argument, the operator must
have an arrow kind, and the domain of the kind of the operator must equal the
kind of the argument. If so, the kind of the result of the type application will be
the range of the kind of the operator. Also, the body of a universal type must
have the base kind. These restrictions ensure types are well-kinded.
In both universal types and type abstractions, the type variable is bound
over the type body. This binding structure introduces the notions of alpha and
beta equivalence, as direct analogs of the corresponding notions for terms. In
fact, types are identified up to alpha-beta equivalence. The following denote the
same type: λα.α, λβ.β, λβ.β(λα.α), γ(λα.λβ.β). Beta reduction is of the form
σ2 (λα.σ1 ) = σ1 [σ2 /α], where σ1 [σ2 /α] is the result of substituting σ2 for all free
occurrences of α in σ1 , with bound type variables in σ1 renamed as necessary.
A type σ′ is an instance of σ if σ′ = σ[θr][θk][θσ] for some rank, kind, and
type substitutions θr ∈ N, θk mapping kind variables to kinds, and θσ mapping
type variables to types. The substitutions are applied in sequence, with θr
first.
When matching two types, the matching is higher order, so the pattern
α → α μ (where μ : ty ⇒ ty) matches β → β, yielding [α → β, μ → λα.α].
The primeval environment contains the type constants bool, ind, and fun as
in HOL, where bool : ty, ind : ty, and fun : ty ⇒ ty ⇒ ty, and all three have
rank 0. fun is usually written as the binary infix type operator →, and for a
function type σ1 → σ2 , we say that the domain is σ1 and the range is σ2 . Also,
for a universal type ∀α.σ, we say that the domain is α and the range is σ.

2.4 Terms
HOL-Omega adds to the existing four varieties of terms two new varieties,
namely term-type applications and type-term abstractions. We use x to range
over term variables, c over term constants, and t over terms.
variable ::= name × type
constant ::= name × type    (an instance of the type stored in the environment)

term ::= x               (variable)
      |  c               (constant)
      |  topr targ       (application, prefix syntax)
      |  λx. t           (abstraction)
      |  t [: σ :]       (term-type application)
      |  λ:α. t          (type-term abstraction)
In applications t1 t2, the domain of the type of t1 must equal the type of t2.
As in System F, in abstractions of a type variable over a term λ:α.t, the type
variable α must not occur freely in the type of any free variable of the term t.
There are three important restrictions on term-type applications (t [: σ :]).
1. The type of the term t must be a universal type, say ∀α.σ′.
2. The kind of α must match the kind of the type argument σ.
3. The rank of α must contain (≥) the rank of the type argument σ.
The first and second restrictions ensure terms are well-typed and well-kinded.
The third restriction is necessary to avoid impredicativity, for a simpler set-
theoretic model. This restriction means that the type argument is validly one of
the types over which the universal type quantifies. On this key restriction, the
consistency of HOL-Omega rests.

3 Semantics of the HOL-Omega Logic


3.1 A Universe for HOL-Omega Kinds, Types, and Terms
We give the ZFC semantics of HOL-Omega kinds, types, and terms in terms of a
universe U, which is a fixed set of sets of sets. This development draws
heavily from Pitts [6] and Völker [11]. We construct U by first constructing
sequences
of sets U0 , U1 , U2 , ..., and T0 , T1 , T2 , ..., where Ui and Ti will only involve types of
rank ≤ i. Kinds will be modeled as elements K of U, types will be modeled as
elements T of K ∈ U, and terms will be modeled as elements E of T ∈ T ∈ U.
There exist Ui and Ti for i = 0, 1, 2, ..., satisfying the following properties:
Inhab. Each element of Ui is a non-empty set of non-empty sets.
Typ. Ui contains a distinguished element Ti .
Arrow. If K ∈ Ui and L ∈ Ui , then K→L ∈ Ui , where X→Y is the set-theoretic
(total) function space from the set X to the set Y .
Clos. Ui has no elements except those by Typ or Arrow.
Ext. Ti+1 extends Ti : Ti ⊆ Ti+1 .
Sub. If X ∈ Ti and ∅ ≠ Y ⊆ X, then Y ∈ Ti.
Fun. If X ∈ Ti and Y ∈ Ti, then X→Y ∈ Ti.
Univ. If K ∈ Ui and f : K → Ti+1, then ∏_{X∈K} f X ∈ Ti+1. The set-theoretic
product ∏_{X∈K} f X is the set of all functions g : K → ∪_{X∈K} f X such that
for all X ∈ K, g X ∈ f X.
Bool. T0 contains a distinguished 2-element set B = {true, false}.
Infty. T0 contains a distinguished infinite set I.
AllTyp. T is defined to be ∪_{i∈N} Ti.
AllArr. U is the closure of {T} under set-theoretic function space creation.
Choice. There are distinguished elements chtyi ∈ ∏_{K∈Ui} K and
ch ∈ ∏_{X∈T} X; that is, for all i and for all K ∈ Ui, K is nonempty with
chtyi(K) ∈ K as a witness, and for all X ∈ T, X is nonempty with ch(X) ∈ X
as a witness.
The system consisting of the above properties is consistent. The following
construction is from William Schneeburger. Let Ui be the closure of {Ti} under
Arrow. Given Ti and Ui, we can construct Ti+1 by iteration over the
ordinals [9]. Let S0 = Ti. For all ordinals α, let Sα+1 be the closure under
Sub and Fun of
    Sα ∪ { ∏_{X∈K} f X | K ∈ Ui ∧ f : K → Sα }.
For limit ordinals λ, let Sλ = ∪_{α<λ} Sα, which is closed under Sub and Fun.
Let n = |∪_{K∈Ui} K|. Then |K| ≤ n for all K ∈ Ui. Let m = n⁺, the least
cardinal > n. Then m is a regular cardinal [9, p. 146] > |K| for all K ∈ Ui.
Then we define Ti+1 = Sm, which is sufficiently large by the following theorem.

Theorem 1. Sm is closed under Univ (as well as Sub and Fun).

Proof. Suppose K ∈ Ui and f : K → Sm. Sm = ∪_{α<m} Sα, so for each X ∈ K
define γX = the smallest α such that f X ∈ Sα; thus γX < m. Define
Γ = {γX | X ∈ K}. Then Γ ⊆ m, and |K| < m so |Γ| < m, thus ∪Γ < m since m is
regular. The image of f ⊆ S_{∪Γ}, so by the definition of Sα+1,
∏_{X∈K} f X ∈ S_{(∪Γ)+1} ⊆ Sm.
3.2 Constraining Kinds and Types to a Particular Rank

The function ⇓r transforms an element K of U into an element of Ur:
    T⇓r = Tr
    (K1 → K2)⇓r = K1⇓r → K2⇓r
We need to map some elements T ∈ K ∈ U down to the corresponding
elements in K⇓r ∈ Ur , when T is consistent with a type of rank r. Not all T can
be so mapped; we define the subset of K that can, and the mapping, as follows.
We define the subset K|r ⊆ K ∈ U as the elements consistent with rank r,
and the function ↓r which transforms an element T of K|r into one of K⇓r,
mutually recursively on the structure of K:
T |r = Tr
(K1 → K2 )|r = {f | f ∈ K1 → K2 ∧
∀(x, y) ∈ f. (x ∈ K1 |r ⇒ y ∈ K2 |r) ∧
f ↓r is a function}

If T ∈ T |r, then T ↓r = T
If T ∈ (K1 → K2 )|r, then T ↓r = {(x↓r, y↓r) | (x, y) ∈ T ∧ x ∈ K1 |r }
If K = K1 → K2 , by the definition of T ↓r, T ↓r ⊆ K1 ⇓r×K2 ⇓r, and by T ∈ K|r,
T ↓r is a function, so T ↓r ∈ K1 ⇓r → K2 ⇓r = (K1 → K2 )⇓r = K⇓r.
We can define ⇑r : Ur →U and ↑r : K⇓r→K|r as the inverses of ⇓r and ↓r,
so that (K⇑r)⇓r = K for all K ∈ Ur and (T ↑r)↓r = T for all T ∈ K ∈ Ur .
Tr ⇑r = T
(K1 → K2 )⇑r = K1 ⇑r → K2 ⇑r

If T ∈ T ⇓r, then T ↑r = T
If T ∈ (K1 → K2 )⇓r, then T ↑r = λ(x ∈ K1 ). if x ∈ K1 |r then (T (x↓r))↑r
else chtype(K2 , r)
where chtype(K, r) = (chtyr (K⇓r))↑r
3.3 Semantics of Ranks and Kinds
As mentioned earlier, ranks syntactically appear as natural numbers r, but are
actually combined with the hidden single rank variable z as z + r. A rank envi-
ronment ζ ∈ N gives the value of z. The semantics of ranks is then [[r]]ζ = ζ + r.
A kind environment ξ is a mapping from kind variables to elements of U.
The semantics of kinds [[k]]ξ is defined by recursion over the structure of k:

[[ty]]ξ = T
[[κ]]ξ = ξ κ
[[k1 ⇒ k2 ]]ξ = [[k1 ]]ξ → [[k2 ]]ξ

3.4 Semantics of Types


We will distinguish bool, ind, and function types σ1 → σ2 as special cases, in
order to ensure a standard model. We assume a model M that takes a rank and
kind environment (ζ, ξ) and gives a valuation of each type constant τ of kind k
and rank r as an element of [[k]]ξ | [[r]]ζ . For clarity we omit the decoration [[ ]]M .
A type environment ρ takes a rank and a kind environment (ζ, ξ) to a mapping
of each type variable α of kind k and rank r to a value T ∈ [[k]]ξ | [[r]]ζ .
[[σ]]ζ,ξ,ρ is defined by recursion over the structure of σ:

[[bool]]ζ,ξ,ρ = B
[[ind]]ζ,ξ,ρ = I
[[ σ1 → σ2 ]]ζ,ξ,ρ = [[σ1 ]]ζ,ξ,ρ → [[σ2 ]]ζ,ξ,ρ
[[τ ]]ζ,ξ,ρ = M (ζ, ξ) τ
[[α]]ζ,ξ,ρ = ρ (ζ, ξ) α
[[ σarg σopr ]]ζ,ξ,ρ = [[σopr ]]ζ,ξ,ρ [[σarg ]]ζ,ξ,ρ
    [[λ(α : k :≤ r). σ]]ζ,ξ,ρ = λT ∈ [[k]]ξ. if T ∈ [[k]]ξ|[[r]]ζ then [[σ]]ζ,ξ,ρ[α→T]
                                             else chtype([[kσ]]ξ, [[rσ]]ζ)
    [[∀(α : k :≤ r). σ]]ζ,ξ,ρ = ∏_{T ∈ [[k]]ξ⇓[[r]]ζ} [[σ]]ζ,ξ,ρ[α→T↑[[r]]ζ]

where for [[ λ(α : k :≤ r). σ ]]ζ,ξ,ρ , if T has rank larger than the variable α, an
arbitrary type of the kind kσ and rank rσ of σ is returned, essentially as an error.
By induction over the structure of types, it can be demonstrated that the
semantics of types is consistent with the semantics of kinds and ranks, i.e.,

[[ σ : k :≤ r ]]ζ,ξ,ρ ∈ [[k]]ξ | [[r]]ζ .

3.5 Semantics of Terms


In addition to the type mapping described above, the model M is assumed,
given a triple of a rank, kind, and type environments, to provide a valuation of
each term constant c of type σ as an element of [[σ]]ζ,ξ,ρ . A term environment μ
takes a triple of a rank, kind, and type environments to a mapping of each term
variable x of type σ to a value v which is an element of [[σ]]ζ,ξ,ρ .
[[ t ]]ζ,ξ,ρ,μ is defined by recursion over the structure of t:
[[ c ]]ζ,ξ,ρ,μ = M (ζ, ξ, ρ) c
[[ x ]]ζ,ξ,ρ,μ = μ (ζ, ξ, ρ) x
[[ t1 t2 ]]ζ,ξ,ρ,μ = [[ t1 ]]ζ,ξ,ρ,μ [[ t2 ]]ζ,ξ,ρ,μ
[[ λ(x : σ). t ]]ζ,ξ,ρ,μ = λv ∈ [[σ]]ζ,ξ,ρ . [[ t ]]ζ,ξ,ρ,μ[x→v]
    [[λ:(α : k :≤ r). t]]ζ,ξ,ρ,μ = λT ∈ [[k]]ξ⇓[[r]]ζ. [[t]]ζ,ξ,ρ[α→T↑[[r]]ζ],μ
    [[t [:σ:]]]ζ,ξ,ρ,μ = [[t]]ζ,ξ,ρ,μ ([[σ]]ζ,ξ,ρ ↓ [[r]]ζ)


where for t [:σ:], the type of t must have the form ∀α.σ′, and r is the rank of α.

4 Primitive Rules of Inference of the HOL-Omega Logic


HOL-Omega includes all of the axioms and rules of inference of HOL, reinter-
preting them in light of the expanded sets of types and terms, and extends them
with the following new rules of inference, directed at the new varieties of terms.
– Rule INST TYPE is revised; it says that consistently and properly substituting
types for free type variables throughout a theorem yields a theorem.
      Γ ⊢ t
      ─────────────────────────────────────────────────────  (INST TYPE)
      Γ[σ1, . . . , σn / α1, . . . , αn] ⊢ t[σ1, . . . , σn / α1, . . . , αn]
– Rule INST KIND says that consistently substituting kinds for kind variables
throughout a theorem yields a theorem.
      Γ ⊢ t
      ─────────────────────────────────────────────────────  (INST KIND)
      Γ[k1, . . . , kn / κ1, . . . , κn] ⊢ t[k1, . . . , kn / κ1, . . . , κn]
– Rule INST RANK says that consistently incrementing by n ≥ 0 the rank of all
type variables throughout a theorem yields a theorem. z is the rank variable.
      Γ ⊢ t
      ─────────────────────────────  (INST RANK)
      Γ[(z + n)/z] ⊢ t[(z + n)/z]
– Rule TY ABS says that if two terms are equal, then their type abstractions
are equal, where α is not free in Γ .
      Γ ⊢ t1 = t2
      ─────────────────────────────  (TY ABS)
      Γ ⊢ (λ:α.t1) = (λ:α.t2)
– Rule TY BETA CONV describes the equality of type beta-conversion, where
t[σ/α] denotes the result of substituting σ for free occurrences of α in t.
      ⊢ (λ:α.t)[:σ:] = t[σ/α]    (TY BETA CONV)
HOL-Omega adds one new axiom.


– Axiom TY ETA AX says type eta reduction is valid.

      ⊢ (λ:α:κ. t[:α:]) = t    (TY ETA AX)


To ensure the soundness of the HOL-Omega logic, all of the axioms and rules
of inference need to have their semantic interpretations proven sound within set
theory for all rank, kind, type, and term environments. This has not yet been
formally done, but it is a priority for future work. When this is accomplished,
by the LCF approach, all theorems proven within HOL-Omega will be sound.

5 Additional Type and Term Definitions


Of course the core of any system is only a point from which to begin. This section
describes new type abbreviations and term constants not in HOL, defined as
conservative extensions of the core logic of HOL-Omega.

5.1 New Type Abbreviations


HOL-Omega introduces the type abbreviations
I = λ(α : k). α
K = λ(α : k) (β : l). α
S = λ(α : k ⇒ l ⇒ m) (β : k ⇒ l) (γ : k). γ β (γ α)
o = λ(f : k ⇒ l) (g : l ⇒ m) (α : k). α f g
The use of kind variables k, l, and m makes these type abbreviations appli-
cable as type operators to types with arrow kinds. o is an infix type operator,
written as f o g = λα. α f g. These are reminiscent of the term combinators,
e.g. I = λ(x:α).x, K = λ(x:α)(y:β).x, and (g : β → γ)◦(f : α → β) = λx. g (f x).
In HOL2P, both the arguments and the results of type operator applications
must have the base kind ty. In HOL-Omega, the arguments and results may
themselves be type operators of higher kind, as managed by the kind structure.
This is of great advantage, for example when using o to compose two type
operators, neither of which is applied to any arguments yet.

5.2 New Terms


HOL-Omega provides universal and existential quantification of type variables
over terms using the new type binder constants ∀: and ∃:, defined as
∀: = λP. (P = (λ:α:κ. T))
∃: = λP. ¬(P = (λ:α:κ. F))
To ease readability, the following forms are also supported:
∀:α:κ. P = ∀: (λα:κ. P ) ∀:α1 :κ1 α2 :κ2 ... . P = ∀:α1 :κ1 . ∀:α2 :κ2 . ... P
∃:α:κ. P = ∃: (λα:κ. P ) ∃:α1 :κ1 α2 :κ2 ... . P = ∃:α1 :κ1 . ∃:α2 :κ2 . ... P
6 Examples
The HOL-Omega logic makes it straightforward to express many concepts from
category theory, such as functors and natural transformations. Much of the first
two examples below is ported from HOL2P [11]; the main difference is that
the higher-order type abbreviations and type inference of HOL-Omega allow a
more pleasing presentation. We focus on the category Type whose objects are
the proper types of the HOL-Omega logic, and whose arrows are the (total)
term functions from one type to another. The source and target of an arrow are
the domain and range of the type of the function. The identity arrows are the
identity functions on each type. The composition of arrows is normal functional
composition. The customary check that the target of one arrow is the source of
the other is accomplished automatically by the strong typing of the logic.

6.1 Functors
Functors map objects to objects and arrows to arrows. In the category Type,
the first mapping is represented as a type F of kind ty ⇒ ty, and the second as
a function of the type F functor, where functor is the type abbreviation

functor = λF. ∀α β. (α → β) → (α F → β F ).

To be a functor, a function of this type must satisfy the following predicate:

functor (F : F functor) =
(∀:α. F (I : α → α) = I) ∧ Identity
(∀:α β γ. ∀(f : α → β)(g : β → γ). F (g ◦ f ) = F g ◦ F f ) Composition

where g ◦ f = λx. g (f x). This is actually an abbreviated version; the parser
and type inference fill in the necessary type applications, so the full version is

functor (F : F functor) =
(∀:α. F [:α, α:] (I : α → α) = I) ∧ Identity
(∀:α β γ. ∀(f : α → β)(g : β → γ). Composition
F [:α, γ:] (g ◦ f ) = F [:β, γ:] g ◦ F [:α, β:] f )
In what follows, these type applications will normally be omitted for clarity.
In HOL, list : ty ⇒ ty is the type of finite lists. It is defined as a recursive
datatype with two constructors, [] : α list and :: : α → α list → α list.
:: is infix. The function MAP : (α → β) → (α list → β list) is defined by

MAP f [] = []
MAP f (x :: xs) = f x :: MAP f xs

Then MAP can be proven to be a functor: ⊢ functor ((λ:α β. MAP) : list functor).
A simple functor is the identity function I: ⊢ functor ((λ:α β. I) : I functor).
The composition of two functors is a functor. We overload ◦ to define this:

(G : G functor) ◦ (F : F functor) = λ:α β. G[:α F , β F :] ◦ F [:α, β:]

The result has type (F o G)functor. As an example, (λ:α β. MAP) ◦ (λ:α β. MAP) =
(λ:α β. MAP ◦ MAP) : (list o list)functor is a functor. The type composition
operator o reflects the category theory composition of two functors’ mappings
on objects. In HOL2P, the MAP functor composition example is expressed as:
⊢ TYINST (θ → λα. (α list)list) functor (λ:α β. λf. MAP (MAP f))
Here the notation has been adjusted to that of this paper, for ease of com-
parison. TYINST is needed to manually instantiate a free type variable θ of the
functor predicate with the type for this instance, which must be stated as a type
abstraction. HOL-Omega’s kinds and type inference enable a clearer statement:
⊢ functor (λ:α β. MAP ◦ MAP)
Beyond the power of HOL2P, HOL-Omega supports quantification over functors:
⊢ ∃:F. ∃(F : F functor). functor F.

6.2 Natural Transformations


Given functors F and G, a natural transformation maps objects A to arrows
F A → GA. In the category Type, we represent natural transformations as
functions of the type (F , G)nattransf, where nattransf is the type abbreviation
nattransf = λF G. ∀α. α F → α G.
A natural transformation φ from a functor F to a functor G (φ : F → G)
must satisfy the following predicate:
nattransf (φ : (F , G)nattransf) (F : F functor) (G : G functor) =
∀:α β. ∀(h : α → β). G h ◦ φ = φ ◦ F h
Define the function INITS to take a list and return a list of all prefixes of it:
INITS [] = []
INITS (x :: xs) = [] :: MAP (λys. x :: ys) (INITS xs)
INITS can be proven to be a natural transformation from MAP to MAP ◦ MAP:
⊢ nattransf ((λ:α. INITS) : (list, list o list)nattransf)
            ((λ:α β. MAP) : list functor)
            ((λ:α β. MAP ◦ MAP) : (list o list)functor).
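This naturality property can also be checked concretely outside the logic; the
Python sketch below (ours, illustrative only) mirrors the recursive definition
of INITS above and tests that mapping a function before or after taking
prefixes gives the same answer:

    def inits(xs):
        # mirrors: INITS [] = []  and  INITS (x::xs) = [] :: MAP (cons x) (INITS xs)
        if not xs:
            return []
        return [[]] + [[xs[0]] + ys for ys in inits(xs[1:])]

    def natural(f, xs):
        lhs = [[f(y) for y in p] for p in inits(xs)]   # (MAP o MAP) f after INITS
        rhs = inits([f(x) for x in xs])                # INITS after MAP f
        return lhs == rhs

    print(natural(lambda n: n * n, [1, 2, 3]))   # True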
The vertical composition of two natural transformations is defined as
(φ2 : (G, H)nattransf) ◦ (φ1 : (F , G)nattransf) = λ:α. φ2 ◦ (φ1 [:α:])
The result of this vertical composition is a natural transformation:
⊢ nattransf (φ1 : (F, G)nattransf) F G ∧
   nattransf (φ2 : (G, H)nattransf) G H ⇒
   nattransf (φ2 ◦ φ1 : (F, H)nattransf) F H
A natural transformation may be composed with a functor in two ways, where
the functor is applied either first or last. We define these, again
overloading ◦:
    (φ : (F, G)nattransf) ◦ (H : H functor) = λ:α. φ[:α H:]
    (H : H functor) ◦ (φ : (F, G)nattransf) = λ:α. H (φ[:α:])
That the last of these is a natural transformation is expressed in HOL2P as
    ⊢ nattransf φ F G ∧ functor H ⇒
      TYINST ((θ1 → λα. ((α)θ1)θ3) (θ2 → λα. ((α)θ2)θ3))
        nattransf (λ:α. H φ) (λ:α β. H ◦ F) (λ:α β. H ◦ G)
where in HOL-Omega, the type inference, higher kinds, and overloaded ◦ permit
    ⊢ nattransf φ F G ∧ functor H ⇒
      nattransf (H ◦ φ) (H ◦ F) (H ◦ G).

6.3 Monads
Wadler [12] has proposed using monads to structure functional programming.
He defines a monad as a triple (M , unit, #=) of a type operator M and two
term operators unit and #= (where #= is an infix operator) obeying three laws.
We express this definition in HOL-Omega as follows.
We define two type abbreviations unit and bind:
unit = λM. ∀α. α → α M
bind = λM. ∀α β. α M → (α → β M ) → β M
We define a monad to be two term operators, unit and #=, with a single
common free type variable M : ty⇒ty, satisfying a predicate of the three laws:
monad (unit : M unit, #= : M bind) =
(∀:α β. ∀(a : α)(k : α → β M ). (Left unit)
unit a #= k = k a) ∧
(∀:α. ∀(m : α M ). (Right unit)
m #= unit = m) ∧
(∀:α β γ. ∀(m : α M)(k : α → β M)(h : β → γ M).          (Associative)
   (m #= k) #= h = m #= (λa. k a #= h))
As an example, we define the unit and #= operations for a state monad as
    state = λσ α. σ → α × σ
    state_unit = λ:α. λ(x : α) (s : σ). (x, s)
    state_bind = λ:α β. λ(w : (σ, α)state) (f : α → (σ, β)state) (s : σ).
                    let (x, s′) = w s in f x s′
Then we can prove these operations satisfy the monad predicate for M = σ state,
taking advantage of the curried nature of state, where (σ, α)state = α (σ state):
⊢ monad (state_unit : (σ state)unit, state_bind : (σ state)bind).
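Outside the logic, the same state monad can be modelled concretely; the Python
sketch below (ours, illustrative only; all names are our own) represents a
stateful computation as a function from a state to a (value, state) pair and
spot-checks the three laws of the monad predicate on sample inputs:

    def unit(x):
        return lambda s: (x, s)

    def bind(w, f):
        # run w, then feed its result and the threaded state to f
        def run(s):
            x, s1 = w(s)
            return f(x)(s1)
        return run

    tick = lambda s: (s, s + 1)            # a sample stateful computation
    k = lambda x: lambda s: (x * 10, s)    # sample continuations
    h = lambda x: lambda s: (x + 1, s * 2)

    s0 = 5
    print(bind(unit(3), k)(s0) == k(3)(s0))          # left unit
    print(bind(tick, unit)(s0) == tick(s0))          # right unit
    print(bind(bind(tick, k), h)(s0) ==
          bind(tick, lambda a: bind(k(a), h))(s0))   # associativity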
Wadler [12] also formulates an alternative definition of monads, expressed in
terms of three operators, unit, map, and join, satisfying seven laws:

map = λM. ∀α β. (α → β) → (α M → β M )
join = λM. ∀α. α M M → α M

umj_monad (unit : M unit, map : M map, join : M join) =
   (∀:α. map (I : α → α) = I) ∧                              (map I)
(∀:α β γ. ∀(f : α → β)(g : β → γ). (map o)
map (g ◦ f ) = map g ◦ map f ) ∧
(∀:α β. ∀(f : α → β). (map unit)
map f ◦ unit = unit ◦ f ) ∧
(∀:α β. ∀(f : α → β). (map join)
map f ◦ join = join ◦ map (map f )) ∧
(∀:α. join ◦ unit = (I : α M → α M )) ∧ (join unit)
(∀:α. join ◦ map unit = (I : α M → α M )) ∧ (join map unit)
(∀:α. join [:α:] ◦ map join = join ◦ join) (join map join)
Given a monad defined using unit and #=, corresponding map and join operators
MMAP(unit, #=) and JOIN(unit, #=) may be constructed automatically:

    MMAP (unit : M unit, #= : M bind)
       = λ:α β. λ(f : α → β) (m : α M). m #= (λa. unit (f a))
    JOIN (unit : M unit, #= : M bind)
       = λ:α. λ(z : α M M). z #= I

Given a monad defined using unit, map, and join, the corresponding #=
operator BIND(map, join) may also be constructed automatically:
    BIND (map : M map, join : M join)
       = λ:α β. λ(m : α M) (k : α → β M). join (map k m)

E.g., for the state monad:
    state_map = MMAP (state_unit, state_bind)
    state_join = JOIN (state_unit, state_bind)
    state_bind = BIND (state_map, state_join).
Then it can be proven that these two definitions of a monad are equivalent.

⊢ monad (unit : M unit, #= : M bind) ⇔
    (umj monad (unit, MMAP(unit, #=), JOIN(unit, #=)) ∧
     #= = BIND (MMAP(unit, #=), JOIN(unit, #=)))

⊢ umj monad (unit : M unit, map : M map, join : M join) ⇒
    monad (unit, BIND(map, join))

Lack and Street [7] define monads as a category A, a functor t : A → A, and
natural transformations μ : t² → t and η : 1A → t satisfying three equations, as
expressed by the commutative diagrams (in the functor category)
[Diagrams: the associativity square, with tμ, μt : t³ → t² followed by μ : t² → t; and the unit triangles, with tη, ηt : t → t² followed by μ, each composing to the identity 1 on t.]

This definition can be expressed in HOL-Omega as follows:

cat monad (t : M functor, μ : (M ◦ M , M )nattransf, η : (I, M )nattransf) =
  functor t ∧                                (t is a functor)
  nattransf μ (t ◦ t) t ∧                    (μ is a natural transformation)
  nattransf η (λ:α β. I) t ∧                 (η is a natural transformation)
  (μ ◦ (t ◦ μ) = μ ◦ (μ ◦ t)) ∧              (square commutes)
  (μ ◦ (t ◦ η) = λ:α. I) ∧                   (left triangle commutes)
  (μ ◦ (η ◦ t) = λ:α. I)                     (right triangle commutes)

It can be proven that this is equivalent to the (unit, map, join) definition:

⊢ ∀(unit : M unit) map join.
    cat monad (map, join, unit) ⇔ umj monad (unit, map, join).

Therefore all three definitions of monads are equivalent.

7 Conclusion

This document has presented a description of the core logic of the HOL-Omega
theorem prover. This has been implemented as a variant of the HOL4 theorem
prover. The implementation may be downloaded by the command

svn checkout https://hol.svn.sf.net/svnroot/hol/branches/HOL-Omega

Installation instructions are in the top directory.


This provides a practical workbench for developments in the HOL-Omega
logic, integrated in a natural and consistent manner with the existing HOL4
tools and libraries that have been refined and extended over many years.
This implementation was designed with particular concern for backward
compatibility. This was almost entirely achieved, which was possible only because
the fundamental data types representing types and terms were originally
encapsulated. This meant that the underlying representation could be changed
without affecting the abstract view of types and terms by the rest of the system.
Virtually all existing HOL4 code will build correctly, including the extensive
libraries. The simplifiers have been upgraded, including higher-order matching of
the new types and terms and automatic type beta-reduction. Algebraic types with
higher kinds and ranks may be constructed using the familiar Hol datatype tool
[5]. Not all of the tools will work as expected on the new terms and types, as the
revision process is ongoing, but they will function identically on the classic terms
and types. So nothing of HOL4’s power has been lost.
The HOL-Omega Logic 259

Also, the nimble ease of use of HOL has been largely preserved. For example, the
type inference algorithm is a pure extension, so that all classic terms have the same
types successfully inferred. Inference of most general types for all terms is not always
possible, as also seen in System F, and type inference may fail even for typeable
terms, but in practice a few user annotations are usually sufficient.
The system is still being developed but is currently useful. All of the examples pre-
sented have been mechanized in the examples/HolOmega subdirectory, along with
further examples from Algebra of Programming [1] ported straightforwardly from
HOL2P, including homomorphisms, initial algebras, catamorphisms, and the
banana split theorem. While maintaining backwards compatibility with the existing
HOL4 system and libraries, the additional expressivity and power of HOL-Omega
make this tool applicable to a great collection of new problems.

Acknowledgements. Norbert Völker’s HOL2P [11] was a vital inspiration.


Michael Norrish helped get the new branch of HOL4 established and to begin the
new parsers and prettyprinters. John Matthews suggested adding the single rank
variable to every rank. William Schneeburger justified an aggressive set-theoretic
semantics of ranks. Mike Gordon has consistently encouraged this work. We honor
his groundbreaking and seminal achievement in the original HOL system [3], with-
out which none of this work would have been possible.

Soli Deo Gloria.

References
1. Bird, R., de Moor, O.: Algebra of Programming. Prentice Hall (1997)
2. Coquand, T.: A new paradox in type theory. In: Prawitz, D., Skyrms, B., Westerstahl,
D. (eds.) Proceedings 9th Int. Congress of Logic, Methodology and Philosophy of
Science, pp. 555–570. North-Holland, Amsterdam (1994)
3. Gordon, M.J.C., Melham, T.F.: Introduction to HOL. Cambridge University Press,
Cambridge (1993)
4. Gordon, M.J.C., Pitts, A.M.: The HOL Logic and System. In: Bowen, J. (ed.) Towards
Verified Systems, ch. 3, pp. 49–70. Elsevier Science B.V., Amsterdam (1994)
5. The HOL System DESCRIPTION (Version Kananaskis 4),
http://downloads.sourceforge.net/hol/kananaskis-4-description.pdf
6. The HOL System LOGIC (Version Kananaskis 4),
http://downloads.sourceforge.net/hol/kananaskis-4-logic.pdf
7. Lack, S., Street, R.: The formal theory of monads II. Journal of Pure and
Applied Algebra 175, 243–265 (2002)
8. Melham, T.F.: The HOL Logic Extended with Quantification over Type Variables.
Formal Methods in System Design 3(1-2), 7–24 (1993)
9. Monk, J.D.: Introduction to Set Theory. McGraw-Hill, New York (1969)
10. Pierce, B.C.: Types and Programming Languages. MIT Press, Cambridge (2002)
11. Völker, N.: HOL2P - A System of Classical Higher Order Logic with Second Order
Polymorphism. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732,
pp. 334–351. Springer, Heidelberg (2007)
12. Wadler, P.: Monads for functional programming. In: Jeuring, J., Meijer, E. (eds.) AFP
1995. LNCS, vol. 925. Springer, Heidelberg (1995)
A Purely Definitional Universal Domain

Brian Huffman

Portland State University
[email protected]

Abstract. Existing theorem prover tools do not adequately support
reasoning about general recursive datatypes. Better support for such
datatypes would facilitate reasoning about a wide variety of real-world
programs, including those written in continuation-passing style, that are
beyond the scope of current tools.
This paper introduces a new formalization of a universal domain that
is suitable for modeling general recursive datatypes. The construction is
purely definitional, introducing no new axioms. Defining recursive types
in terms of this universal domain will allow a theorem prover to derive
strong reasoning principles, with soundness ensured by construction.

1 Introduction
One of the main attractions of pure functional languages like Haskell is that
they promise to be easy to reason about. However, that promise has not yet
been fulfilled. To illustrate this point, let us define a couple of datatypes and
functions, and try to prove some simple properties.
data Cont r a = MkCont ((a -> r) -> r)

mapCont :: (a -> b) -> Cont r a -> Cont r b
mapCont f (MkCont c) = MkCont (\k -> c (k . f))

data Resumption r a = Done a | More (Cont r (Resumption r a))

bind :: Resumption r a -> (a -> Resumption r b) -> Resumption r b
bind (Done x) f = f x
bind (More c) f = More (mapCont (\r -> bind r f) c)
Haskell programmers may recognize type Cont as a standard continuation
monad. Along with the type definition is a map function mapCont, for which we
expect the functor laws to hold. By itself, type Cont is not difficult to work with.
None of the definitions are recursive, so they can be formalized easily in most
any theorem prover. Proofs of the functor laws mapCont id = id and mapCont
(f . g) = mapCont f . mapCont g are straightforward.
Things get more interesting with the next datatype definition. Monad experts
might notice that type Resumption is basically a resumption monad transformer
wrapped around a continuation monad. The function bind is the monadic bind

operation for the Resumption monad; together with Done as the monadic unit,
we should expect bind to satisfy the monad laws.
The first monad law follows trivially from the definition of bind. Instead,
let’s consider the second monad law (also known as the right-unit law) which
states that bind r Done = r. How can we go about proving this, formally or
otherwise?
It might be worthwhile to try case analysis on r, for a start. If r is equal
to Done x, then from the definition of bind we have bind (Done x) Done =
Done x, so the law holds in this case. Furthermore, if r is equal to ⊥, then
from the strictness of bind we have bind ⊥ Done = ⊥, so the law also holds
for ⊥. Finally, we must consider the case when r is equal to More c. Using the
definition of bind we obtain the following:

bind (More c) Done = More (mapCont (\r -> bind r Done) c)

Now, if we could only rewrite the bind r Done on the right-hand side to r,
then we could use the functor identity law for mapCont to simplify the entire
right-hand side to More c. Perhaps an appropriate induction rule could help.
When doing induction over simple datatypes like lists, the inductive hypoth-
esis simply assumes that the property being proved holds for an immediate
subterm: We get to assume P(xs) in order to show P(x : xs). This kind of
inductive hypothesis will not work for type Resumption, because of the indirect
recursion in its definition.
In fact, an induction rule for Resumption appropriate for our proof does exist.
(The proof of the second monad law using this induction scheme is left as an
exercise for the reader.)

admissible(P)      P(undefined)      ∀x. P(Done x)
∀f c. (∀x. P(f x)) −→ P(More (mapCont f c))
──────────────────────────────────────────────────   (1)
∀x. P(x)

This induction rule is rather unusual—the inductive step quantifies over a func-
tion f, and also mentions mapCont. It is probably not obvious to most readers
that it is correct. How can we trust it? It would be desirable to formally prove
such rules using a theorem prover.
Unfortunately, a fully mechanized semantics of general recursive datatypes
does not yet exist. Various theorem provers have facilities for defining recur-
sive datatypes, but none can properly deal with datatype definitions like the
Resumption type introduced earlier. The non–strictly positive recursion causes
the definition to be rejected by both Isabelle/HOL’s datatype package and Coq’s
inductive definition mechanism.
Of all the currently available theorem proving tools, the Isabelle/HOLCF
domain package is the closest to being able to support such datatypes. It uses
the continuous function space, so it is not limited to strictly positive recursion.
However, the domain package has some problems due to the fact that it generates
non-trivial axioms “on the fly”: For each type definition, the domain package
declares the existence of the new type (without defining it), and asserts an
appropriate type isomorphism and induction rule. The most obvious worry with
this design is the potential for unsoundness. On the other hand, the desire to
avoid unsoundness can lead to an implementation that is overly conservative.
In contrast with the current domain package, the Isabelle/HOL inductive
datatype package [14] is purely definitional. It uses a parameterized universe
type, of which new datatypes are defined as subsets. Induction rules are not
asserted as axioms; rather, they are proved as theorems. Using a similar design
for the HOLCF domain package would allow strong reasoning principles to be
generated, with soundness ensured by construction.
The original contributions of this paper are as follows:
– A new construction of a universal domain that can represent a wide variety of
types, including sums, products, continuous function space, powerdomains,
and recursive types built from these. Universal domain elements are defined
in terms of sets of natural numbers, using ideal completion—thus the con-
struction is suitable for simply-typed, higher-order logic theorem provers.
– A formalization of this construction in the HOLCF library of the Isabelle
theorem prover. The formalization is fully definitional; no new axioms are
asserted.
Section 2 reviews various domain theory concepts used in the HOLCF formal-
ization. The construction of the universal domain type itself, along with embed-
ding and projection functions, are covered in Section 3. Section 4 describes how
the universal domain can be used to define recursive types. After a discussion of
related work in Section 5, conclusions and directions for future work are found
in Section 6.

2 Background Concepts
This paper assumes some familiarity with basic concepts of domain theory: A
partial order is a set with a reflexive, transitive, antisymmetric relation (⊑). A
chain is an increasing sequence indexed by the naturals; a complete partial order
(cpo) has a least upper bound (lub) for every chain. A pointed cpo also has a
least element, ⊥. A continuous function preserves lubs of chains. An admissible
predicate holds for the lub of a chain, if it holds for all elements of the chain.
HOLCF [13] is a library of domain theory built on top of the Isabelle/HOL
theorem prover. HOLCF defines all of the standard notions listed above; it also
defines standard type constructors like the continuous function space, and strict
sums and products. The remainder of this section is devoted to some more
specialized concepts included in HOLCF that support the formalization of the
universal domain.

2.1 Embedding-Projection Pairs


Some cpos can be embedded within other cpos. The concept of an embedding-
projection pair (often shortened to ep-pair ) formalizes this notion. Let A and B
data Shrub = Node Shrub Shrub | Tip

data Tree = Branch Tree Tree | Leaf | Twig

embed :: Shrub -> Tree
embed (Node l r) = Branch (embed l) (embed r)
embed Tip = Twig

project :: Tree -> Shrub
project (Branch l r) = Node (project l) (project r)
project Leaf = undefined
project Twig = Tip

deflate :: Tree -> Tree
deflate (Branch l r) = Branch (deflate l) (deflate r)
deflate Leaf = undefined
deflate Twig = Twig

Fig. 1. Embedding-projection pairs and deflations in Haskell. Function deflate is equal
to the composition of functions embed and project.

be cpos, and e : A → B and p : B → A be continuous functions. Then e and p
are an ep-pair if p ◦ e = IdA and e ◦ p ⊑ IdB . The existence of such an ep-pair
means that cpo A can be embedded within cpo B.
Figure 1 shows an example in Haskell, where the type Shrub is embedded
into the larger type Tree. If we embed a Shrub into type Tree, we can always
project back out to recover the original value. In other words, for all s, we have
project (embed s) = s. On the other hand, if we project a Tree out to type
Shrub, then embed back into type Tree, we may or may not get back the same
value we started with. If t of type Tree contains no Leaf constructors at all,
then we have embed (project t) = t. Otherwise, we basically end up with a
tree with all its leaves stripped off—each Leaf constructor is replaced with ⊥.
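The two ep-pair laws can even be checked on finite values in Haskell, assuming the Fig. 1 definitions are extended with deriving Eq (our assumption; comparing against undefined diverges, so the second check only makes sense on Leaf-free trees):

prop_projectEmbed :: Shrub -> Bool
prop_projectEmbed s = project (embed s) == s

-- only meaningful for trees containing no Leaf constructors
prop_embedProject :: Tree -> Bool
prop_embedProject t = embed (project t) == t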

2.2 Deflations
Cpos may contain other cpos as subsets. A deflation¹ is a way to encode such a
sub-cpo as a continuous function. Let B be a cpo, and d : B → B be a continuous
function. Then d is a deflation if d ◦ d = d ⊑ IdB . The image set of deflation
d : B → B gives a sub-cpo of B.
Essentially, a deflation is a value that represents a type. For example, the
function deflate in Fig. 1 is a deflation; its image set consists of exactly those
values of type Tree that contain no Leaf constructors. Note that while
the definition of deflate does not mention type Shrub at all, its image set
is isomorphic to type Shrub—in other words, deflate (a function value) is a
representation of Shrub (a type).
¹ My usage of deflation follows Gunter [6]. Many authors use the term projection to
refer to the same concept, but I prefer deflation because it avoids confusion with the
second half of an ep-pair.
While types can be represented by deflations, type constructors (which are like
functions from types to types) can be represented as functions from deflations
to deflations. For example, the map function represents Haskell’s list type con-
structor: While deflate is a deflation on type Tree that represents type Shrub,
map deflate is a deflation on type [Tree] that represents type [Shrub].
Deflations and ep-pairs are closely related. Given an ep-pair (e, p) from cpo A
into cpo B, the composition e◦p is a deflation on B whose image set is isomorphic
to A. Conversely, every deflation d : B → B also gives rise to an ep-pair. Define
the cpo A to be the image set of d; also define e to be the inclusion map from A
to B, and define p = d. Then (e, p) is an embedding-projection pair. So saying
that there exists an ep-pair from A to B is equivalent to saying that there exists
a deflation on B whose image set is isomorphic to A.
Finally we are ready to talk about what it means for a cpo to be a universal
domain. A cpo U is universal for a class of cpos, if for every cpo D in the class,
there exists an ep-pair from D into U . Equivalently, for every D there must exist
a deflation on U with an image set isomorphic to D.

2.3 Algebraic and Bifinite Cpos

Lazy recursive datatypes often have infinite as well as finite values.²
we can define a datatype of recursive lazy lists of booleans:

data BoolList = Nil | Cons Bool BoolList

Finite values of type BoolList include total values like Cons False Nil, and
Cons True (Cons False Nil), along with partial finite values like Cons False
undefined. On the other hand, recursive definitions can yield infinite values:

trues :: BoolList
trues = Cons True trues

One way to characterize the set of finite values is in terms of an approx
function, defined below. The function approx is similar to the standard list
function take that we all know and love, except that approx 0 returns ⊥ instead
of Nil. (This makes each approx n into a deflation.) A value xs of type BoolList
is finite if and only if there exists some n such that approx n xs = xs.

approx :: Int -> BoolList -> BoolList
approx 0 xs = undefined
approx n Nil = Nil
approx n (Cons x xs) = Cons x (approx (n-1) xs)
² The formalization actually uses the related concept of compactness in place of finite-
ness. A value k is compact iff (λx. k ⋢ x) is an admissible predicate. Compactness
and finiteness do not necessarily coincide; for example, in a cpo of ordinals, ω + 1 is
compact but not finite. In the context of recursive datatypes, however, the concepts
are generally equivalent.
The function approx is so named because for any input value xs it generates
a sequence of finite approximations to xs. For example, the first few approxi-
mations to trues are ⊥, Cons True ⊥, Cons True (Cons True ⊥), and so on.
Each is finite, but the least upper bound of the sequence is the infinite value
trues. This property of a cpo, where every infinite value can be written as the
least upper bound of a chain of finite values, is called algebraicity. Thus BoolList
is an algebraic cpo.
The sequence of deflations approx n is a chain of functions whose least upper
bound is the identity function. In terms of image sets, we have a sequence of
partial orders whose limit is the whole type BoolList.
A further property of approx which may not be immediately apparent is that
for any n, the image of approx n is a finite set. This means that image sets of
approx n yield a sequence of finite partial orders. As a limit of finite partial
orders, we say that type BoolList is a bifinite cpo. More precisely, as a limit of
countably many finite partial orders, BoolList is an omega-bifinite cpo.³
The omega-bifinites are a useful class of cpos because bifiniteness is pre-
served by all of the type constructors defined in HOLCF. Furthermore, all
Haskell datatypes are omega-bifinite. Basically any type constructor that pre-
serves finiteness will preserve bifiniteness as well. More details about the formal-
ization of omega-bifinite domains in HOLCF can be found in [10].

2.4 Ideal Completion and Continuous Extensions

In an algebraic cpo the set of finite elements, together with the ordering relation
on them, completely determines the structure of the entire cpo. We say that the
set of finite elements forms a basis for the cpo, and the entire cpo is a completion
of the basis.
Given a basis B with ordering relation (≼), we can reconstruct the whole
algebraic cpo. The standard process for doing this is called ideal completion, and
it is done by considering the set of ideals over the basis.
An ideal is a non-empty, downward-closed, directed set—that is, it contains
an upper bound for any finite subset. A principal ideal is an ideal of the form
{y. y ≼ x} for some x, denoted ↓x. The set of all ideals over B, ≼ is denoted
Idl(B); when ordered by subset inclusion, Idl(B) forms an algebraic cpo. The
compact elements of Idl(B) are exactly those represented by principal ideals.
Note that the relation (≼) does not need to be antisymmetric. For x and y
that are equivalent (that is, both x ≼ y and y ≼ x) the principal ideals ↓x and
↓y are equal. This means that the ideal completion construction automatically
takes care of quotienting by the equivalence induced by (≼).
Just as the structure of an algebraic cpo is completely determined by its
basis, a continuous function from an algebraic cpo to another cpo is completely
determined by its action on basis elements. This suggests a method for defining
continuous functions over ideal completions: First, define a function f from basis
³ “SFP domain” is another name, introduced by Plotkin [15], that is used for the same
concept—the name stands for Sequence of Finite Posets.
B to cpo C such that f is monotone, i.e. x ≼ y implies f(x) ⊑ f(y). Then we
can define the continuous extension of f as f̂(S) = ⨆x∈S f(x). The function
f̂ is the unique continuous function of type Idl(B) → C that agrees with f on
principal ideals—that is, for all x : B, f̂(↓x) = f(x).
In the next section, all of the constructions related to the universal domain will
be done in terms of basis values: The universal domain itself will be defined using
ideal completion, and the embedding and projection functions will be defined as
continuous extensions.
HOLCF includes a formalization of ideal completion and continuous exten-
sions, which was created to support the definition of powerdomains [10].

3 Construction of the Universal Domain

Informally, an omega-bifinite domain is a cpo that can be written as the limit
of a sequence of finite partial orders. This section describes how to construct a
universal omega-bifinite domain U , along with an ep-pair from another arbitrary
omega-bifinite domain D into U . The general strategy is as follows:

– From the bifinite structure of D, obtain a sequence of finite posets Pn whose
  limit is D.
– Following Gunter [7], decompose the sequence Pn further into a sequence of
increments that insert new elements one at a time.
– Construct a universal domain basis that can encode any increment.
– Construct the actual universal domain U using ideal completion.
– Define the embedding and projection functions between D and U using con-
tinuous extension, in terms of their action on basis elements.

The process of constructing a sequence of increments is described in Sec. 3.1.
The underlying theory is standard, so the section is primarily exposition; the
original contribution here is the formalization of that work in a theorem prover.
The remainder of the construction, including the basis and embedding/projection
functions, is covered in Sec. 3.2 onwards; here both the theory and the formal-
ization are original.

3.1 Building a Sequence of Increments

Any omega-bifinite domain D can be represented as the limit of a sequence
of finite posets, with embedding-projection pairs between each successive pair.
Figure 2 shows the first few posets from one such sequence.
In each step along the chain, each new poset Pn+1 is larger than the previous
Pn by some finite amount; the structure of Pn+1 has Pn embedded within it, but
it has some new elements as well.
An ep-pair between finite posets P and P′, where P′ has exactly one more
element than P, is called an increment (terminology due to Gunter [8]). In Fig. 2,
the embedding of P1 into P2 is an example of an increment.
[Figure: four Hasse diagrams, labelled P0, P1, P2, P3.]
Fig. 2. A sequence of finite posets. Each Pn can be embedded into Pn+1 ; black nodes
indicate the range of the embedding function.

The strategy for embedding a bifinite domain into the universal domain is
built around increments. The universal domain is designed so that if a finite
partial order P is representable (i.e. by a deflation), and there is an increment
from P to P′, then P′ will also be representable.
For all embeddings from Pn to Pn+1 that add more than one new value, we
will need to decompose the single large embedding into a sequence of smaller
increments. The challenge, then, is to determine in which order the new elements
should be inserted. The order matters: Adding elements in the wrong order can
cause problems, as shown in Fig. 3.

[Figure: two rows of stepwise insertions into a four-element poset, each step marked =⇒.]
Fig. 3. The right (top) and wrong (bottom) way to order insertions. No ep-pair exists
between the 3-element and 4-element posets on the bottom row.

To describe the position of a newly-inserted element, it will be helpful to
invent some terminology. The set of elements above the new element will be
known as its superiors. An element immediately below the new element will be
known as its subordinate.
In order for the insertion of a new element to be a valid increment, it must
have exactly one subordinate. The subordinate indicates the value that the in-
crement’s projection maps the new value onto.
With the four-element poset in Fig. 3, it is not possible to insert the top
element last. The reason is that the element has two subordinates: If a projection
function maps the new element to one, the ordering relation with the other will
not be preserved. Thus a monotone projection does not exist.
A strategy for successfully avoiding such situations is to always insert maximal
elements first [7, §5]. Fig. 4 shows this strategy in action. Notice that the number
of superiors varies from step to step, but each inserted element always has exactly
one subordinate. To maintain this invariant, the least of the four new values must
be inserted last.

[Figure: five Hasse diagrams, labelled P2, P2.1, P2.2, P2.3, P3.]
Fig. 4. A sequence of four increments going from P2 to P3 . Each new node may have
any number of upward edges, but only one downward edge.

Armed with this strategy, we can finally formalize the complete sequence of
increments for type D. To each element x of the basis of D we must assign
a sequence number place(x)—this numbering tells in which order to insert the
values. The HOLCF formalization breaks up the definition of place as follows.
First, each basis value is assigned to a rank, where rank (x) = n means that the
basis value x first appears in the poset Pn . Equivalently, rank (x) is the least
n such that approx n (x) = x. Then an auxiliary function pos assigns sequence
numbers to values in finite sets, by repeatedly removing an arbitrary maximal
element until the set is empty. Finally, place(x) is defined as the sequence number
of x within its (finite) rank set, plus the total size of all earlier ranks.

choose(A) = (εx ∈ A. ∀y ∈ A. x ⊑ y −→ x = y)                           (2)

pos(A, x) = 0                                    if x = choose(A)
pos(A, x) = 1 + pos(A − {choose(A)}, x)          if x ≠ choose(A)       (3)

place(x) = pos({y. rank(y) = rank(x)}, x) + |{y. rank(y) < rank(x)}|    (4)
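The numbering function pos of Eq. (3) has a direct executable reading. The Haskell sketch below is ours; it replaces the Hilbert choice of Eq. (2) with the first maximal element found, and assumes the sought element occurs in the set:

import Data.List (delete)

pos :: Eq a => (a -> a -> Bool) -> [a] -> a -> Int
pos below xs x
  | c == x    = 0
  | otherwise = 1 + pos below (delete c xs) x
  where
    -- m is maximal if no other element lies strictly above it
    maximal m = and [not (below m y) || y == m | y <- xs]
    c         = head (filter maximal xs)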

For the remainder of this paper, it will be sufficient to note that the place function
satisfies the following two properties:
– Values in earlier ranks come before values in later ranks: If rank (x) <
rank (y), then place(x) < place(y).
– Within the same rank, larger values come first: If rank (x) = rank (y) and
x ⊑ y, then place(y) < place(x).

3.2 A Basis for the Universal Domain


Constructing a partial order incrementally, there are two possibilities for any
newly inserted value:
– The value is the very first one (i.e. it is ⊥)
– The value is inserted above some previous value (its subordinate), and below
  zero or more other previous values (its superiors)
Accordingly, we can define a datatype to describe the position of these values
relative to each other. (Usage of Haskell datatype syntax is merely for conve-
nience; this is not intended to be viewed as a lazy datatype. Here Nat represents
the natural numbers, and Set a represents finite sets with elements of type a.)
data Basis = Bottom | Node { serial_number :: Nat
, subordinate :: Basis
, superiors :: Set Basis }
The above definition does not work as a datatype definition in Isabelle/HOL,
because the finite set type constructor does not work with the datatype package.
(Indirect recursion only works with other inductive datatypes.) But it turns out
that we do not need the datatype package at all—the type Basis is actually
isomorphic to the natural numbers. Using the bijections ℕ ≅ 1 + ℕ and ℕ ≅ ℕ × ℕ
with ℕ ≅ Pf(ℕ), we can construct a bijection that lets us use ℕ as the basis
datatype:
                       ℕ ≅ 1 + ℕ × ℕ × Pf(ℕ)                            (5)
In the remainder of this section, we will use mathematical notation to write
values of the basis datatype: ⊥ represents Bottom, and ⟨i, a, S⟩ will stand for
Node i a S.
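The bijections behind Eq. (5) are entirely standard; the following Haskell sketch (the names pair and encodeSet are ours) shows the two non-trivial directions:

import Data.List (nub)

-- Cantor pairing witnesses N ≅ N × N
pair :: Integer -> Integer -> Integer
pair x y = (x + y) * (x + y + 1) `div` 2 + y

-- finite sets of naturals as bit vectors witness N ≅ Pf(N)
encodeSet :: [Integer] -> Integer
encodeSet = sum . map (2 ^) . nub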

[Figure: Hasse diagram of P3 with nodes a–h, shown alongside their encodings:]

a = ⊥               e = ⟨4, b, {}⟩
b = ⟨1, a, {}⟩      f = ⟨5, d, {e}⟩
c = ⟨2, a, {}⟩      g = ⟨6, c, {}⟩
d = ⟨3, a, {b}⟩     h = ⟨7, a, {e, f, g}⟩

Fig. 5. Embedding elements of P3 into the universal domain basis

Figure 5 shows how this system works for embedding all the elements from
the poset P3 into the basis datatype. The elements have letter names from a–
h, assigned alphabetically by insertion order. In the datatype encoding of each
element, the subordinate and superiors are selected from the set of previously
inserted elements. Serial numbers are assigned sequentially.
The serial number is necessary to distinguish multiple values that are inserted
in the same position. For example, in Fig. 5, elements b and c both have a as the
subordinate, and neither has any superiors. The serial number is the only way
to tell such values apart.
Note that the basis datatype seems to contain some junk—some subordi-
nate/superiors combinations are not well formed. For example, in any valid in-
crement, all of the superiors are positioned above the subordinate. One way to
take care of this requirement would be to define a well-formedness predicate for
basis elements. However, it turns out that it is possible (and indeed easier) to
simply ignore any invalid elements. In the set of superiors, only those values that
are above the subordinate will be considered. (This will be important to keep in
mind when we define the basis ordering relation.)
There is also a possibility of multiple representations for the same value.
For example, in Fig. 5 the encoding of h is given as ⟨7, a, {e, f, g}⟩, but the
representation ⟨7, a, {f, g}⟩ would work just as well (since the sets have the same
upward closure). One could consider having a well-formedness requirement for
the set of superiors to be upward-closed. But this turns out not to be necessary,
since the extra values do not cause problems for any of the formal proofs.

3.3 Basis Ordering Relation


To perform the ideal completion, we need to define a preorder relation on the
basis. The basis value ⟨i, a, S⟩ should fall above a and below all the values in
set S that are above a. Accordingly, we define the relation (≼) as the smallest
reflexive, transitive relation that satisfies the following two introduction rules:

a ≼ ⟨i, a, S⟩                                    (6)
a ≼ b ∧ b ∈ S =⇒ ⟨i, a, S⟩ ≼ b                   (7)
Note that the relation (≼) is not antisymmetric. For example, we have both
a ≼ ⟨i, a, {a}⟩ and ⟨i, a, {a}⟩ ≼ a. However, for ideal completion this does not
matter. Basis values a and ⟨i, a, {a}⟩ generate the same principal ideal, so they
will be identified as elements of the universal domain.
Also note the extra hypothesis a ≼ b in Eq. (7). Because we have not banished
ill-formed subordinate/superiors combinations from the basis datatype, we must
explicitly consider only those elements of the set of superiors that are above the
subordinate.

3.4 Building the Embedding and Projection


In the HOLCF formalization, the embedding function emb from D to U is de-
fined using continuous extension. The first step is to define emb on basis elements,
generalizing the pattern shown in Fig. 5. The definition below uses wellfounded
recursion—all recursive calls to emb are on previously inserted values with smaller
place numbers:
emb(x) = ⊥               if x = ⊥
emb(x) = ⟨i, a, S⟩       otherwise,
  where i = place(x)                                             (8)
        a = emb(sub(x))
        S = {emb(y) | place(y) < place(x) ∧ x ⊑ y}
The subordinate value a is computed using a helper function sub, which is defined
as sub(x) = approx n−1 (x), where n = rank (x). The ordering produced by the
place function ensures that no previously inserted value with the same rank as
x will be below x. Therefore the previously inserted value immediately below x
must be sub(x), which comes from the previous rank.
In order to complete the continuous extension, it is necessary to prove that
the basis embedding function is monotone. That is, we must show that for any
x and y in the basis of D, x ⊑ y implies emb(x) ≼ emb(y). The proof is by
well-founded induction over the maximum of place(x) and place(y). There are
two main cases to consider:
– Case place(x) < place(y): Since x ⊑ y, it must be the case that rank(x) <
rank(y). Then, using the definition of sub it can be shown that x ⊑ sub(y);
thus by the inductive hypothesis we have emb(x) ≼ emb(sub(y)). Also, from
Eq. (6) we have emb(sub(y)) ≼ emb(y). Finally, by transitivity we have
emb(x) ≼ emb(y).
– Case place(y) < place(x): From the definition of sub we have sub(x) ⊑ x. By
transitivity with x ⊑ y this implies sub(x) ⊑ y; therefore by the inductive
hypothesis we have emb(sub(x)) ≼ emb(y). Also, using Eq. (8), we have that
emb(y) is one of the superiors of emb(x). Ultimately, from Eq. (7) we have
emb(x) ≼ emb(y).
The projection function prj from U to D is also defined using continuous ex-
tension. The action of prj on basis elements is specified by the following recursive
definition:

prj(a) = emb⁻¹(a)                 if ∃x. emb(x) = a
prj(a) = prj(subordinate(a))      otherwise                      (9)
To ensure that prj is well-defined, there are a couple of things to check. First
of all, the recursion always terminates: In the worst case, repeatedly taking the
subordinate of any starting value will eventually yield ⊥, at which point the first
branch will be taken since emb(⊥) = ⊥. Secondly, note that emb −1 is uniquely
defined, because emb is injective. Injectivity of emb is easy to prove, since each
embedded value has a different serial number.
Just like with emb, we also need to prove that the basis projection function
prj is monotone. That is, we must show that for any a and b in the basis of
U, a ≼ b implies prj(a) ⊑ prj(b). Remember that the basis preorder (≼) is
an inductively defined relation; accordingly, the proof proceeds by induction on
a ≼ b. Compared to the proof of monotonicity for emb, the proof for prj is
relatively straightforward; details are omitted here.
Finally, we must prove that emb and prj form an ep-pair. The proof of prj ◦
emb = IdD is easy: Let x be any value in the basis of D. Then using Eq. (9), we
have prj(emb(x)) = emb⁻¹(emb(x)) = x. Since this equation is an admissible
predicate on x, proving it for compact x is sufficient to show that it holds for all
values in the ideal completion.
The proof of emb ◦ prj ⊑ IdU takes a bit more work. As a lemma, we can show
that for any a in the basis of U, prj(a) is always equal to emb⁻¹(b) for some
b ≼ a that is in the range of emb. Using this lemma, we then have emb(prj(a)) =
emb(emb⁻¹(b)) = b ≼ a. Finally, using admissibility, this is sufficient to show
that emb(prj(a)) ⊑ a for all a in U.
To summarize the results of this section: We have formalized a type U , and
two polymorphic continuous functions emb and prj. For any omega-bifinite do-
main D, emb and prj form an ep-pair that embeds D into U . The full proof
scripts are available as part of the distribution of Isabelle2009, in the theory file
src/HOLCF/Universal.thy.

4 Algebraic Deflations and Deflation Combinators


To represent types, we need a type T consisting of all the algebraic deflations
over U , i.e. deflations whose image sets are omega-bifinite cpos. In HOLCF,
the algebraic deflations are defined using ideal completion from the set of finite
deflations, which have finite image sets. Note that as an ideal completion, T
is itself a bifinite cpo; this is important because it lets us use a fixed-point
combinator to define recursive values of type T , representing recursive types.
For each of the basic type constructors in HOLCF, we can define a deflation
combinator as a continuous function over type T . Using continuous extension, we
start by defining the combinators on finite deflations. Below are the definitions
for product and continuous function space: If D and E are finite deflations on
type U , then so are D ×F E and D →F E.

(D ×F E)(x) = case prj(x) of (a, b) → emb(D(a), E(b))            (10)
(D →F E)(x) = emb(E ◦ prj(x) ◦ D)                                (11)
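Ignoring the emb/prj coercions into U, the function-space combinator of Eq. (11) is just pre- and post-composition, as the following sketch (our name defFun) shows; if d and e are idempotent and below the identity, so is defFun d e:

defFun :: (a -> a) -> (b -> b) -> ((a -> b) -> (a -> b))
defFun d e f = e . f . d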

Next, we can define combinators (×T ) and (→T ) of type T → T → T as con-
tinuous extensions of (×F ) and (→F ). Combinators (⊕T ) and (⊗T ) for strict
sums and products are defined similarly. Values unitT , boolT , intT :: T can also
be defined to represent basic types.
The deflation combinators, together with a least fixed-point operator, can be
used to define deflations for recursive types. Below are the definitions for the
Cont and Resumption datatypes from the introduction:

ContT (R, A) = ((A →T R) →T R)                                   (12)
ResumptionT (R, A) = μD. A ⊕T ContT (R, D)                       (13)

As a recursive datatype, ResumptionT uses the fixed point operator in its def-
inition. Also note that the definition of ResumptionT refers to ContT on the
right-hand side—since ContT is a continuous function, it may be used freely
within other recursive definitions. Thus it is not necessary to transform indirect
recursion into mutual recursion, like the Isabelle datatype package does.
Once the deflations have been constructed, the actual Cont and Resumption
types can be defined using the image sets of their respective deflations. That
the Resumption type satisfies the appropriate domain isomorphism follows from
the fixed-point definition. Also, a simple induction principle (a form of take
induction, like what would be axiomatized by the current domain package) can
be derived from the fact that ResumptionT is a least fixed-point.
Finally, the simple take induction rule can be used to derive the higher-level
induction rule shown in Eq. (1). The appearance of mapCont is due to the fact
that (modulo some coercions to and from U ) it coincides with the deflation
combinator (λD. ContT (R, D)). (This is similar to how the function map doubles
as the deflation combinator for lists.)

5 Related Work
An early example of the purely definitional approach to defining datatypes is
described by Melham, in the context of the HOL theorem prover [12]. Melham
defines a type (α)Tree of labelled trees, from which other recursive types are
defined as subsets. The design is similar in spirit to the one presented in this
paper—types are modeled as values, and abstract axioms that characterize each
datatype are proved as theorems. The main differences are that it uses ordinary
types instead of bifinite domains, and ordinary subsets instead of deflations.
The Isabelle/HOL datatype package uses a design very similar to the HOL
system. The type α node, which was originally used for defining recursive types
in Isabelle/HOL, was introduced by Paulson [14]; it is quite similar to the HOL
system’s (α)Tree type. Gunter later extended the labelled tree type of HOL to
support datatypes with arbitrary branching [9]. Berghofer and Wenzel used a
similarly extended type to implement Isabelle’s modern datatype package [4].
Agerholm used a variation of Melham’s labelled trees to define lazy lists and
other recursive domains in the HOL-CPO system [1]. Agerholm’s cpo of infinite
trees can represent arbitrary polynomial datatypes as subsets; however, negative
recursion is not supported.
Recent work by Benton, et al. uses the colimit construction to define recursive
domains in Coq [3]. Like the universal domain described in this paper, their
technique can handle both positive and negative recursion. Using colimits avoids
the need for a universal domain, but it requires a logic with dependent types;
the construction will not work in ordinary higher-order logic.
On the theoretical side, various publications by Gunter [6,7,8] were the pri-
mary sources of ideas for my universal domain construction. The construction
of the sequence of increments in Section 3 is just as described by Gunter [7, §5].
However, the use of ideal completion is original—Gunter defines the universal
domain using a colimit construction instead. Given a cpo D, Gunter defines a
type D+ that can embed any increment from D to D′. The universal domain is
then defined as a solution to the domain equation D ≅ D+.
of D+ is similar to my Basis datatype, except that it is non-recursive and does
not include serial numbers.

6 Conclusion and Future Work


The Isabelle/HOLCF library of domain theory now has all the basic infrastruc-
ture needed for defining general recursive datatypes without introducing axioms.
It provides a universal domain type U , into which any omega-bifinite domain can
be embedded. It also provides a type T of algebraic deflations, which represent
bifinite domains as values. Both are included as part of the Isabelle2009 release.
While the underlying theory is complete, the automation is not yet finished.
The first area of future work is to connect the new theories to the existing domain
package, so that instead of axiomatizing the type isomorphism and induction
rules, the domain package can prove them from the fixed-point definitions.
The domain package will also need to be extended with automation for
indirect-recursive datatypes. Such datatypes may have various possible induc-
tion rules, so this will require some design decisions about how to formulate the
rules, in addition to work on automating the proofs.
Other future directions explore limitations in the current design:
– Higher-order type constructors. Higher-order types can be represented by
deflation combinators with types like (T → T ) → T . The problem is that
Isabelle’s type system only supports first-order types. Although, see [11] for
an admittedly complicated workaround.
– Non-regular (nested) datatypes [5]. Deflation combinators for non-regular
datatypes can be defined by taking least fixed points at type T → T , rather
than type T . However, since Isabelle does not support type quantification or
polymorphic recursion, induction rules and recursive functions could not be
defined in the normal way.
– Higher-rank polymorphism. This is not supported by Isabelle’s type system.
However, the universal domain U could be used to model such types, using
the construction described by Amadio and Curien [2].
– Generalized abstract datatypes (GADTs). These are usually modeled in terms
of some kind of type equality constraints. For example, type equality con-
straints are a central feature of System FC [16], a compiler intermediate
language used to represent Haskell programs. But to the extent of this au-
thor’s knowledge, there is no way to model type equality constraints using
deflations.

Acknowledgments. I would like to thank my advisor, John Matthews, for many


encouraging discussions about HOLCF and domain theory, and also for suggest-
ing the example used in the introduction. Thanks also to James Hook for reading
drafts and providing helpful comments.

References
1. Agerholm, S.: A HOL Basis for Reasoning about Functional Programs. PhD thesis,
University of Aarhus (1994)
2. Amadio, R.M., Curien, P.-L.: Domains and Lambda-Calculi. Cambridge University
Press, New York (1998)
3. Benton, N., Kennedy, A., Varming, C.: Some domain theory and denotational
semantics in Coq. In: Proc. 22nd International Conference on Theorem Proving
in Higher Order Logics (TPHOLs 2009). LNCS, vol. 5674. Springer, Heidelberg
(2009)
A Purely Definitional Universal Domain 275

4. Berghofer, S., Wenzel, M.: Inductive datatypes in HOL - lessons learned in formal-
logic engineering. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L.
(eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 19–36. Springer, Heidelberg (1999)
5. Bird, R.S., Meertens, L.G.L.T.: Nested datatypes. In: Jeuring, J. (ed.) MPC 1998.
LNCS, vol. 1422, pp. 52–67. Springer, Heidelberg (1998)
6. Gunter, C.: Profinite Solutions for Recursive Domain Equations. PhD thesis, Uni-
versity of Wisconsin at Madison (1985)
7. Gunter, C.A.: Universal profinite domains. Information and Computation 72(1),
1–30 (1987)
8. Gunter, C.A.: Semantics of Programming Languages: Structures and Techniques.
In: Foundations of Computing. MIT Press, Cambridge (1992)
9. Gunter, E.L.: A broader class of trees for recursive type definitions for HOL. In:
Joyce, J.J., Seger, C.-J.H. (eds.) HUG 1993. LNCS, vol. 780, pp. 141–154. Springer,
Heidelberg (1994)
10. Huffman, B.: Reasoning with powerdomains in Isabelle/HOLCF. In: Mohamed,
O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 45–56.
Springer, Heidelberg (2008)
11. Huffman, B., Matthews, J., White, P.: Axiomatic constructor classes in Is-
abelle/HOLCF. In: Hurd, J., Melham, T. (eds.) TPHOLs 2005. LNCS, vol. 3603,
pp. 147–162. Springer, Heidelberg (2005)
12. Melham, T.F.: Automating recursive type definitions in higher order logic. In:
Current Trends in Hardware Verification and Automated Theorem Proving, pp.
341–386. Springer, Heidelberg (1989)
13. Müller, O., Nipkow, T., von Oheimb, D., Slotosch, O.: HOLCF = HOL + LCF.
Journal of Functional Programming 9, 191–223 (1999)
14. Paulson, L.C.: Mechanizing coinduction and corecursion in higher-order logic. Jour-
nal of Logic and Computation 7 (1997)
15. Plotkin, G.D.: A powerdomain construction. SIAM J. Comput. 5(3), 452–487
(1976)
16. Sulzmann, M., Chakravarty, M.M.T., Jones, S.P., Donnelly, K.: System F with
type equality coercions. In: TLDI 2007: Proceedings of the 2007 ACM SIGPLAN
international workshop on Types in languages design and implementation, pp. 53–
66. ACM, New York (2007)
Types, Maps and Separation Logic

Rafal Kolanski and Gerwin Klein

Sydney Research Lab., NICTA, Australia


School of Computer Science and Engineering, UNSW, Sydney, Australia
{rafal.kolanski,gerwin.klein}@nicta.com.au

Abstract. This paper presents a separation-logic framework for reason-
ing about low-level C code in the presence of virtual memory. We describe
our abstract, generic Isabelle/HOL framework for reasoning about vir-
tual memory in separation logic, and we instantiate this framework to
a precise, formal model of ARMv6 page tables. The logic supports the
usual separation logic rules, including the frame rule, and extends sepa-
ration logic with additional basic predicates for mapping virtual to phys-
ical addresses. We build on earlier work to parse potentially type-unsafe,
system-level C code directly into Isabelle/HOL and further instantiate
the separation logic framework to C.

1 Introduction

Virtual memory is a mechanism in modern computing systems that usual pro-
gramming language semantics gloss over. For the application level, the operating
system (OS) is expected to provide an abstraction of plain memory and details
like page faults are handled behind the scenes. While, strictly speaking, the pres-
ence of virtual memory is still observable via sharing, ignoring virtual memory
is therefore defendable for the application level.
For verifying lower-level software such as the operating system itself or soft-
ware for embedded devices without a complex OS layer, this is no longer true.
On this layer, virtual memory plays a prominent and directly observable role. It
is also the source of many defects that are frequently very frustrating to debug.
A wrong, unexpected mapping from virtual to physical addresses in the machine
can lead to garbled, unrecognisable data at a much later, seemingly unrelated
position in the code. A wrong, non-existing mapping will lead to a page fault: if
the machine attempts to read a code instruction or a data value from a virtual
address without valid mapping, on most architectures, a hardware exception is
raised and execution branches to the address of a registered page fault handler
(which often is virtually addressed itself). Defects in the page fault handler may
lead to even more obscure, non-local symptoms. The situation is complicated
by the fact that these virtual-to-physical mappings are themselves encoded in

NICTA is funded by the Australian Government as represented by the Department of
Broadband, Communications and the Digital Economy and the Australian Research
Council through the ICT Centre of Excellence program.

memory, usually in a hardware-defined page table structure, and they are often
manipulated through the virtual memory layer.
As an example, the completion of the very first C implementation (at the time
untried and unverified) of the formally verified seL4 microkernel [8] in our group
was celebrated by loading the code onto our ARMv6¹ development board and
starting the boot process to generate a hello-world message. Quite expectedly,
nothing at all happened. The board was unresponsive and no debug information
was forthcoming. It took 3 weeks to write the C implementation following a
precise specification. It took 5 weeks debugging to get it running. It turned out
that the boot code had not set up the initial page table correctly, and since no
page fault handler was installed, the machine just kept faulting. This was the
first of a number of virtual-memory related bugs. What is worse, our verification
framework for C would, at the time, not have caught any of these bugs. We have
since explicitly added the appropriate virtual memory proof obligations. They
are derived, in part, from the work presented in this paper.
We present a framework in Isabelle/HOL for the verification of low-level C
code with separation logic in the presence of virtual memory. The framework
itself is abstract and generic. In earlier work [16], we described a preliminary
version of it, instantiated to a hypothetical simple page table and a toy lan-
guage. In that work we concentrated on showing that the logic of the framework
is indeed an instance of abstract separation logic [5] and that it supports the
usual separation logic reasoning, including the frame rule. Here, we concentrate
on making the framework applicable to the verification of real C code. We have
instantiated the framework to the high-fidelity memory model for C by Tuch et
al [24] and connected it with the same C-parsing infrastructure for Isabelle/HOL
that was used there. On the hardware side, we have instantiated the framework
to a detailed and precise model of ARMv6 2-level hardware page tables. To
our knowledge, this is the first formalisation of the ARMv6 memory translation
mechanism. The resulting instantiation is a foundational, yet practical verifica-
tion framework for a large subset of standard C99 [13] with the ability to reason
about the effects of virtual memory when necessary and the ability to reason
abstractly in the traditional separation logic style when virtual memory is not
the focus.
The separation logic layer of the framework makes three additional basic pred-
icates available: mapping from a virtual address to a value, mapping from a phys-
ical address to a value, and mapping from a virtual to a physical address. For
the user of the framework, these integrate seamlessly with other separation logic
formulae and they support all expected, traditional reasoning principles. Inside
the framework, we invest significant effort to provide this nice abstraction, to
support the frame rule, and to shield the verification user from the considerable
complexity of the hardware page table layout in a modern architecture.
Our envisaged application area for this framework is low-level OS kernel code
that manipulates page tables and user-level page fault handlers in microkernel

¹ The ARMv6 is a popular processor architecture for embedded systems, such as the
iPhone or Android.
systems. To stay in the same, foundational framework, it can also be used for
the remaining OS kernel without any significant reasoning overhead in a sepa-
ration logic setting. Our direct application area is the verification of the seL4
microkernel [8].
The remainder of this paper is structured as follows. After introducing nota-
tion in Sect. 2, we describe in Sect. 3 an abstract type class for encoding arbitrary
C types in memory. Sect. 4 describes our abstract, generic page table framework
and Sect. 5 instantiates this to ARMv6. Sect. 6 integrates virtual memory into
our abstract separation logic framework, first at the byte level, and then at the
structured types level. Sect. 7 makes the connection to C, and, finally, Sect. 8
discusses how translation caching mechanisms can be integrated into the model.

2 Notation

This section introduces Isabelle/HOL syntax where different from standard
mathematical notation.
The space of total functions is denoted by ⇒. Type variables are written ′a,
′b, etc. The notation t :: τ means that HOL term t has HOL type τ.
Pairs come with the two projection functions fst :: ′a × ′b ⇒ ′a and snd ::
′a × ′b ⇒ ′b. Sets (type ′a set) follow the usual mathematical convention. Lists
support the empty list [] and cons, written x·xs. The list of natural numbers
from a to (excluding) b is [a..<b]. We also use the standard zip and map from
functional programming. The option type

datatype ′a option = None | Some ′a

adjoins a new element None to a type ′a. We use ′a option to model partial
functions, writing ⌊a⌋ instead of Some a and ′a ⇀ ′b instead of ′a ⇒ ′b option.
The Some constructor has an underspecified inverse called the, satisfying
the ⌊x⌋ = x.
option-map = (λf y. case y of None ⇒ None | ⌊x⌋ ⇒ ⌊f x⌋)
Function update is written f (x := y) where f :: ′a ⇒ ′b, x :: ′a and y :: ′b, and
f (x ↦ y) stands for f (x := Some y). Finite integers are represented by the
type ′a word where ′a determines the word length in bits. The type supports
the usual bit operations like left-shift (<<) and bitwise and (&&). The function
unat converts to natural numbers (u for unsigned). Separation logic uses the
concepts of disjoint maps ⊥ and map addition ++. They are defined below.
m1 ⊥ m2 ≡ dom m1 ∩ dom m2 = ∅
m1 ++ m2 ≡ λx. case m2 x of None ⇒ m1 x | ⌊y⌋ ⇒ ⌊y⌋
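As a rough executable analogue (our names; Data.Map stands in for Isabelle's partial functions), the two operations can be sketched as:

import qualified Data.Map as M

disjoint :: Ord k => M.Map k v -> M.Map k v -> Bool
disjoint m1 m2 = M.null (M.intersection m1 m2)

-- entries of m2 take precedence, matching m1 ++ m2
mapAdd :: Ord k => M.Map k v -> M.Map k v -> M.Map k v
mapAdd m1 m2 = M.union m2 m1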

3 Types and Value Storage

Our aim of reasoning about C programs requires a representation of the storage
of C values in memory. Similarly to Tuch et al [24], we define a mem-type type

class to represent these types. This section describes the abstract operations of
this class and its axioms. The first such operations are serialising and restoring
a value into and from bytes:
to-bytes :: ′t::mem-type ⇒ byte list
from-bytes :: byte list ⇒ ′t::mem-type

from-bytes (to-bytes v) = v
For a particular type, all values occupy the same, non-zero number of bytes in
memory. We will refer to the number of these bytes as the size. The length of a
type’s serialisation is equal to its size. The term TYPE(′t) of type ′t itself makes
an Isabelle type available as a term.

size-of :: ′t::mem-type itself ⇒ nat
length (to-bytes v) = size-of TYPE(′t)
0 < size-of TYPE(′t)
For treating types as first-class values, we require each to map to a unique tag:
type-tag :: ′t::mem-type itself ⇒ type-tag
In order to respect the alignment requirements of C types, mem-type instances
carry alignment information. Types may only be aligned to sizes which are di-
visors of both the physical and virtual address space sizes:
align-of :: ′t::mem-type itself ⇒ nat
align-of TYPE(′a) dvd memory-size ∧ align-of TYPE(′a) dvd addr-space-size
The model we present in this paper allows representation of all packed C types,
i.e. atomic types such as int, array, and structs without padding. Tuch’s work on
structured C types [23] demonstrates how to extend this model to allow padding.
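For illustration, a rough Haskell analogue of such a type class could look as follows. This is only a sketch under invented names (MemType, toBytes, and so on); the Isabelle axioms are recorded as comments, and the Word8 instance is a hypothetical example of a packed type.

import Data.Word (Word8)

type Byte = Word8

-- Sketch of the mem-type class; instances are expected to satisfy:
--   fromBytes (toBytes v) == v
--   length (toBytes v) == sizeOf (a proxy for t), which is positive
--   alignOf divides the physical and virtual address space sizes
class MemType t where
  toBytes   :: t -> [Byte]
  fromBytes :: [Byte] -> t
  sizeOf    :: proxy t -> Int
  alignOf   :: proxy t -> Int

-- A one-byte packed type as a minimal example instance.
instance MemType Word8 where
  toBytes b     = [b]
  fromBytes [b] = b
  fromBytes _   = error "size mismatch"
  sizeOf _      = 1
  alignOf _     = 1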

4 Virtual Memory

This section defines addressing and pointer conventions and describes our ab-
stract interface to page table encodings.

4.1 Pointers and Addressing

Virtual memory is an abstraction layer on top of the physical memory in a machine. Each executing process gets its own view of physical memory, wherein
each virtual address may be mapped to a physical address. We will henceforth
refer to the function translating virtual addresses to physical ones as the virtual
map and the application of the virtual map to a virtual address as a lookup.
The virtual map is partial and many-to-one — updates at one virtual address
may affect values appearing at another. As in our previous work [16] memory
is a partial function. Unlike our previous work [16], the work presented here is
a realistic representation of physical memory and maps physical addresses to
bytes. The virtual map is encoded in memory in a structure called a page table.
Programs usually only have access to the virtual address layer, but devices may
access physical memory directly. We define addresses as:
datatype ( a, p) addr-t = Addr of a
where a is the underlying address size (e.g. 32 word for 32-bit) and p is a tag:
one of physical or virtual. For particular architectures, we instantiate addr-t into
specific virtual and physical addresses. For the ARMv6 both virtual and physical
addresses are 32-bit words, yielding the instantiations:
vaddr = (32 word, virtual) addr-t
paddr = (32 word, physical) addr-t
ARMv6 is capable of natively addressing 8-, 16- and 32-bit values in memory (corresponding to char, short and int in C). We have shown that these are instances of mem-type. We use addr-val (Addr a) = a to extract the address.
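The physical/virtual tag can be pictured as a phantom type parameter, so that the type checker rules out confusing the two address spaces. A minimal Haskell sketch, with names invented for this illustration:

import Data.Word (Word32)

data Physical   -- empty types, used only as type-level tags
data Virtual

-- Addr wraps a 32-bit value; the phantom parameter records whether
-- it is a physical or a virtual address.
newtype Addr space = Addr { addrVal :: Word32 }

type PAddr = Addr Physical
type VAddr = Addr Virtual

-- A function of type PAddr -> a can no longer be applied to a VAddr.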

4.2 Page Table

We now introduce our abstract interface to page table encodings. There are many
such possible encodings: one-level tables, fixed multi-level tables, variable-depth
guarded page tables or even just hash tables. Usually, mappings are encoded in
blocks of addresses (pages, superpages, etc.), which are hardware-defined. The
page table also encodes extra information such as permissions and hardware-
defined flags. We generalise our previous abstract page table interface [16] slightly
to accommodate multiple page sizes and briefly summarise the other definitions.
ptable-lift :: (paddr ⇀ val) ⇒ base ⇒ vaddr ⇀ paddr
ptable-trace :: (paddr ⇀ val) ⇒ base ⇒ vaddr ⇒ paddr set
get-page :: (paddr ⇀ val) ⇒ base ⇒ vaddr ⇒ a
We use ptable-lift to extract a virtual map from memory, ptable-trace to find all the physical addresses used in looking up a virtual address to a physical one, and get-page to find which page a virtual address is on, including any machine-specific flags (such as permissions) that might be attached to it. The types paddr and vaddr represent physical and virtual pointers, while base says where we can find the page table in physical memory (e.g. the root of a two-level page table). We leave a for a generic representation of what a page is.
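One way to read this interface is as a record of three functions over a byte heap. The following Haskell sketch is an illustration only; it fixes 32-bit addresses for concreteness and leaves the page representation abstract, like the a above:

import qualified Data.Map as M
import Data.Word (Word8, Word32)

type Heap = M.Map Word32 Word8   -- physical memory: paddr ⇀ byte

-- The three operations of the abstract page table interface,
-- parameterised over the page representation.
data PageTable page = PageTable
  { ptableLift  :: Heap -> Word32 -> Word32 -> Maybe Word32
      -- heap, table base, vaddr: the virtual map as a partial function
  , ptableTrace :: Heap -> Word32 -> Word32 -> [Word32]
      -- the physical addresses read while translating the vaddr
  , getPage     :: Heap -> Word32 -> Word32 -> Maybe page
      -- which page (plus machine-specific flags) a vaddr lies on
  }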
In order to reason about memory access in the presence of a page table, we
require page table functions to conform to the rules in Fig. 1. Firstly, changing
memory in areas not related to a page table lookup must not affect the lookup:
if evaluation of ptable-lift and ptable-trace succeeds on a smaller heap, it will also
succeed on a larger one. This corresponds to the safety monotonicity property of

ptable-lift h₀ r vp = ⌊p⌋    h₀ ⊥ h₁
────────────────────────────────────────
ptable-lift (h₀ ++ h₁) r vp = ⌊p⌋

ptable-lift h r vp = ⌊p⌋    h ⊥ h′
────────────────────────────────────────
ptable-trace (h ++ h′) r vp = ptable-trace h r vp

p ∉ ptable-trace h r vp    ptable-lift h r vp = ⌊p⌋
────────────────────────────────────────
ptable-trace (h(p ↦ v)) r vp = ptable-trace h r vp

p ∉ ptable-trace h r vp    ptable-lift h r vp = ⌊p⌋
────────────────────────────────────────
ptable-lift (h(p ↦ v)) r vp = ⌊p⌋

ptable-lift (h₀ ++ h₁) r vp = ⌊p⌋    h₀ ⊥ h₁
────────────────────────────────────────
ptable-lift h₀ r vp = ⌊p⌋ ∨ ptable-lift h₀ r vp = None

Fig. 1. Abstract page table interface



separation logic [5]. Furthermore, a successful lookup must be unaffected by any heap updates outside that lookup's trace. Finally, corresponding to the frame
monotonicity property [5] of separation logic, removal of information from the
heap must either not affect ptable-lift or cause it to fail. Heap reduction must
not return a different successful result.

5 A Formal Model of ARMv6 Page Tables


In this section, we instantiate the abstract interface described above to ARMv6
2-level page tables. We support multiple page sizes, but we omit handling of
permissions — in our seL4 target setup, the ARM supervisor mode ignores per-
missions. Adding them would be simple.
Following ARM nomenclature [3], the first level table is called the page directory and the second level the page table. Individual entries at these levels are called PDEs and PTEs respectively, 32 bits wide in both cases. There is one page directory with potentially many page tables. The base of the entire structure is the physical address of the page directory.

Fig. 2. ARMv6 page table lookup for SmallPage

Our model uses the common ARMv6 page table format where subpages are disabled.
In this mode, the hardware supports mappings in four granularities: small (4Kb)
and large (64Kb) pages, as well as sections (1Mb) and supersections (16Mb):
datatype page-size = SmallPage | LargePage | Section | SuperSection
Apart from invalid/reserved, a PDE either encodes the physical base address
of a section, supersection or a second-level table. Within a second-level table, a
valid PTE encodes the physical base address of a large or small page:
datatype pde = InvalidPDE | ReservedPDE | PageTablePDE of paddr
| SectionPDE of paddr | SuperSectionPDE of paddr
datatype pte = InvalidPTE | LargePagePTE of paddr | SmallPagePTE of paddr
The idea of looking up a virtual address is shown in Fig. 2: figure out the base address of the appropriate structure and its size, then add the virtual address modulo that size (i.e. its offset within the structure). The get-frame function calculates the base and size:
get-frame :: heap ⇒ paddr ⇒ vaddr ⇀ (paddr × page-size)
get-frame h root vp ≡
let vp-val = addr-val vp; pd-idx-offset = vp-val >> 20 << 2
in case decode-pde h (root + pd-idx-offset) of None ⇒ None
| ⌊PageTablePDE pt-base⌋ ⇒ get-frame-2nd h pt-base vp
| ⌊SectionPDE base⌋ ⇒ ⌊(base, Section)⌋
| ⌊SuperSectionPDE base⌋ ⇒ ⌊(base, SuperSection)⌋ | ⌊-⌋ ⇒ None

The function works by looking up a virtual address just like the ARM hardware.
First, we look at the top 12 bits of the address as an index into the page directory.
We then shift the index left by 2, as each PDE is 4 bytes in size, and add it to the base address of the page directory (root). We decode the PDE at this address to
decide what to do next: fail on invalid/reserved, pass through the base address
for sections/supersections, and go look in the second-level table in the case of a
PTE pointer. We omit the definitions of decode-pde and decode-pte; they work
as described in the ARMv6 manual [3]. Second-level lookup is defined similarly:

get-frame-2nd :: heap ⇒ paddr ⇒ vaddr ⇀ (paddr × page-size)
get-frame-2nd h pt-base vp ≡
let vp-val = addr-val vp; pt-idx-offset = (vp-val >> 12) && 0xFF << 2
in case decode-pte h (pt-base + pt-idx-offset) of None ⇒ None
| ⌊InvalidPTE⌋ ⇒ None | ⌊LargePagePTE base⌋ ⇒ ⌊(base, LargePage)⌋
| ⌊SmallPagePTE base⌋ ⇒ ⌊(base, SmallPage)⌋

Starting at the physical address of the second-level table, we use the next 8 bits
of the virtual address (bits 12-19) as an index, decode the PTE there, fail on
invalid or return the base address of the frame along with its size.
Using get-frame, we can then implement the main lookup function ptable-lift
by masking out the appropriate bits from the virtual address and adding them
to the physical address of the frame:

vaddr-offset p w ≡ w && mask (page-size-bits p)

ptable-lift h pt-root vp ≡
let vp-val = addr-val vp
in option-map (λ(base, pg-size). base + vaddr-offset pg-size vp-val)
(get-frame h pt-root vp)

where page-size-bits is log2 of the page size.
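To make the bit arithmetic of the two-level walk concrete, the following Haskell sketch (an illustration, not part of the formalisation) computes the three components of a virtual address for a 4KB small-page mapping, with a worked example in the comments:

import Data.Bits (shiftR, shiftL, (.&.))
import Data.Word (Word32)

-- Index arithmetic for an ARMv6 two-level walk with 4KB small pages.
pdIdxOffset, ptIdxOffset, pageOffset :: Word32 -> Word32
pdIdxOffset vp = (vp `shiftR` 20) `shiftL` 2            -- top 12 bits, 4 bytes per PDE
ptIdxOffset vp = ((vp `shiftR` 12) .&. 0xFF) `shiftL` 2 -- bits 12-19, 4 bytes per PTE
pageOffset  vp = vp .&. 0xFFF                           -- bits 0-11: offset in the page

-- For vp = 0x12345678:
--   pdIdxOffset vp = 0x123 * 4, ptIdxOffset vp = 0x45 * 4, pageOffset vp = 0x678,
-- and the translated address is the frame base plus pageOffset vp.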


Similarly, we can get the page a virtual address is on by masking out the
offset bits. Also, since ARM allows multiple page sizes, the concept of a page
must involve its size, instantiating the page type a to (vaddr × page-size) option:

addr-base sz w ≡ w && (0xFFFFFFFF << page-size-bits sz )


get-page h root vp ≡
let vp-val = addr-val vp
in option-map (λ(base, pg-size). (Addr (addr-base pg-size vp-val ), pg-size))
(get-frame h root vp)

We define a sequence of n addresses starting at p as:

addr-seq p 0 = []
addr-seq p (Suc n) = p·addr-seq (p + 1) n

The final function needed to instantiate the abstract page table model from
Sect. 4 is ptable-trace. The trace contains the bytes in any page directory or
table entry which has successfully contributed to looking up the virtual address:

ptable-trace h root vp ≡
let vp-val = addr-val vp; pd-idx-offset = vaddr-pd-index vp-val << 2;
pt-idx-offset = vaddr-pt-index vp-val << 2;
pd-touched = set (addr-seq (root + pd-idx-offset) 4);
pt-touched = λpt-base. set (addr-seq (pt-base + pt-idx-offset) 4)
in case decode-pde h (root + pd-idx-offset) of None ⇒ ∅
| ⌊PageTablePDE pt-base⌋ ⇒ pd-touched ∪ pt-touched pt-base
| ⌊-⌋ ⇒ pd-touched
We have proved that the ptable-lift, ptable-trace and get-page functions in this section instantiate the abstract model from Sect. 4, including the axioms of Fig. 1.

6 Typed, Mapped Separation Logic

Based on our abstract page table interface of Sect. 4, we can now construct a separation logic framework for reasoning about pointer programs with types. This framework is independent of the particular page table instantiation.
Separation logic [18] is a tool for conveniently reasoning about memory and aliasing. It views memory as a partial map from addresses to values, allowing
for predicates which precisely state which part of the heap they hold on. At
its core is the concept of separating conjunction: when the assertion P ∧∗ Q
holds on a heap, the heap can be split into two disjoint parts, where P holds
on one part and Q on the other. Predicates which precisely define the domain
of the heap they hold on allow for convenient local reasoning. This leads to the
concept of local actions and the frame rule: for an action f , we can conclude
{P ∧∗ R} f {Q ∧∗ R} from {P } f {Q} for any R. This expresses that the actions
of f are local to the heaps described by P and Q, and therefore cannot affect
any separate heap described by R. We also say that predicates consume parts of
the heap under separating conjunction, because other predicates cannot depend
on the same parts of this heap.
The basic assertion of separation logic is the maps-to arrow, holding on a
heap containing only one address-value pair. From this simple assertion, more
complex ones can be built. For a simple heap (paddr ⇀ byte) it takes the form:
(address ↦ value) h ≡ h address = ⌊value⌋ ∧ dom h = {address}
Under separating conjunction, it consumes address in the heap. Tuch et al extend this basic concept all the way to reasoning about C code with structures [23].

Fig. 3. The three maps-to assertions

A naive addition of virtual memory to separation logic breaks the concept of separating conjunction, the frame rule, as well as the assumption of Tuch's work of values being stored contiguously in the heap. In previous work [16], we addressed the first two in a simplified setting. In this section, we solve them in a realistic setting and extend them to reasoning about typed pointers. We introduce new maps-to arrows, as well as a new, more complex state that we use instead of a simple heap.
Our eventual goal is to be able to write the new arrows of Fig. 3 with physical
or virtual addresses on the left and complex, typed C values on the right. The
new arrows in Fig. 3 describe (from left to right): mappings from physical address
to value, from virtual to physical address, and from virtual address to value. The
next section will introduce arrows that allow raw, single bytes and explicit type
information on the right. The section after that will lift this information to allow
structured C types on the right.

6.1 At the Bytes Level

Following Tuch et al and our own previous work, to support both types and
virtual memory, we annotate the heap with extra information, extending the
state for our assertions in a first step to:
(paddr ⇀ type-data × byte) × ptable-base
where ptable-base is any extra information needed by the virtual memory sub-
system, such as the page table root (paddr in the case of ARMv6); type-data
annotates which higher-level type a byte is part of. On this level it is just passed
through; we will explain its purpose in Sect. 6.2.
For our maps-to assertions to be useful in separation logic, we must define which parts of the heap they consume (what their domain is). Here we run into a problem, illustrated in Fig. 4: two distinct virtual addresses map to two values via distinct physical addresses, but using the same page table entry for the lookup.

Fig. 4. Two virtual addresses resolving through the same page table entry

Writing to one virtual pointer does not affect the value at the other, so in this sense the two maps-to predicates are separate. However, a single page table entry is involved in the lookup of both virtual pointers. Under separating conjunction we can allow the entry to be consumed by either mapping or neither mapping, but not both mappings. If one consumes it, the other lacks information for a successful lookup. If neither consumes it, we lose locality: we could state the entry is separate from both mappings even though updating the entry can affect both virtual addresses!
The solution to this problem is to divide the page table entry up into two parts
and share the slices between the maps-to predicates involved in the separating
conjunction. This idea is similar to that of the fractional permission model of
Bornat [4], with three important differences. Firstly, we do not wish to perform
any explicit accounting of fractions in the most common case of the page table
not being modified. Secondly, the number of virtual addresses an entry can map
varies with the type of page table and the size of the mapped page. Thirdly,
we want to utilise rather than recreate the proofs about partial maps and map
disjunction in Isabelle/HOL. These issues are addressed by using a constant,
large-enough number of slices for entries in the heap and placing them in the
domain. The maximum useful number of slices is one entry mapping all virtual
addresses. Thus our final state for assertions is:
fheap-state = (paddr × vaddr ⇀ type-data × byte) × ptable-base
We refer to the first component of this state as the typed, fragmented heap tfh.
With this new state, our physical memory maps-to predicate becomes:
p :→p v ≡ λ(h, r). (∀vp. h (p, vp) = ⌊v⌋) ∧ dom h = {p} × U
Like the simple maps-to predicate shown earlier, the heap at address p evaluates
to value v. In the new state, it does so for all vp slices. The domain covers all
slices of p, i.e. the universal set U. This arrow works for the physical-to-value
level. To define the virtual-to-physical arrow, we use our abstract page table
interface. Unfortunately, this page table model knows nothing about slices and
type annotations. So, to perform a lookup on vp, we derive a view of the heap
tfh containing only slices associated with vp and discard type annotations:
h-view tfh vp ≡ option-map snd ◦ (λp. tfh (p, vp))
We can now define the virtual-to-physical arrow for mapping vp to p. It is just
a ptable-lift on a heap made of slices associated with vp. The assertion consumes
the vp slice of each byte used in its lookup, i.e. in ptable-trace:
vp :→v p ≡ λ(h, r). let heap = h-view h vp; vmap = ptable-lift heap r
in vmap vp = ⌊p⌋ ∧ dom h = ptable-trace heap r vp × {vp}
The virtual-to-value mapping is then just the separating conjunction of virtual-
to-physical and physical-to-value.
vp :→ v ≡ λs. ∃ p. (vp :→v p ∧∗ p :→p v ) s
P ∧∗ Q ≡ λ(h, r). ∃h₀ h₁. h₀ ⊥ h₁ ∧ h = h₀ ++ h₁ ∧ P (h₀, r) ∧ Q (h₁, r)
For any of these levels, we can define the usual arrow variations [18]:
(p :→ –) s ≡ ∃v. (p :→ v) s        (p :↪ v) s ≡ (p :→ v ∧∗ sep-true) s
(p :↪ –) s ≡ ∃v. (p :→ v ∧∗ sep-true) s        sep-true ≡ λs. True
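Read operationally, P ∧∗ Q searches for a disjoint split of the heap. The following naive Haskell sketch over finite heaps is an illustration only; it ignores the ptable-base component r and enumerates all splits, so it is exponential and not an efficient implementation:

import qualified Data.Map as M

type Assert k v = M.Map k v -> Bool

-- P ∧* Q: the heap splits into disjoint parts h0 and h1 with P h0 and Q h1.
sepConj :: Ord k => Assert k v -> Assert k v -> Assert k v
sepConj p q h = any (\(h0, h1) -> p h0 && q h1) (splits h)
  where
    -- all two-way partitions of the heap's bindings; disjointness holds
    -- by construction, since each binding goes to exactly one side
    splits = map (\(xs, ys) -> (M.fromList xs, M.fromList ys))
           . partitions . M.toList
    partitions []     = [([], [])]
    partitions (x:xs) = concat [ [(x:as, bs), (as, x:bs)]
                               | (as, bs) <- partitions xs ]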
One property of this framework is that it is mostly independent of the value
space, the right-hand side of the maps-to arrows. Only in the interface to the page
table have we touched it at all, and then only to discard additional type informa-
tion. The basic assertions we get from this section are of the form vp :→ (b, t )
where b is the byte at virtual address vp, and t is the associated type annotation.

6.2 At the Types Level

This section lifts the arrows for bytes and type information we have just defined to higher-level, typed assertions for any mem-type values. We define the concept
of pointers to typed values by wrapping our existing concept of addresses and adding a phantom type, like Tuch et al [24]:
datatype ( a, p, t) ptr-t = Ptr of ( a, p) addr-t

Instantiated to ARMv6:
t pptr = (32 word, physical, t) ptr-t
t vptr = (32 word, virtual, t) ptr-t

Like Tuch et al we mark locations belonging to mem-type values in the heap with a type tag. The addition of virtual memory creates a new complication:
if a value crosses a page boundary in virtual memory, it is not guaranteed to
be contiguous at the physical level, nor even entirely loaded into memory. This
means we must not only tag each byte in the heap, but also note which offset
it is within the larger structure it belongs to. Our type information associated
with each byte is:
type-data = type-tag × nat

We implement maps-to predicates at the typed level as a sequence of byte-level maps-to predicates, folded over separating conjunction in the usual way. For instance, we write vps [:→] vs for a sequence vps of virtual pointers mapping to a sequence of values vs. Note that these values are each of the form (b, t).
A value of type t ::mem-type seen in memory at either the virtual or physical
level is a sequence of bytes (to-bytes) where each byte is tagged by the type-tag
of t and its offset in the list:
value-seq val ≡
zip (map (λseq. (type-tag TYPE( t), seq)) [0..<size-of TYPE( t)]) (to-bytes val )
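For illustration, the same tagging zip in Haskell, with a string standing in for the abstract type-tag; the names and the example evaluation are invented for this sketch:

import Data.Word (Word8)

type TypeTag  = String          -- stand-in for the abstract tag type
type TypeData = (TypeTag, Int)  -- tag plus offset within the value

-- Tag each serialised byte with the value's type tag and its offset,
-- mirroring the value-seq definition above.
valueSeq :: TypeTag -> [Word8] -> [(TypeData, Word8)]
valueSeq tag bytes = zip [ (tag, off) | off <- [0 .. length bytes - 1] ] bytes

-- valueSeq "int" [0x34, 0x12] == [(("int",0), 0x34), (("int",1), 0x12)]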

We can now define maps-to predicates on typed pointers. Like Tuch et al [24]
we employ an arbitrary guard on the pointer itself to enforce constraints such as
alignment. We have not found it necessary yet to let the guard depend on the
state, but this could be added easily. Compared to Tuch et al, lifting sequences
of bytes to structured values is much simpler, because we already have byte-level
assertions available. Between virtual and physical levels only the arrows differ.
g p →p v ≡ ptr-seq p TYPE(t) [:→p] value-seq v ⌊∧⌋ (λs. g p)
g vp →v p ≡ ptr-seq vp TYPE(t) [:→v] ptr-seq p TYPE(t) ⌊∧⌋ (λs. g vp)
g vp → v ≡ ptr-seq vp TYPE(t) [:→] value-seq v ⌊∧⌋ (λs. g vp)

where ptr-seq p T ≡ addr-seq (ptr-val p) (size-of T) and addr-seq is defined in Sect. 4. Using these predicates, we can now make separation logic assertions
describing the presence of typed values on the heap, visible as contiguous in
either physical or virtual memory. In the common case, i.e. when not modifying
the page table, our model keeps the virtual memory mechanism under the hood.
We can just state, for instance, p → (| x = 10; y = 7 |) where the right hand
side is an Isabelle record of class mem-type corresponding to a C struct and the
left hand side is a virtual address.

7 Connecting with C

In this section, we will connect the framework to C and define loading and storing
of typed values in the program state. In the previous section, we have enriched
the usual C heap with additional information: slices for specifying the domain
of predicates under separating conjunction and type annotation information.
We therefore need to be careful to not introduce unwanted dependencies on the
additional information in the state and we need to make sure that C updates
operate consistently on the extended state. We formalise load and store for vir-
tually addressed access. Direct physical access would be similar, but simpler.
In C, loading and storing are total functions. Loading from a wrongly typed
or unmapped address or storing to it will produce garbage. For our intended
application (seL4), we do not need to model page faults directly, but we anno-
tate the C program with guards that make sure no page faults will occur. These
annotations are added automatically during the translation into Isabelle/HOL
and will produce proof obligations. Should a page-fault model be required for
different applications, it is easy to add: an access to an unmapped page, instead
of a guard, simply produces a branch to the page fault handler.
For a generic map h from pointers p to values, loading a mem-type value at p
is merely loading its size’s worth of sequential bytes starting at p (load-list-basic),
making sure h contains no gaps in that range (deoption-list) and passing it to
from-bytes from the type class interface.
load-list-basic h 0 p = []
load-list-basic h (Suc n) p = h p·load-list-basic h n (p + 1)
deoption-list xs ≡ if None ∈ set xs then None else ⌊map the xs⌋
load-list h n p ≡ deoption-list (load-list-basic h n p)
load-value h p ≡ option-map from-bytes (load-list h (size-of TYPE( t)) p)
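The load pipeline is essentially a traversal under the option monad: read size-of consecutive bytes, fail if any is unmapped, then deserialise. A Haskell sketch for illustration; the two-byte little-endian decoder at the end is merely an example stand-in for from-bytes:

import qualified Data.Map as M
import Data.Word (Word8, Word32)

type Heap = M.Map Word32 Word8

-- Read n consecutive bytes starting at p; Nothing if any byte is unmapped.
-- (Assumes n >= 1, as guaranteed by 0 < size-of.)
loadList :: Heap -> Int -> Word32 -> Maybe [Word8]
loadList h n p = traverse (\a -> M.lookup a h) (take n [p ..])

-- Loading a two-byte little-endian value, as an example deserialiser.
loadValue :: Heap -> Word32 -> Maybe Word32
loadValue h p = fmap fromBytesLE (loadList h 2 p)
  where fromBytesLE = foldr (\b acc -> acc * 256 + fromIntegral b) 0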
A pointer access in C is then just an application of load-value to the address-
space view of memory, ignoring any read failures. We drop the additional type
information that is only used in assertions, not in C, resulting in the heap type
load-value expects. The as-view function is similar to h-view, but uses ptable-lift
to arrive at a map from virtual addresses to values.
load-value-c s vp ≡ the (load-value (as-view s) (ptr-val vp))
As mentioned above, this function is total. The guard generated for each such access is c-guard vp ↪ –, ensuring that load-value-c will produce a valid result. The predicate c-guard p ensures that p is not Null and is correctly aligned for its type size.
Heap updates are similar. For a single physical address, we update all slices at that address and we leave the type annotation untouched. We can ignore entries with None because, again, the generated guard c-guard vp ↪ – will ensure this case does not occur. We then lift the single-byte update first to the virtual layer to provide address translation via vmap-view, and then, like in Tuch et al, to byte sequences to accommodate structured types.

void mapUserFrame(pde_t *pd, paddr_t paddr, vptr_t vptr) {


pde_t *pdSlot; pte_t *ptSlot, *pt, pte; unsigned int ptIndex;
pdSlot = lookupPDSlot(pd, vptr);
ptIndex = ((unsigned int)vptr >> ARMSmallPageBits) & MASK(PT_BITS);
pt = ptrFromPAddr(pde_coarse_ptr_get_address(pdSlot));
ptSlot = pt + ptIndex;
pte = pte_small_new(paddr,1,0,0,0,3,1,1,1,0);
*ptSlot = pte;
}

Fig. 5. Page table code extracted from seL4

tfheap-update tfh p v ≡
λppv. if fst ppv = p then option-map (λ(td, v′). (td, v)) (tfh ppv)
else tfh ppv
state-update-v s vp v ≡
case vmap-view s vp of None ⇒ s | ⌊p⌋ ⇒ (tfheap-update (fst s) p v, snd s)
state-update-v-list s [] = s
state-update-v-list s ((vp, v )·us) = state-update-v-list (state-update-v s vp v ) us
c-state-update vp v s ≡ state-update-v-list s (zip (ptr-seq vp TYPE( a1)) (to-bytes v ))
For interfacing to C code, we have adapted the C parser of Tuch et al [24]. It
translates a significant subset of the C99 programming language into SIMPL [19],
a generic, imperative language framework in Isabelle/HOL.
As in the framework by Tuch et al we cannot prove the frame rule generically,
but we can prove it automatically for each individual program. This automatic
proof ultimately reduces everything to valid memory accesses and updates, based
on the following rule:
(c-guard vp → – ∧∗ P) s
────────────────────────────────────────
(c-guard vp → v ∧∗ P) (c-state-update vp v s)
With P = sep-true, this rule becomes the state update rule by Tuch et al, corre-
sponding to the assignment axiom in standard separation logic. The correspond-
ing rule for memory access holds as well, of course:
(g vp → v) s
────────────────────────────────────────
load-value-c s vp = v
Fig. 5 shows an excerpt of typical page table manipulation code that this frame-
work can handle. The last line of this code, for instance, would be translated
into the following SIMPL statement with guard:
Guard C-Guard {|c-guard ´ptSlot|}
(´globals :== heap-upd (c-state-update ´ptSlot ´pte))
The heap-upd function updates the C heap (our extended state) which is merely
a global variable in the semantics of the C program. The guard statement Guard
throws the guard error C-Guard if the condition {|c-guard ´ptSlot |} is false, and
otherwise executes the statement. In previous work [16], we have conducted a
detailed case study demonstrating how page table manipulations can be verified
in this framework for a simple, one-level page table. Reasoning on the C and
ARM level has precisely the same structure, it just involves more detail.

8 Translation Caching

Page table lookups are expensive; they potentially involve multiple memory
reads. To decrease this cost, these lookups are cached in most architectures
in a translation lookaside buffer (TLB). Abstractly, the TLB can be seen as a
finite, small set of virtual-to-physical mappings. They may include lookups for
code instructions as well as data. It is architecture-dependent whether these are
handled separately from each other or not, how large the TLBs are, and when a
mapping is removed from the TLB and replaced by another. Most architectures
provide assembler instructions for explicitly removing all or specific mappings
from the TLB, which is called flushing.
Although the page table should ultimately define what a mapping is, the hard-
ware will always first consult the TLB and ignore the contents of the page table if
a TLB entry is found. When we change the page table and the TLB contains the
mapping being changed, we may introduce an inconsistency. This inconsistency
can be resolved by flushing the TLB such that the new page table contents will
be loaded for future lookups. However, indiscriminate TLB flushes are expen-
sive, because they will incur additional memory reads. Kernel programmers like
to optimise by deferring TLB flushes as far as possible and by making them as
specific as possible.
In our model, we can add the TLB by reducing it to its safety-relevant content:
whether the lookup for any specific virtual address may be inconsistent or not.
What makes a TLB entry inconsistent is a change to the page table. We can
turn this view around and instead keep track of inconsistent page table entries
— those that have been written to since the last flush. We can reduce machinery
by not caring whether a memory location currently is a page table entry or not; we just keep track of all locations that have been changed since the last TLB
flush. If any memory read or write involves a page table entry whose location is
in this set, the TLB might be inconsistent for this lookup. We can now generate
guards that test for this case and require us to prove its absence.
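A minimal sketch of this bookkeeping, in Haskell and under invented names: a set of dirty locations that writes extend, a flush clears, and lookups are guarded against.

import qualified Data.Set as S
import Data.Word (Word32)

type Dirty = S.Set Word32   -- locations written since the last TLB flush

-- Record a write; we do not care whether the location currently
-- is a page table entry.
write :: Word32 -> Dirty -> Dirty
write = S.insert

-- A full TLB flush forgets all potential inconsistencies.
flush :: Dirty -> Dirty
flush _ = S.empty

-- Guard for a lookup: the translation's trace must avoid all dirty
-- locations, otherwise the TLB might return a stale mapping.
consistentLookup :: [Word32] -> Dirty -> Bool
consistentLookup trace dirty = all (`S.notMember` dirty) trace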
This TLB model integrates nicely with separating conjunction, because the
set mentioned above can be implemented as an additional boolean next to the
type information on the right-hand side of the maps-to arrow. Apart from the
type, none of the generic framework definitions would need to change.

9 Related Work
Our work touches three main areas: separation logic, virtual memory, and C
verification. For an overview on OS verification in general, see Klein [14].
Separation logic was originally conceived by O’Hearn and Reynolds et al.
[12,18] and has been formalised in mechanised theorem proving systems be-
fore [25,1]. We enhance these models with the ability to reason about properties
on virtual memory, adding new basic predicates, but preserving the feel and
reasoning principles of separation logic.

Virtual memory formalisations have appeared in the context of OS kernel verification before [15,7,11]. Reasoning about programs running under virtual memory, however, especially the operating systems which control it, remains
memory, however, especially the operating systems which control it, remains
mostly unexplored. Among the exceptions is the development of the Nova micro-
hypervisor [20,21]. Like our work, the Nova developers aim to use a single se-
mantics to describe all forms of memory access which simplifies significantly in
the well-behaved case. They focus on reasoning about “plain memory” in which
no virtual aliasing occurs and split it into read-only and read-write regions, to
permit reading the page table while in plain memory. They do not use separation
logic. Our work is more abstract. We do not explicitly define “plain memory”.
Rather the concept emerges from the requirements and state. Tews et al also
include memory-mapped devices. The necessary alignment restrictions would
integrate seamlessly into our framework via the guard mechanism. Alkassar et
al. [2] have proved the correctness of a kernel page fault handler, albeit not at
the separation logic level. They use a single level page table and prove that the
page fault handler establishes the illusion to the user of a plain memory abstrac-
tion, swapping in pages from disk as required. We instantiate our model to an
extensive, realistic model of ARMv6 2-level page tables. We are not aware of
other realistic formalisations of ARM page tables; Fox [10] formalises the ARM
instruction set, but omits memory translation, while Tews et al [21] formalise
memory translation for IA32.
In the C verification space, we build directly on the work by Tuch et al [24,22,23], who employ separation logic in a precise, foundational model for C memory, with Isabelle/HOL infrastructure to reason about low-level, potentially type-unsafe C programs nicely and abstractly. This framework, which in turn builds on Schirmer's SIMPL environment [19], is used in the verification of the seL4 microkernel [8]. We enhance the fidelity of the framework with a
virtual memory layer for ARMv6 while inheriting its nice type-lifting and rea-
soning principles. Other work in C verification includes Key-C [17], VCC [6], and
Caduceus [9]. Key-C treats only a type-safe subset of C. VCC, which also
supports concurrency, uses a memory model [6] that axiomatises a weaker ver-
sion of what Tuch proves [23] and what we extend to virtual memory. Caduceus
supports a large subset of C, but does not include virtual memory.

10 Conclusion and Future Work

We have presented an abstract framework for separation logic under virtual memory and have instantiated it to the C programming language as well as to ARMv6 page tables. We have shown in previous work that this framework
to ARMv6 page tables. We have shown in previous work that this framework
supports one-level page tables as well as traditional separation logic reasoning,
including the frame rule. We have shown here that the new instantiation sup-
ports the same basic rules for heap updates that Tuch et al provide for their C
verification framework that is used in the verification of the seL4 microkernel.
Next to applying the framework to seL4 page table code in a verification case
study, future work includes an Isabelle/HOL model for the translation caching
mechanism that is an interesting and correctness-relevant part of most virtual memory architectures. We have sketched how the mechanism could be added
to the presented model without fundamental changes. We are not aware of any
other virtual memory frameworks that include TLB modelling.
The framework presented here makes the foundational verification of OS-level C code practical. It brings a significant source of errors into the realm of formal, machine-checked verification that otherwise formally verified code would ignore and fail on embarrassingly. Only when reasoning about page table modifications directly do the complexities of their encoding become visible. For reasoning on plain memory, no additional verification overhead needs to be paid.

Acknowledgements. We thank Thomas Sewell for commenting on a draft of this paper and Michael Norrish for help with integrating the C parser.

References
1. Affeldt, R., Marti, N.: Separation logic in Coq (2008),
http://savannah.nongnu.org/projects/seplog
2. Alkassar, E., Schirmer, N., Starostin, A.: Formal pervasive verification of a pag-
ing mechanism. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS,
vol. 4963, pp. 109–123. Springer, Heidelberg (2008)
3. ARM Limited. ARM Architecture Reference Manual (June 2000)
4. Bornat, R., Calcagno, C., O’Hearn, P., Parkinson, M.: Permission accounting in
separation logic. In: Proc. 32nd POPL, pp. 259–270. ACM, New York (2005)
5. Calcagno, C., O’Hearn, P.W., Yang, H.: Local action and abstract separation logic.
In: Proc. 22nd LICS, pp. 366–378. IEEE Computer Society, Los Alamitos (2007)
6. Cohen, E., Moskal, M., Schulte, W., Tobies, S.: A precise yet efficient memory
model for C (2008),
http://research.microsoft.com/apps/pubs/default.aspx?id=77174
7. Dalinger, I., Hillebrand, M.A., Paul, W.J.: On the verification of memory man-
agement mechanisms. In: Borrione, D., Paul, W.J. (eds.) CHARME 2005. LNCS,
vol. 3725, pp. 301–316. Springer, Heidelberg (2005)
8. Elphinstone, K., Klein, G., Derrin, P., Roscoe, T., Heiser, G.: Towards a practical,
verified kernel. In: Proc. 11th HOTOS, pp. 117–122 (2007)
9. Filliâtre, J.-C., Marché, C.: Multi-prover verification of C programs. In: Davies, J.,
Schulte, W., Barnett, M. (eds.) ICFEM 2004. LNCS, vol. 3308, pp. 15–29. Springer,
Heidelberg (2004)
10. Fox, A.: Formal specification and verification of ARM6. In: Basin, D., Wolff, B.
(eds.) TPHOLs 2003. LNCS, vol. 2758, pp. 25–40. Springer, Heidelberg (2003)
11. Hillebrand, M.: Address Spaces and Virtual Memory: Specification, Implementa-
tion, and Correctness. PhD thesis, Saarland University, Saarbrücken (2005)
12. Ishtiaq, S.S., O’Hearn, P.W.: BI as an assertion language for mutable data struc-
tures. In: Proc. 28th POPL, pp. 14–26. ACM, New York (2001)
13. Programming languages—C, ISO/IEC 9899:1999 (1999)
14. Klein, G.: Operating system verification—An overview. Sādhanā 34(1), 27–69
(2009)
15. Klein, G., Tuch, H.: Towards verified virtual memory in L4. In: Slind, K. (ed.)
TPHOLs Emerging Trends 2004, Park City, Utah, USA (2004)
16. Kolanski, R., Klein, G.: Mapped separation logic. In: Shankar, N., Woodcock, J.
(eds.) VSTTE 2008. LNCS, vol. 5295, pp. 15–29. Springer, Heidelberg (2008)
17. Mürk, O., Larsson, D., Hähnle, R.: KeY-C: A tool for verification of C programs. In:
Pfenning, F. (ed.) CADE 2007. LNCS, vol. 4603, pp. 385–390. Springer, Heidelberg
(2007)
18. Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In:
Proc. 17th IEEE Symposium on Logic in Computer Science, pp. 55–74 (2002)
19. Schirmer, N.: Verification of Sequential Imperative Programs in Isabelle/HOL. PhD
thesis, Technische Universität München (2006)
20. Tews, H.: Formal methods in the Robin project: Specification and verification of
the Nova microhypervisor. In: C/C++ Verification Workshop, Technical Report
ICIS-R07015, Oxford, UK, July 2007, pp. 59–68. Radboud University Nijmegen
(2007)
21. Tews, H., Weber, T., Völp, M.: Formal memory models for the verification of low-
level operating-system code. JAR 42(2–4), 189–227 (2009)
22. Tuch, H.: Formal Memory Models for Verifying C Systems Code. PhD thesis, School
Comp. Sci. & Engin., University NSW, Sydney 2052, Australia (August 2008)
23. Tuch, H.: Formal verification of C systems code: Structured types, separation logic
and theorem proving. JAR 42(2–4), 125–187 (2009)
24. Tuch, H., Klein, G., Norrish, M.: Types, bytes, and separation logic. In: Hofmann,
M., Felleisen, M. (eds.) POPL 2007, pp. 97–108. ACM, New York (2007)
25. Weber, T.: Towards mechanized program verification with separation logic. In:
Marcinkowski, J., Tarlecki, A. (eds.) CSL 2004. LNCS, vol. 3210, pp. 250–264.
Springer, Heidelberg (2004)
Acyclic Preferences and
Existence of Sequential Nash Equilibria:
A Formal and Constructive Equivalence

Stéphane Le Roux

LIX, École Polytechnique, CEA, CNRS, INRIA

Abstract. In a game from game theory, a Nash equilibrium (NE) is a


combination of one strategy per agent such that no agent can increase its
payoff by unilaterally changing its strategy. Kuhn proved that all (tree-
like) sequential games have NE. Osborne and Rubinstein abstracted over
these games and Kuhn’s result: they proved a sufficient condition on
agents’ preferences for all games to have NE. This paper proves a nec-
essary and sufficient condition, thus accounting for the game-theoretic
frameworks that were left aside. The proof is formalised using Coq, and
contrary to usual game theory it adopts an inductive approach to trees for
definitions and proofs. By rephrasing a few game-theoretic concepts, by
ignoring useless ones, and by characterising the proof-theoretic strength
of Kuhn’s/Osborne and Rubinstein’s development, this paper also clar-
ifies sequential game theory. The introduction sketches these clarifica-
tions, while the rest of the paper details the formalisation.

Keywords: Coq, induction, sequential game theory, abstraction,


effective generalisation.

1 Introduction

In game theory a few classes of games, together with related concepts, can model
a wide range of real-world competitive interactions between agents. Game theory
is applied to economics, biology, computer science, political science, etc.
Sequential games (a.k.a. games in extensive form) are a widely studied class
of games. They may help model games where agents play in turn, such as Chess
in [22]. Given an arbitrary set of outcomes, an (abstract) sequential game is a
finite rooted tree where each internal node is owned by an agent and each leaf
encloses an outcome. The left-hand game below involves agents a and b and
outcomes oc1 , oc2 and oc3 .
a a
b oc3 b oc3
oc1 oc2 oc1 oc2

http://www.lix.polytechnique.fr/Labo/Stephane.Leroux

Anonymous referees, especially one of them, made very constructive comments.


Informally, a play starts at the root. If the root is a leaf, the play ends with the
enclosed outcome; otherwise the root owner chooses in which child, i.e. subgame,
the play continues. However, the concept of play is not needed. A strategy profile
(profile for short) is a game where each internal node has chosen a child. Choices
of a profile induce a unique path from the root to a leaf, which induces a unique
outcome. The right-hand strategy profile above, where choices are represented
by double lines, induces outcome oc3 . The left-hand profile below induces oc1 .
An agent can convert a profile into another one by changing its own nodes’
choices. For instance agent a can convert the right-hand profile above into the
left-hand one below; and below, agent b can convert the left-hand one into the
right-hand one. Note that for each agent, convertibility is an equivalence relation.
a a
b oc3 b oc3
oc1 oc2 oc1 oc2
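For readers less used to this style, games and profiles can be sketched as ordinary inductive datatypes. The following Haskell rendering is an illustration only (names are invented, and choices are represented by child indices); it mirrors the notions just described:

-- A sequential game: a leaf encloses an outcome; an internal node is
-- owned by an agent and offers a non-empty list of subgames.
data Game agent outcome
  = GLeaf outcome
  | GNode agent [Game agent outcome]

-- A strategy profile: the same tree, but every internal node has
-- chosen one child, here recorded as the index of the chosen subgame.
data Profile agent outcome
  = PLeaf outcome
  | PNode agent Int [Profile agent outcome]

-- The outcome induced by a profile: follow the choices down to a leaf.
induced :: Profile agent outcome -> outcome
induced (PLeaf o)        = o
induced (PNode _ i subs) = induced (subs !! i)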
To each agent, an arbitrary binary relation over outcomes is given. It is called
the agent’s preference and it induces a relation over profiles via their induced
outcomes. Generally speaking, a Nash equilibrium (NE for short) is a situation,
i.e. a profile in the present case, that makes every agent happy, where an agent
is happy if it cannot convert the situation into another situation that it prefers.
This concept is defined in [13] and [14], and it captures the notion of NE in
different types of games.
Traditional game theory involves real-valued payoff functions instead of ab-
stract outcomes, i.e. functions mapping agents to real numbers. Agents implicitly
prefer greater payoffs. The profiles below involve payoff functions where the first
figure relates to agent a. The first profile is not a NE since agent a is not happy:
if it played according to the profile, it would get 2, whereas by changing its choice
from right to left it converts the profile into the right-hand one yielding payoff
3. The second profile below is a NE: by changing its choice agent a gets payoff
1 instead of 2 (so it is happy with the current profile), and b has no influence on
the induced outcome so it is happy too. The third profile below is also a NE.
a a a
b 2, 2 b 2, 2 b 2, 2
1, 0 3, 1 1, 0 3, 1 1, 0 3, 1
Subgame perfect equilibria [18] (SPE for short) are NE each of whose subgame
is also an SPE. The second profile above is not an SPE because the subgame
whose root is owned by agent b is not a NE. The third profile above is an SPE.
Kuhn [9] proved that all sequential games involving real-valued payoff func-
tions have NE. His proof uses a recursive procedure, called backward induction in
game theory, to build from each game an SPE (also NE). The backward induc-
tion on the left-hand game below starts by letting agent b (more generally agents
at nodes closer to the leaves) maximise its payoff, in the middle picture. Then
agent a (more generally agents at nodes closer to the root) maximises its payoff
according to what has been chosen by b (more generally in all the subtrees).

a a a
b 2, 2 b 2, 2 b 2, 2
1, 0 3, 1 1, 0 3, 1 1, 0 3, 1
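Backward induction is then a short recursion over this inductive structure: solve all subgames first, then let the node's owner choose a maximal child. The Haskell sketch below reuses the Game and Profile types sketched earlier; representing an agent's preference by a single score builds in a total preorder, which is precisely the assumption at stake in this paper.

import Data.List (maximumBy)
import Data.Ord (comparing)

-- Backward induction: pref gives each agent a total score over outcomes
-- (a simplification standing in for a preference relation). Internal
-- nodes are assumed to have at least one child.
backwardInduction :: Ord score
                  => (agent -> outcome -> score)
                  -> Game agent outcome -> Profile agent outcome
backwardInduction _    (GLeaf o)       = PLeaf o
backwardInduction pref (GNode ag subs) = PNode ag best solved
  where
    solved = map (backwardInduction pref) subs
    -- the owner picks a child whose induced outcome scores highest
    best = fst (maximumBy (comparing snd)
                 (zip [0 ..] (map (pref ag . induced) solved)))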

A special case of Kuhn's theorem was formalised by Vestergaard [21] using Coq [2]: in this case, games are binary, i.e. each internal node has two choices, and payoffs are natural numbers. Instead of the usual game-theoretic approach
and payoffs are natural numbers. Instead of the usual game-theoretic approach
to trees (seen as connected, acyclic graphs), Vestergaard adopts an inductive
approach: a leaf is a tree, a node with two tree children is a tree. This approach
is supported by Coq and is very convenient for formalisation. Also, profiles are
defined directly, which shows that the traditional notion of strategy is not needed
as far as [21] is concerned. See [11] for another formalisation in game theory.
Osborne and Rubinstein [16] abstracted Kuhn’s proof and result. Translated
in the formalism used in this paper, their result states that if agent preferences
are all strict weak orders, then all sequential games with abstract outcomes have
NE/SPE. (Two remarks: first, a strict weak order is equivalently an asymmetric
relation whose negation is transitive, a partial order whose non-comparability is
an equivalence relation, or the negation of a total preorder; second, this paper
calls preference the inverse of the negation of what Osborne and Rubinstein
called preference, so their result actually referred to a total preorder.)
Unfortunately they [16] do not account for, e.g., multi-criteria games. These
games involve real-valued vector payoffs to express that agents think w.r.t. sev-
eral incommensurable dimensions, as in [19] and [3]. Agents still prefer greater
payoffs, so vector [1, 3] is better than [1, 2] since 1 ≥ 1 and 3 > 2, but [0, 3]
and [1, 2] are non comparable since 0 < 1 but 3 > 2. The left-hand multi-criteria
profile below corresponds to a backward induction since at each stage a maximal
vector is chosen: [0, 3] < [2, 2] and [0, 3] < [1, 1]. This profile induces payoff [1, 1]
to agent a who can convert it by changing its two choices and get [2, 2] > [1, 1]:
this backward induction did not yield a NE. More generally, with abstract out-
comes, if agent a prefers z to x but cannot compare y with either of them, then
the right-hand profile below is a backward induction but not a NE.

a a
a [1, 1] a x
[2, 2] [0, 3] z y

The following proof-theoretic characterisation of Kuhn's proof says that the result in [16] is the best possible while following Kuhn's proof structure, i.e. a mere backward induction. Since the preference used for multi-criteria games is
mere backward induction. Since the preference used for multi-criteria games is
not a strict weak order, this explains why the result in [16] does not account for
multi-criteria games. The three propositions below are equivalent. (One proves
by contraposition that the second one implies the first one, by building a game
like the right-hand one above if an agent’s preference is not a strict weak order.)

– The preference of each agent is a strict weak order.
– Backward induction always yields a Nash equilibrium.
– Backward induction always yields a subgame perfect equilibrium.

Nonetheless, Krieger [8] proved that every multi-criteria sequential game has
a NE. His proof uses probabilities and strategic games, a class of games into
which sequential games can be embedded. However, Krieger’s result still does
not account for all games with abstract outcomes and moreover his proof would
not be easily formalised. Fortunately a generalisation of both [8] and [16] is given
in [13] (ch. 4). Furthermore, instead of a mere sufficient condition on preferences
like in [16], the following three propositions are proved equivalent.
1. The preference of each agent is acyclic.
2. Every sequential game has a Nash equilibrium.
3. Every sequential game has a subgame perfect equilibrium.
Existence of Nash equilibria in the traditional and multi-criteria frameworks is a direct corollary of the theorem above. This is also true for other frameworks
of interest as detailed in [13] (ch. 4).
The result above may be proved via three implications. 3) ⇒ 2) by definition.
2) ⇒ 1) is proved by contraposition: let an agent a prefer x₁ to x₀, x₂ to x₁, and so on, and x₀ to xₙ. The game displayed below has no Nash equilibrium.
a
x₀ x₁ . . . xₙ
The main implication is 1) ⇒ 3). It may be proved as follows: first, since the
preferences are acyclic, they can be linearly extended; second, following Kuhn’s
proof structure, for any game there exists an SPE w.r.t. the linear preferences;
third, this SPE is also valid w.r.t. the smaller original preferences. This triple
equivalence is formalised constructively in Coq (v8.1). In terms of proof burden,
the main implication is still 1) ⇒ 3). However, the existence of a linear extension
constitutes a substantial part¹ of the formal proof. In the second proof step
described above one cannot merely follow Kuhn’s proof structure: the definitions
and proofs have to be completely rephrased (often inductively, as in [21]) and
simplified in order to keep things practical and clear. Also, the formalisation is
constructive: it yields an algorithm for computing equilibria, which is the main
purpose of algorithmic game theory for various classes of games. Here in addition,
the algorithm is certified since it was written in Coq.
This paragraph suggests that all the ingredients used in this generalisation
were already well-known, although not yet put together: Utility theory prescribes
embedding abstract outcomes and preferences into the real numbers and their
usual total order, thus performing more than a linear extension, whereas the
first proof step above is a mere linear extension; Choice theory uses abstract
preferences and is aware of property preservation by preference inclusion, as
¹ The linear extension proof was also slightly modified to be part of the Coq-related CoLoR library [4].

invoked in the third proof step above. Also, [16] uses reflexive preferences; Kreps [7] uses irreflexive ones; this paper assumes nothing on preferences but the way they relate to NE. Osborne and Rubinstein were most likely aware of the above-mentioned facts and techniques, but their totally preordered preferences seem utterly general and non-improvable at first glance. On the contrary, when considering the inverse of the negation of their preferences (i.e. a strict weak order), one sees that generality was just an illusion. Like natural languages, mathematical notations structure and drive our thoughts! All this suggests that in general, formalisation (and the related mindset) may not only provide a guarantee of correctness but also help build a deeper insight into the field being formalised.
Two alternative proofs of this paper’s main result are mentioned below. Unlike
the first proof, they cannot proceed by structural induction on trees, but strong
induction on the number of internal nodes works well. One proof of 1) ⇒ 2) is
given in [13] (ch. 5). It uses only transitive closure instead of linear extension,
but the proof technique suits only NE, not SPE. The (polymorphic) proof below
works for both 1) ⇒ 2) and 1) ⇒ 3). Note that both alternative proofs show
that the notion of SPE is not required to prove NE existence.

Proof. It suffices to prove it for strict weak orders. Assume a game g with n + 1 internal nodes. (The 0 case is a leaf case.) Pick one whose children are all leaves. This node is the root of g₀ and is owned by a. Let x be an a-maximal outcome occurring in g₀. In g replace g₀ with a leaf enclosing x. This new game g′ has n or fewer internal nodes, so there is a NE (resp. SPE) s for g′. In s replace the leaf enclosing x by a profile on g₀ where a chooses a leaf enclosing x. This yields a NE (resp. SPE) for g. (Consider happiness of agent a, then other agents.) □

This diversity of proofs not only proposes alternative viewpoints on the structure
of sequential equilibria, but also constitutes a pool of reusable techniques for
generalising the result, e.g., in graphs as started in [13] (ch. 6 and 7). Nonetheless,
only the first proof has been formalised.
Section 2 summarises the Coq proof for topological sorting, which was also published [12] in the emerging trends track of this conference. Section 3 deals with game theory, and is also meant for readers who are not too familiar with Coq.

2 Topological Sorting
The calculus of binary relations was developed by De Morgan around 1860. Then
the notion of transitive closure of a binary relation (smallest transitive binary
relation including a given binary relation) was defined in different manners by
different people around 1890. See Pratt [17] for a historical account. In 1930,
Szpilrajn [20] proved that, assuming the axiom of choice, any partial order has a
linear extension, i.e., is included in some total order. The proof invokes a notion
close to transitive closure. In the late 1950s, the US Navy [1] designed PERT
(Program Evaluation Research Task or Project Evaluation Review Techniques)
for management and scheduling purposes. This tool partly consists in splitting

a big project into small jobs on a chart and expressing with arrows when one
job has to be done before another one can start up. In order to study the re-
sulting directed graph, Jarnagin [15] introduced a finite and algorithmic version
of Szpilrajn’s result. This gave birth to the widely studied topological sorting,
which spread to the industry in the early 1960’s (see [10] and [5]). Some technical
details and computer-oriented examples can be found in Knuth’s book [6].
Section 2 summarises a few folklore results involving transitive closure and
linear extension. No proof is given, but hopefully the definitions, statements,
and explanations will help understand the overall structure of the development.
This section requires a basic knowledge of Coq and its standard library.
In the remainder of this section, A is a Set ; x and y have type A; R is a
binary relation over A; l is a list over A; and n is a natural number. A finite
“subset” of A is represented by any list involving all the elements of the subset.
For the sake of readability, types will sometimes be omitted according to the
above convention, even in formal statements where Coq could not infer them.
Proving constructively or computing properties about binary relations will
require the following definitions about excluded middle and decidability.

Definition eq midex := ∀ x y, x = y ∨ x ≠ y.
Definition rel midex R := ∀ x y, R x y ∨ ¬R x y.
Definition eq dec := ∀ x y, {x = y}+{x ≠ y}.
Definition rel dec R := ∀ x y, {R x y}+{¬R x y}.

In the Coq development similar results were proved for both excluded middle and
decidability, but this section focuses on excluded middle. The main result of the
section, which is invoked in section 3, says that, for a middle-excluding relation (rel midex), the relation is acyclic and equality on its domain is middle-excluding iff its restriction to any finite set has a middle-excluding irreflexive linear extension.
Section 2.1 gives basic new definitions about lists, relations, and finite restric-
tions, as well as part of the required lemmas; section 2.2 gives the definition of
paths and relates it to transitive closure; section 2.3 designs increasingly complex
functions leading to linear extension and the main result.

2.1 Lists, Relations, and Finite Restrictions

The lemma below will be used to extract simple, i.e. loop-free, paths from paths
represented by lists. It says that if equality on A is middle-excluding and if an
element occurs in a list over A, then the list can be decomposed into three parts:
a list, one occurrence of the element, and a second list free of the element.

Lemma In elim right : eq midex → ∀ x l,
In x l → ∃ l', ∃ l'', l = l' ++ (x::l'') ∧ ¬In x l''.

The predicate repeat free in Prop says that no element occurs more than once
in a list. It is defined by recursion and used in the lemma below that will help
prove that a simple path is not longer than the path from which it is extracted.

Lemma repeat free incl length : eq midex → ∀ l l',
repeat free l → incl l l' → length l ≤ length l'.
The notion of subrelation is defined below, and the lemma that follows states
that a transitive relation contains its own transitive closure. (And conversely.)
Definition sub rel R R’ : Prop := ∀ x y, R x y → R’ x y.
Lemma transitive sub rel clos trans : ∀ R,
transitive R → sub rel (clos trans R) R.
As defined below, a relation is irreflexive if no element relates to itself. So ir-
reflexivity of a relation implies irreflexivity of its subrelations, as stated below.
Together with transitive closure, irreflexivity will help state absence of cycle.
Definition irreflexive R : Prop := ∀ x, ¬R x x.
Lemma irreflexive preserved : ∀ R R’,
sub rel R R’ → irreflexive R’ → irreflexive R.
The restriction of a relation to a finite set/list is defined below. Then the predi-
cate is restricted says that the support of a relation R is included in the list l.
The lemma that follows states that transitive closure preserves restriction to a
given list. This will help compute finite linear extensions of finite relations.
Definition restriction R l x y : Prop := In x l ∧ In y l ∧ R x y.
Definition is restricted R l : Prop := ∀ x y, R x y → In x l ∧ In y l.
Lemma restricted clos trans : ∀ R l,
is restricted R l → is restricted (clos trans R) l.

2.2 Paths
Transitive closure will help guarantee acyclicity and build linear extensions. But
its definition is not convenient if a witness is needed. A path is a witness, i.e. a
list recording consecutive steps of a given relation. It is formally defined below.
(Note that if relations were decidable, is path could return a Boolean.) The next
two lemmas state an equivalence between paths and transitive closure.
Fixpoint is path R x y l {struct l} : Prop :=
match l with
| nil ⇒ R x y
| z::l' ⇒ R x z ∧ is path R z y l'
end.
Lemma clos trans path : ∀ x y, clos trans R x y → ∃ l, is path R x y l.
Lemma path clos trans : ∀ y l x, is path R x y l → clos trans R x y.
The next lemma states that a path can be transformed into a simple path. It
will help bound the computation of transitive closure for finite relations.
Lemma path repeat free length : eq midex → ∀ y l x, is path R x y l →
∃ l’, ¬In x l’ ∧ ¬In y l’ ∧ repeat free l’ ∧
length l’ ≤ length l ∧ incl l’ l ∧ is path R x y l’.

The predicate bounded path R n below says whether two given elements are
related by a path of length n or less. It is intended to abstract over the list
witness of a path while bounding its length, which transitive closure cannot do.
Inductive bounded path R n : A → A → Prop :=
| bp intro : ∀ x y l, length l ≤ n → is path R x y l → bounded path R n x y.
Below, two lemmas show that bounded path is weaker than clos trans in gen-
eral, but equivalent on finite sets for some bound.
Lemma bounded path clos trans : ∀ R n,
sub rel (bounded path R n) (clos trans R).
Lemma clos trans bounded path : eq midex → ∀ R l,
is restricted R l → sub rel (clos trans R) (bounded path R (length l)).
The first lemma below states that if a finite relation is middle-excluding, so are its "bounded transitive closures". Thanks to this and the lemma above, the
second lemma states that it also holds for the transitive closure.
Lemma bounded_path_midex : ∀ R l n,
is_restricted R l → rel_midex R → rel_midex (bounded_path R n).
Lemma restricted_midex_clos_trans_midex : eq_midex → ∀ R l,
rel_midex R → is_restricted R l → rel_midex (clos_trans R).
The following theorems state the equivalence between decidability of a relation
and uniform decidability of the transitive closures of its finite restrictions. Note
that decidable equality is required only for the second implication. These results
remain correct when considering excluded middle instead of decidability.
Theorem clos_trans_restriction_dec_R_dec : ∀ R,
(∀ l, rel_dec (clos_trans (restriction R l))) → rel_dec R.
Theorem R_dec_clos_trans_restriction_dec : eq_dec → ∀ R,
rel_dec R → ∀ l, rel_dec (clos_trans (restriction R l)).

2.3 Linear Extension

This section presents a way of extending linearly an acyclic finite relation. (This
is not the fastest topological sort algorithm though.) The intuitive idea is to
repeat the following while it is possible: take the transitive closure of the relation
and add an arc to the relation without creating 2-step cycles. Repetition
ensures saturation, absence of 2-step cycles ensures acyclicity, and finiteness
ensures termination. In this section this idea is implemented through several
stages of increasing complexity. This not only helps describe the procedure
clearly, but it also facilitates the proof of correctness by splitting it into
intermediate results.
Total relations will help define linear extensions. They are defined below.
Definition trichotomy R x y : Prop := R x y ∨ x = y ∨ R y x.
Definition total R l : Prop := ∀ x y, In x l → In y l → trichotomy R x y.

The definition below adds an arc to a relation, provided this creates no 2-step cycle.
Inductive try_add_arc R x y : A → A → Prop :=
| keep : ∀ z t, R z t → try_add_arc R x y z t
| try_add : x ≠ y → ¬R y x → try_add_arc R x y x y.
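Assuming again a decidable relation given as a Boolean function, the construction can be sketched in Haskell (hypothetical names): every old arc is kept, and the arc from x to y is added only when this creates neither a loop nor a 2-step cycle.

tryAddArc :: Eq a => (a -> a -> Bool) -> a -> a -> (a -> a -> Bool)
tryAddArc r x y z t = r z t || (z == x && t == y && x /= y && not (r y x))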
As stated below, try_add_arc creates no cycle in strict partial orders.
Lemma try_add_arc_irrefl : eq_midex → ∀ R x y,
transitive R → irreflexive R → irreflexive (clos_trans (try_add_arc R x y)).

While creating no cycle, the function below alternately performs a transitive
closure and tries to add arcs starting at a given point and ending in a given list.
Fixpoint try_add_arc_one_to_many R x l {struct l} : A → A → Prop :=
match l with
| nil ⇒ R
| y :: l’ ⇒ clos_trans (try_add_arc (try_add_arc_one_to_many R x l’) x y)
end.
Preserving acyclicity, the function below alternately performs a transitive
closure and tries to add all arcs starting in the first list and ending in the second.
Fixpoint try_add_arc_many_to_many R l’ l {struct l’} : A → A → Prop :=
match l’ with
| nil ⇒ R
| x :: l’’ ⇒ try_add_arc_one_to_many (try_add_arc_many_to_many R l’’ l) x l
end.
While preserving acyclicity, the next function tries to add all arcs both starting
and ending in a list to the restriction of the relation to that list. This function
is the one that was meant informally in the beginning of section 2.3.
Definition LETS R l : A → A → Prop :=
try_add_arc_many_to_many (clos_trans (restriction R l)) l l.
The five lemmas below will help state that LETS constructs a middle-excluding
linear extension. They rely on similar intermediate results for the intermediate
functions above, but these intermediate results are not displayed.
Lemma LETS_transitive : ∀ R l, transitive (LETS R l).
Lemma LETS_restricted : ∀ R l, is_restricted (LETS R l) l.
Lemma LETS_irrefl : eq_midex → ∀ R l,
(irreflexive (clos_trans (restriction R l)) ↔ irreflexive (LETS R l)).
Lemma LETS_total : eq_midex → ∀ R l, rel_midex R → total (LETS R l) l.
Lemma LETS_midex : eq_midex → ∀ R l, rel_midex R → rel_midex (LETS R l).
Below, a linear extension (over a list) of a relation is a strict total order (over
the list) that is bigger than the original relation (restricted to the list).
Definition linear_extension R l R’ := is_restricted R’ l ∧
sub_rel (restriction R l) R’ ∧ transitive A R’ ∧ irreflexive R’ ∧ total R’ l.

Consider a middle-excluding relation. It is acyclic and equality is middle-excluding
iff for any list there exists a decidable strict total order containing
the original relation (on the list). The witness R’ below is provided by LETS.
Theorem linearly_extendable : ∀ R, rel_midex R →
(eq_midex ∧ irreflexive (clos_trans R) ↔
∀ l, ∃ R’, linear_extension R l R’ ∧ rel_midex R’).

3 Sequential Games
Section 3.1 presents three preliminary concepts and a lemma on lists and pred-
icates; section 3.2 defines sequential games and strategy profiles; section 3.3
defines functions on games and profiles; section 3.4 defines the notions of pref-
erence, NE, and SPE; and section 3.5 shows that universal existence of these
equilibria is equivalent to acyclicity of preferences.

3.1 Preliminaries
The function listforall expects a predicate on a Set called A, and returns a
predicate on lists stating that all the elements in the list comply with the original
predicate. This will help define, e.g., the third concept below. It is recursively
defined along the inductive structure of the list argument. It is typed as follows.
listforall : (A → Prop) → list A → Prop
The function rel_vector expects a binary relation and two lists over A and states
that the lists are element-wise related (which implies that they have the same
length). This will help define the convertibility between two profiles. It is recur-
sively defined along the first list argument and it is typed as follows.
rel_vector : (A → A → Prop) → list A → list A → Prop
Given a binary relation, the predicate is_no_succ returns a proposition saying
that no element in a given list is the successor of a given element.
Definition is_no_succ P x l := listforall (fun y ⇒ ¬P x y) l.
The definition above helps state the lemma Choose_and_split, which expects a
decidable relation over A and a non-empty list over A, and splits the list into one
element and two lists by choosing the first (from the head) element that is maximal
among the remaining elements. The example below involves divisibility over the naturals.
2 :: 3 :: 9 :: 4 :: 9 :: 6 :: 2 :: 16 :: nil
is split into the left list 2 :: 3 :: 9 :: 4 :: nil, the choice 9, and the right list
6 :: 2 :: 16 :: nil.
This lemma will help define backward induction for arbitrary preferences. Note
that if preferences are orders, it chooses a maximal element in the whole list.
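A plausible executable rendering of Choose_and_split in Haskell (our sketch, assuming the relation is given as a Boolean function; the lemma itself only asserts existence):

-- split a non-empty list at the first element that has no successor
-- among the elements remaining to its right
chooseAndSplit :: (a -> a -> Bool) -> [a] -> ([a], a, [a])
chooseAndSplit _ []       = error "chooseAndSplit: empty list"
chooseAndSplit r (x : xs)
  | all (\y -> not (r x y)) xs = ([], x, xs)
  | otherwise                  = let (l, c, m) = chooseAndSplit r xs
                                 in (x : l, c, m)

With divisibility as the relation, this reproduces the split of the example above.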

3.2 Sequential Games and Strategy Profiles

Games and profiles are inductively defined in two steps. Graphically: if oc is an
outcome, a leaf labelled oc is a game (resp. a profile).
[Figure: a single leaf node labelled oc, read either as a game or as a profile.]
Below, if a is an agent, g a game, and l a list of games, a node labelled a with
children g and l is a game. If l is empty, g ensures that the internal node has at
least one child. If s is a profile and r and t are lists of profiles, a node labelled a
with children r, s, and t is a profile where agent a at the root chooses the profile s.
[Figure: two trees rooted at a; the game has children g and l, the profile has
children r, s, and t.]
In Coq, given two sets Outcome and Agent, sequential games and strategy
profiles are defined as below. (gL stands for game leaf and gN for game node.)
Variables (Outcome : Set )(Agent : Set ).
Inductive Game : Set :=
| gL : Outcome → Game
| gN : Agent → Game → list Game → Game.
Inductive Strat : Set :=
| sL : Outcome → Strat
| sN : Agent → list Strat → Strat → list Strat → Strat.
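For illustration (our own toy example; the paper keeps Outcome and Agent abstract), a Haskell mirror of these datatypes together with a one-node game and a profile for it:

type Outcome = Int
type Agent   = Char

data Game  = GL Outcome | GN Agent Game [Game]
data Strat = SL Outcome | SN Agent [Strat] Strat [Strat]

-- agent 'a' chooses between outcomes 1 and 2
g :: Game
g = GN 'a' (GL 1) [GL 2]

-- a profile for g in which 'a' picks outcome 2
s :: Strat
s = SN 'a' [SL 1] (SL 2) []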
The induction principle that Coq automatically associates to games (resp.
profiles) ignores the inductive structure of lists. Mutually defining lists with
games may solve this but rules out using the Coq standard library for these new
lists. The principle stated below is built manually via a Fixpoint. There are four
premises: two for the horizontal induction along empty lists and compound lists,
and two for the vertical induction along leaf games and compound games.
Game_ind2 : ∀ (P : Game → Prop) (Q : Game → list Game → Prop),
(∀ oc, P (gL oc)) →
(∀ g, P g → Q g nil) →
(∀ g, P g → ∀ g’ l, Q g’ l → Q g’ (g :: l)) →
(∀ g l, Q g l → ∀ a, P (gN a g l)) →
∀ g, P g
In order to prove a property ∀ g : Game, P g with the induction principle
Game_ind2, the user has to provide a predicate Q that is easily (yet apparently
not automatically in general) derived from P. Also note that the induction
principle for profiles requires one more premise since two lists are involved. These
principles are invoked in most of the proofs of the Coq development.

3.3 Structural Definitions

This subsection presents structural definitions, i.e. definitions not involving
preferences, relating to games and strategy profiles.
Below, the function UsedOutcomes is defined recursively on the game struc-
ture: it expects a game and returns a list of the outcomes occurring in the game.
It will help restrict the agent preferences to the relevant finite set prior to the
topological sorting.

Fixpoint UsedOutcomes (g : Game) : list Outcome :=
match g with
| gL oc ⇒ oc :: nil
| gN a g’ l ⇒ ((fix ListUsedOutcomes (l’ : list Game) : list Outcome :=
    match l’ with
    | nil ⇒ nil
    | x :: m ⇒ (UsedOutcomes x) ++ (ListUsedOutcomes m)
    end) (g’ :: l))
end.

A profile induces a unique outcome, as computed by the 2-step rule below: a
leaf profile sL oc yields the outcome oc, and a node profile yields the outcome
induced by the subprofile s chosen at its root. The intended function is also
defined in Coq below, and it will help define preferences over profiles from
preferences over outcomes.
[Figure: the 2-step rule computing the induced outcome.]
Fixpoint InducedOutcome (s : Strat ) : Outcome :=


match s with
| sL oc ⇒ oc
| sN a sl sc sr ⇒ InducedOutcome sc
end.

The function s2g expects a profile and returns its underlying game by forget-
ting the nodes’ choices. It will help state that given a game (and preferences),
a well-chosen profile is a NE for this game. s2g is computed by the 3-step rule
below. (The Coq definition is omitted.) Note that the two big steps are a case
splitting along the structure of the first list, i.e., whether it is empty or not.

s2g (sL oc) = gL oc
s2g (sN a nil s l’) = gN a (s2g s) (map s2g l’)
s2g (sN a (s0 :: l) s l’) = gN a (s2g s0) (map s2g (l ++ s :: l’))
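Under the Haskell mirroring of Sec. 3.2, the rule can be rendered as the following sketch (the paper omits the Coq definition):

s2g :: Strat -> Game
s2g (SL oc)               = GL oc
s2g (SN a [] sc sr)       = GN a (s2g sc) (map s2g sr)
s2g (SN a (s0 : l) sc sr) = GN a (s2g s0) (map s2g (l ++ sc : sr))

For the example profile s of Sec. 3.2, s2g s reconstructs the game g.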

Below, Conv a s s’ means that agent a is able to convert s into s’ by changing
part of its choices. ListConv means component-wise convertibility of lists of
profiles and it is defined by mutual induction with Conv.
Inductive Conv : Agent → Strat → Strat → Prop :=
| convLeaf : ∀ b oc, Conv b (sL oc)(sL oc)
| convNode : ∀ b a sl sl’ sc sc’ sr sr’, (length sl =length sl’ ∨ a = b) →
ListConv b (sl ++(sc::sr )) (sl’ ++(sc’ ::sr’ )) →
Conv b (sN a sl sc sr )(sN a sl’ sc’ sr’ )
with
ListConv : Agent → list Strat → list Strat → Prop :=
| lconvnil : ∀ b, ListConv b nil nil
| lconvcons : ∀ b s s’ tl tl’, Conv b s s’ → ListConv b tl tl’ →
ListConv b (s::tl )(s’ ::tl’ ).
Above, the subformula length sl =length sl’ ∨ a = b ensures that only the
owner of a node can change his choice at that node. ListConv b (sl ++(sc::sr ))
(sl’ ++(sc’ ::sr’ )) ensures that this property holds also in the subprofiles.
A suitable 4-premise induction principle for convertibility is generated with
the Coq command Scheme. This principle, which is not displayed, is invoked to
prove that two convertible profiles have the same underlying game. Also, it is
proved by induction on profiles that Conv a is an equivalence relation.

3.4 Concepts of Equilibrium

Below, OcPref a is the preference of agent a. It induces a preference over profiles.


Variable OcPref : Agent → Outcome → Outcome → Prop.
Definition StratPref (a : Agent )(s s’ : Strat ) : Prop :=
OcPref a (InducedOutcome s)(InducedOutcome s’ ).
Below, an agent is happy with a profile if it cannot convert it into a preferred
profile. A Nash equilibrium (NE for short) is a profile making every agent happy.
Definition Happy (s : Strat )(a : Agent ) : Prop := ∀ s’,
Conv a s s’ → ¬StratPref a s s’.
Definition Eq (s : Strat ) : Prop := ∀ a, Happy s a.
Instead of ¬StratPref a s s’ above, Osborne and Rubinstein would have written
StratPref OR a s’ s, hence the negated inversion relating the two notations.

A subgame perfect equilibrium (SPE) is a NE whose subprofiles are SPE.


Fixpoint SPE (s : Strat ) : Prop := Eq s ∧
match s with
| sL oc ⇒ True
| sN a sl sc sr ⇒ (listforall SPE sl ) ∧ SPE sc ∧ (listforall SPE sr )
end.
The following key lemma means that if the root owner of a profile s is happy
and chooses a NE, s is a NE. (The converse also holds but is not needed here.)
Lemma Eq_subEq_choice : ∀ a sl sc sr,
(∀ s s’, In s sl ∨ In s sr → Conv a s s’ → ¬StratPref a sc s’) →
Eq sc → Eq (sN a sl sc sr).

3.5 Existence of Equilibria


Assume that preferences over outcomes are decidable. This implies the same for
preferences over profiles.
Hypothesis OcPref_dec : ∀ (a : Agent), rel_dec (OcPref a).
Lemma StratPref_dec : ∀ (a : Agent), rel_dec (StratPref a).
The backward induction function is defined below. Slight simplifications of
the actual Coq code were made for readability.
Fixpoint BI (g : Game) : Strat :=
match g with
| gL oc ⇒ sL oc
| gN a g l ⇒ let (sl, sc, sr) :=
    Choose_and_split (StratPref_dec a) (map BI l) (BI g) in
  sN a sl sc sr
end.
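Combining this with the chooseAndSplit sketch of Sec. 3.1, a Haskell rendering of BI could read as follows (our sketch: each agent's decidable preference on profiles is passed as a Boolean relation, and we assume the guaranteed element is consed in front of the list, an argument convention the paper does not spell out):

bi :: (Agent -> Strat -> Strat -> Bool) -> Game -> Strat
bi _    (GL oc)    = SL oc
bi pref (GN a g l) =
  let (sl, sc, sr) = chooseAndSplit (pref a) (bi pref g : map (bi pref) l)
  in SN a sl sc sr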
It is proved (although not here) that BI preserves the underlying game.
Total order case. Preferences over outcomes are temporarily transitive and
irreflexive. Therefore those properties also hold for preferences over profiles.
Hypothesis OcPref_irrefl : ∀ (a : Agent), irreflexive (OcPref a).
Hypothesis OcPref_trans : ∀ (a : Agent), transitive (OcPref a).
Lemma StratPref_irrefl : ∀ (a : Agent), irreflexive (StratPref a).
Lemma StratPref_trans : ∀ (a : Agent), transitive (StratPref a).
If preferences are total over a given list of outcomes, BI yields SPE for any
sequential game using only outcomes from the list, as stated below.
Lemma BI_SPE : ∀ loc : list Outcome, (∀ a : Agent, total (OcPref a) loc) →
∀ g : Game, incl (UsedOutcomes g) loc → SPE (BI g).
The proof relies on the key analytical lemma Eq_subEq_choice. The same proof
structure would work for strict weak orders and thus translates Kuhn’s and

Osborne and Rubinstein’s results into the abstract sequential game formalism
of this paper. However, it is not even needed in the remainder.

General case. Until now equilibrium and related concepts have been defined
w.r.t. implicit given preferences, but from now these concepts take arbitrary
preferences as an explicit parameter. For instance, instead of writing Eq s, one
shall write Eq OcPref s to say that s is a NE with respect to the family OcPref.
The following lemma says two things: first, the equilibrium-hood of a pro-
file depends only on the restrictions of agent preferences to the outcomes that
are used in (the underlying game of) the profile; second, removing arcs from
agent preferences preserves equilibrium-hood (informally because less demand-
ing agents are more likely to be happy). A similar result holds for SPE.
Lemma Eq_order_inclusion : ∀ OcPref OcPref’ s,
(∀ a, sub_rel (restriction (OcPref a) (UsedOutcomes (s2g s))) (OcPref’ a)) →
Eq OcPref’ s → Eq OcPref s.
The theorem below constitutes the main implication of the triple equivalence
referred to in section 1. It generalises the results in [9], [16] and [8] to acyclic
preferences. It invokes theorem linearly_extendable from section 2.3.

Theorem acyclic_SPE : ∀ OcPref, (∀ a, rel_dec (OcPref a)) →
(∀ a, irreflexive (clos_trans Outcome (OcPref a))) →
∀ g, {s : Strat | s2g s = g ∧ SPE OcPref s}.

4 Conclusion

This paper and the related Coq development have thus abstracted, generalised,
formalised, and clarified existing game-theoretic results in [9], [16] and [8]: in-
stead of real-valued (vector) payoffs they refer to abstract outcomes; instead of
the usual order over the reals they refer to abstract preferences; instead of a mere
sufficient condition on preferences they prove a necessary and sufficient condi-
tion; instead of a set-theoretic approach they use an inductive-type approach.
The difficulties that were encountered in this proof are of two sorts: first, the
more general framework brings new issues, e.g. collecting the outcomes used in a
game, topological sorting, defining backward induction for arbitrary preferences;
second, the formal proof must cope with issues that were ignored in pen-and-
paper proofs, e.g. rigorous definitions, associated proof principles, underlying
game of a strategy profile. These second issues are already addressed in [21].
In the abstract-preference framework, it may sound tempting to prove for-
mally the necessary and sufficient condition only for binary trees and then argue
informally that the general case is similar or reducible to the binary case. Such
an argument may contradict the idea that formalisation is a useful step towards
guarantee of correctness and deeper insight: it ignores that general trees require
more complex representations and induction principles, and that some properties
hold for binary trees but not in general. Nonetheless the author believes that it

would be possible to first prove the result for binary trees and then reduce general
trees formally to binary trees. However, this would require defining games,
profiles, convertibility, equilibria, etc. (though not backward induction) for both
the binary and the general settings, and it would still require the proof of
topological sorting: so it would be more complex than the proof discussed in this paper.
In section 3.3 convertibility between strategy profiles was defined as an inductive
type in Prop. This makes it possible to avoid assuming decidability of agents’
equality altogether. Alternatively, assuming decidability of agents’ equality would allow defining
convertibility as a recursive function onto Booleans. More generally, if the Coq
development were to be rephrased or generalised to graph structure instead of
trees (as in [13], chap. 6 and 7), it might be interesting to discharge part of
the proof burden on Coq’s computational ability and keep in the proof script
only the subtle reasoning. This may be achieved in part by using more recursive
Boolean functions at the acceptable expense of decidability assumptions.

References

1. Anonymous: Program evaluation research task. Summary report Phase 1 and 2,
U.S. Government Printing Office, Washington, D.C. (1958)
2. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
Coq’Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2004)
3. Blackwell, D.: An analog of the minimax theorem for vector payoffs. Pacific Journal
of Mathematics 6, 1–8 (1956)
4. Blanqui, F., Coupet-Grimal, S., Delobel, W., Hinderer, S., Koprowski, A.: CoLoR,
a Coq Library on rewriting and termination. In: Workshop on Termination (2006)
5. Kahn, A.B.: Topological sorting of large networks. Commun. ACM 5(11), 558–562
(1962)
6. Knuth, D.E.: The Art of Computer Programming, 2nd edn., vol. 1. Addison Wesley,
Reading (1973)
7. Kreps, D.M.: Notes on the Theory of Choice. Westview Press, Inc., Boulder (1988)
8. Krieger, T.: On pareto equilibria in vector-valued extensive form games. Mathe-
matical Methods of Operations Research 58, 449–458 (2003)
9. Kuhn, H.W.: Extensive games and the problem of information. Contributions to
the Theory of Games II (1953)
10. Lasser, D.J.: Topological ordering of a list of randomly-numbered elements of a
network. Commun. ACM 4(4), 167–168 (1961)
11. Le Roux, S.: Non-determinism and Nash equilibria for sequential game over partial
order. In: Computational Logic and Applications, CLA 2005. Discrete Mathematics
& Theoretical Computer Science (2006)
12. Le Roux, S.: Acyclicity and finite linear extendability: a formal and constructive
equivalence. In: Schneider, K., Brandt, J. (eds.) Theorem Proving in Higher Order
Logics: Emerging Trends Proceedings, September 2007, pp. 154–169. Department
of Computer Science, University of Kaiserslautern (2007)
13. Le Roux, S.: Generalisation and formalisation in game theory. Ph.d. thesis, Ecole
Normale Supérieure de Lyon (January 2008)
14. Le Roux, S., Lescanne, P., Vestergaard, R.: A discrete Nash theorem with quadratic
complexity and dynamic equilibria. Research report IS-RR-2006-006, JAIST (2006)

15. Jarnagin, M.P.: Automatic machine methods of testing pert networks for con-
sistency. Technical Memorandum K-24/60, U. S. Naval Weapons Laboratory,
Dahlgren, Va (1960)
16. Osborne, M.J., Rubinstein, A.: A Course in Game Theory. The MIT Press, Cam-
bridge (1994)
17. Pratt, V.: Origins of the calculus of binary relations. In: Logic in Computer Science
(1992)
18. Selten, R.: Spieltheoretische Behandlung eines Oligopolmodells mit Nach-
frageträgheit. Zeitschrift für die gesamte Staatswissenschaft 121 (1965)
19. Simon, H.A.: A behavioral model of rational choice. The Quarterly Journal of
Economics 69(1), 99–118 (1955)
20. Szpilrajn, E.: Sur l’extension de l’ordre partiel. Fund. Math. 16 (1930)
21. Vestergaard, R.: A constructive approach to sequential Nash equilibria. Information
Processing Letters 97, 46–51 (2006)
22. Zermelo, E.: Über eine Anwendung der Mengenlehre auf die Theorie des
Schachspiels. In: Proceedings of the Fifth International Congress of Mathemati-
cians, vol. 2 (1912)
Formalising FinFuns – Generating Code for
Functions as Data from Isabelle/HOL

Andreas Lochbihler

Universität Karlsruhe (TH), Germany


[email protected]

Abstract. FinFuns are total functions that are constant except for a fi-
nite set of points, i.e. a generalisation of finite maps. We formalise them
in Isabelle/HOL and present how to safely set up Isabelle’s code genera-
tor such that operations like equality testing and quantification on Fin-
Funs become executable. On the code output level, FinFuns are explicitly
represented by constant functions and pointwise updates, similarly to as-
sociative lists. Inside the logic, they behave like ordinary functions with
extensionality. Via the update/constant pattern, a recursion combinator
and an induction rule for FinFuns allow for defining and reasoning about
operators on FinFuns that directly become executable. We apply the ap-
proach to an executable formalisation of sets and use it for the semantics
for a subset of concurrent Java.

1 Introduction

In recent years, executable formalisations, proofs by reflection [8] and automated


generators for counter examples [1,5] have received much interest in the theorem
proving community. All major state-of-the-art theorem provers like Coq, ACL2,
PVS, HOL4 and Isabelle feature some interface to a standard (usually external)
functional programming language to directly extract high-assurance code from
theorems or proofs or both. Isabelle/HOL provides two code generators [3,6],
which support datatypes and recursively defined functions, where Haftmann’s
[6] is supposed to replace Berghofer’s [3]. Berghofer’s, which is used to search
for counter examples by default (quickcheck ) [1], can also deal with inductively
defined predicates, but not with type classes. Haftmann’s additionally supports
type classes and output in SML, OCaml and Haskell, but inductively defined
predicates are not yet available and quickcheck is still experimental.
Beyond these areas, code generation is currently rather limited in Is-
abelle/HOL. Consequently, the everyday Isabelle user invokes the quickcheck
facility on some conjecture and frequently encounters an error message such
as “Unable to generate code for op = (λx . True )” or “No such mode [1,
2] for ...”. Typically, such a message means that an assumption or conclusion
involves a test on function equality (which underlies both universal and existen-
tial quantifiers) or an inductive predicate no code for which can be produced. In
particular, the following restrictions curb quickcheck ’s usefulness:


– Equality on functions is only possible if the domain is finite and enumerable.


– Quantifiers are only executable if they are bounded by a finite set (e.g.
∀ x∈A. P x ).
– (Finite) sets are explicitly represented by lists, but as the set type has been
merged with predicates in version Isabelle 2008, only Berghofer’s code gen-
erator can work with sets properly.

The very same problems reoccur when provably correct code from a formalisation
is to be extracted, although one is willing to commit more effort in adjusting
the formalisation and setting up the code generator for it in that case. To apply
quickcheck to their formalisations, end-users expect to supply little or no effort.
In the area of programming languages, states (like memories, stores, and
thread pools) are usually finite, even though the identifiers (addresses, variable
names, thread IDs, ...) are typically taken from an infinite pool. Such a state is
most easily formalised as a (partial) function from identifiers to values. Hence,
enumerating all threads or comparing two stores is not executable by default.
Yet, a finite set of identifier-value pairs could easily store such state informa-
tion, which is normally modified point-wisely. Explicitly using associative lists
in one’s formalisation, however, incurs a lot of work because one state has in
general multiple representations and AC1 unification is not supported.
For such kind of data, we propose to use a new type FinFun of total functions
that are constant except for finitely many points. They generalise maps, which
formally are total functions of type a ⇒ b option that map to None (“undefined”)
almost everywhere, in two ways: First, they can replace (total) functions of
arbitrary type a ⇒ b. Second, their default value is not fixed to a predetermined
value (like None ). Our main technical contributions are:1
1. On the code level, every FinFun is represented as explicit data via two
datatype constructors: constant FinFuns and pointwise update (cf. Sec. 2).
quickcheck is set up for FinFuns and working.
2. Inside the logic, FinFuns feel very much like ordinary functions (e.g. exten-
sionality: f = g ←→ (∀ x. f x = g x)) and are thus easily integrated into
existent formalisations. We demonstrate this in two applications (Sec. 5):
(a) A formalisation of sets as FinFuns allows sets to be represented explicitly
in the generated code.
(b) We report on our experience in using FinFuns to represent state informa-
tion for JinjaThreads [12], a semantics for a subset of concurrent Java.
3. Equality tests on, quantification over and other operators on FinFuns are all
handled by Isabelle’s new code generator (cf. Sec. 3).
4. All equations for code generation have passed through Isabelle’s inference
kernel, i.e., the trusted code base cannot be compromised by ad-hoc transla-
tions where constants in the logic are explicitly substituted by functions of
the target language.
5. A recursion combinator allows one to directly define functions that are recursive
in an argument of FinFun type (Sec. 4).
1 The FinFun formalisation is available in the Archive of Formal Proofs [13].

FinFuns are a rather restricted class of functions. To represent such functions


as associative lists is common knowledge in computer science, but we focus
on how to practically hide the problems that such representation issues raise
during reasoning without losing the benefits of executability. In Sec. 6, we discuss
which functions FinFuns can replace and which not, and compare the techniques
and ideas we use with other applications. Isabelle-specific notation is defined in
appendix A.

2 Type Definition and Basic Properties

To start with, we construct the new type a ⇒f b for FinFuns. This type contains
all functions from a to b which map only finitely many points a :: a to some
value other than some constant b :: b, i.e. are constant except for finitely many
points. We show that all elements of this type can be built from two constructors:
The everywhere constant FinFun and pointwise update of a FinFun (Sec. 2.1).
Code generated for operators on FinFuns will be recursive via these two kernel
functions (cf. Sec. 2.2).
In Isabelle/HOL, a new type is declared by specifying a non-empty carrier set
as a subset of an already existent type. The new type for FinFuns is isomorphic
to the set of functions that deviate from a constant at only finitely many points:

typedef ( a, b) finfun = {f :: a ⇒ b | ∃ b. finite {a | f a ≠ b}}

Apart from the new type ( a, b) finfun (written a ⇒f b), this introduces
the set finfun :: ( a ⇒ b) set given on the right-hand side and the two bijection
functions Abs-finfun and Rep-finfun between the sets UNIV :: ( a ⇒f b) set and
finfun such that Rep-finfun is surjective and they are inverses of each other:

Rep-finfun f̂ ∈ finfun (1)


Abs-finfun (Rep-finfun f̂ ) = f̂ (2)
f ∈ finfun −→ Rep-finfun (Abs-finfun f ) = f (3)

For clarity, we decorate all variable identifiers of FinFun type a ⇒f b with


a hat ˆ to distinguish them from those of ordinary function type a ⇒ b. Note
that the default value b of the function, to which it does not map only finitely
many points, is not stored in the type elements themselves. In case a is infinite,
any such b is uniquely determined and would therefore be redundant. If not,
{a | f a ≠ b} is finite for all f :: a ⇒ b and b :: b, i.e. finfun = UNIV. Moreover, if
that default value was fixed, then equality on a ⇒f b would not be as expected,
cf. (5).
The function finfun-default f̂ returns the default value of f̂ for infinite domains.
For finite domains, we fix it to undefined which is an arbitrary (but fixed) constant
to represent undefinedness in Isabelle:

finfun-default f̂ ≡ if finite UNIV then undefined else ιb. finite {a | Rep-finfun f̂ a ≠ b}



2.1 Kernel Functions for FinFuns

Having manually defined the type, we now show that every FinFun can be
generated from two kernel functions similarly to a datatype element from
its constructors: The constant function and pointwise update. For b:: b, let
K f b:: a ⇒f b represent the FinFun that maps everything to b. It is defined
by lifting the constant function λx:: a. b via Abs-finfun to the FinFun type. Sim-
ilarly, pointwise update finfun-update, written ( :=f ), is defined in terms of
pointwise function update on ordinary functions:

K f b ≡ Abs-finfun (λx. b) and f̂ (a :=f b) ≡ Abs-finfun ((Rep-finfun f̂ )(a := b))

Note that these two kernel functions replace λ-abstraction of ordinary func-
tions. Since the code generator will internally use these two constructors to
represent FinFuns as data objects, proper λ-abstraction (via Abs-finfun) is not
executable and is therefore deprecated. Consequently, all executable operators
on FinFuns are to be defined (recursively) in terms of these two kernel func-
tions. On the logic level, λ-abstraction is of course available via Abs-finfun, but
it will be tedious to reason about such functions: Arbitrary λ-abstraction does
not guarantee the finiteness constraint in the type definition for a ⇒f b, hence
this constraint must always be shown separately.
We can now already define what function application on a ⇒f b will be,
namely Rep-finfun. To facilitate replacing ordinary functions with FinFuns in
existent formalisations, we write function applications as a postfix subscript f :
f̂ f a ≡ Rep-finfun f̂ a. This directly gives the kernel functions their semantics:

(K f b)f a = b and f̂ (a :=f b)f a’ = (if a = a’ then b else f̂ f a’) (4)

Moreover, we already see that extensionality for HOL functions carries over to
FinFuns, i.e. = on FinFuns does denote what it intuitively ought to:

f̂ = ĝ ←→ (∀ x. f̂ f x = ĝ f x) (5)

There are only few characteristic theorems about these two kernel functions.
In particular, they are not free constructors, as e.g. the following equalities hold:

(K f b)(a :=f b) = K f b (6)
f̂ (a :=f b)(a :=f b’) = f̂ (a :=f b’) (7)
a ≠ a’ −→ f̂ (a :=f b)(a’ :=f b’) = f̂ (a’ :=f b’)(a :=f b) (8)

This is natural, because FinFuns are meant to behave like ordinary functions and
these equalities correspond to the standard ones for pointwise update on ordinary
functions. Only K f is injective: (K f b) = (K f b’) ←→ b = b’. From a logician’s
point of view, non-free constructors are not desirable because recursion and
case analysis becomes much more complicated. However, the savings in proof
automation that extensionality for FinFuns permit are worth the extra effort
when it comes to defining operators on FinFuns.

More importantly, these two kernel functions exhaust the type a ⇒f b. This
is most easily stated by the following induction rule, which is proven by induction
on the finite set on which Rep-finfun ĝ does not take the default value:

∀ b. P (K f b)        ∀ f̂ a b. P f̂ −→ P f̂ (a :=f b)
────────────────────────────────────────────── (9)
                      P ĝ

Intuitively, P holds already for all FinFuns ĝ if (i) P (K f b) holds for all constant
FinFuns K f b and (ii) whenever P f̂ holds, then P f̂ (a :=f b) holds, too. From
this, a case distinction theorem is easily derived:

(∃ b. ĝ = (K f b)) ∨ (∃ f̂ a b. ĝ = f̂ (a :=f b)) (10)

Both induction rule and case distinction theorem are weak in the sense that
the f̂ in the case for point-wise update is quantified without further constraints.
Since K f and pointwise update are not distinct – cf. (6), proofs that do case
analysis on FinFuns must always handle both cases even for constant FinFuns.
Stronger induction and case analysis theorems could, however, be derived.

2.2 Representing FinFuns in the Code Generator


As mentioned above, the code generator represents FinFuns as a datatype with
constant FinFun and pointwise update as (free) constructors. In Haskell, e.g.,
the following code is generated:
data Finfun a b = Finfun_update_code (Finfun a b) a b
                | Finfun_const b;
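On this representation, function application amounts to scanning the update list from the outermost (most recent) update inwards, mirroring eq. (4); a sketch (the name finfun_apply is ours, the generated name may differ):

finfun_apply :: Eq a => Finfun a b -> a -> b
finfun_apply (Finfun_const b) _ = b
finfun_apply (Finfun_update_code f a b) a'
  | a == a'   = b
  | otherwise = finfun_apply f a'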
For efficiency reasons, we do not use finfun-update as a constructor for the Finfun
datatype, as overwritten updates then would not get removed and the function’s
representation would keep growing. Instead, the HOL constant finfun-update-code,
denoted (| :=f |), is employed, which is semantically equivalent: f̂ (|a :=f b|) ≡
f̂ (a :=f b). The code for finfun-update, however, is generated from (11) and (12):

(K f b)(a :=f b’) = if b = b’ then K f b else (K f b)(|a :=f b’|) (11)
f̂ (|a :=f b|)(a’ :=f b’) = if a = a’ then f̂ (a :=f b’) else f̂ (a’ :=f b’)(|a :=f b|) (12)

finfun_update :: forall a b. (Eq a, Eq b) =>
    Finfun a b -> a -> b -> Finfun a b;
finfun_update (Finfun_update_code f a b) a' b' =
  (if eqop a a' then finfun_update f a b'
   else Finfun_update_code (finfun_update f a' b') a b);
finfun_update (Finfun_const b) a b' =
  (if eqop b b' then Finfun_const b
   else Finfun_update_code (Finfun_const b) a b');
where eqop is the HOL equality operator given by eqop a = (\ b -> a == b);.
Hence, an update with ( :=f ) is checked against all other updates, all overwritten
updates are thereby removed, and the new update is inserted only if it does not map to the

default value. Using ( :=f ) in the logic ensures that on the code level, every
FinFun is stored with as few updates as possible given the fixed default value.2
Let, e.g., f̂ = (K f 0)(|1 :=f 5|)(|2 :=f 6|). When f̂ is updated at 1 to 0, f̂ (1 :=f 0)
evaluates on the code level to (K f 0)(|2 :=f 6|), where all redundant updates at 1
have been removed. If the explicit code update function had been used instead,
the last update would have been added to the list of updates: f̂ (|1 :=f 0|) evaluates
to (K f 0)(|1 :=f 5|)(|2 :=f 6|)(|1 :=f 0|). Exactly this problem of superfluous updates
would occur if ( :=f ) was directly used as a constructor in the exported code.
In case this optimisation is undesired, one can use finfun-update-code instead
of finfun-update. Redundant updates in the representation on the code level can
subsequently be deleted by invoking the finfun-clearjunk operator: Semantically,
this is the identity function: finfun-clearjunk ≡ id, but it is implemented using
the following two equations that remove all redundant updates:
finfun-clearjunk (K f b) = (K f b) and finfun-clearjunk f̂ (|a :=f b|) = f̂ (a :=f b)

Consequently, every function that is defined recursively on FinFuns must provide
two such equations for K f and (| :=f |) to be executable. For function
application, e.g., those from (4) are used with finfun-update being replaced by
finfun-update-code.
For quickcheck , we have installed a sampling function that randomly creates
a FinFun which has been updated at a few random points to random values.
Hence, quickcheck can now both evaluate operators involving FinFuns and sam-
ple random values for the free variables of FinFun type in a conjecture.

3 Operators for FinFuns


In the previous section, we have shown how FinFuns are defined in Isabelle/HOL
and how they are implemented in code. This section introduces more executable
operators on FinFuns moving from basic ones towards executable equality.

3.1 Function Composition


The most important operation on functions and FinFuns alike – apart from
application – is composition. It creates new FinFuns from old ones without
losing executability: Every ordinary function g:: b ⇒ c can be composed with a
FinFun f̂ of type a ⇒f b to produce another FinFun g ◦f f̂ of type a ⇒f c.
The operator ◦f is defined like the kernel functions via Abs-finfun and Rep-finfun:
g ◦f f̂ ≡ Abs-finfun (g ◦ Rep-finfun f̂ )

To the code generator, two recursive equations are provided:


g ◦f (K f c) = (K f (g c)) and g ◦f f̂ (|a :=f b|) = (g ◦f f̂ )(|a :=f g b|) (13)
2 Minimal is relative to the default value in the representation (which need not coincide
with finfun-default) – i.e. this does not include the case where changing this default
value would require fewer updates. (K f 0)(True :=f 1)(False :=f 1) of type bool ⇒f
nat, e.g., is stored as (K f 0)(|False :=f 1|)(|True :=f 1|), whereas K f 1 would also do.

◦f is more versatile than composition on FinFuns only, because ordinary func-


tions can be written directly thanks to λ abstraction. Yet, a FinFun ĝ is equally
easily composed with another FinFun f̂ if we convert the first one back to ordi-
nary functions: ĝ f ◦f f̂. However, composing a FinFun with an ordinary function
is not as simple. Although the definition is again straightforward:
f̂ f◦ g ≡ Abs-finfun (Rep-finfun f̂ ◦ g),
reasoning about f ◦ is more difficult: Take, e.g., f̂ = (K f 2)(1 :=f 1) and g = (λx.
x mod 2). Then, f̂ f ◦ g ought to be the function that maps even numbers to 2
and odd ones to 1, which is not a FinFun any more. Hence, (3) can no longer be
used to reason about f̂ f ◦ g, so nothing nontrivial can be deduced about f̂ f ◦ g.
If g is injective (written inj g ), then f̂ f ◦ g behaves as expected on updates:
f̂ (b :=f c) f◦ g = (if b ∈ range g then (f̂ f◦ g)(g⁻¹ b :=f c) else f̂ f◦ g), (14)
where range g denotes the range of g and g⁻¹ is the inverse of g. Clearly, both
b ∈ range g and g⁻¹ b are not executable for arbitrary g, so this conditional
equality is not suited for code generation. If terms involving f ◦ are to be executed,
the above equation must be specialised to a specific g to become executable. The
constant case is trivial for all g and need not be specialised: (K f c) f ◦ g = (K f c).
This composition operator is good for reindexing the domain of a FinFun:
Suppose, e.g., we need ĥf x = f̂ f (x + a) for some a::int, then ĥ could be defined
as ĥ ≡ f̂ f◦ g with g = (λx. x + a). Clearly, inj g, range g = UNIV and g⁻¹ = (λx.
x − a), so (14) simplifies to f̂ (b :=f c) f◦ g = (f̂ f◦ g)(b − a :=f c). Unfortunately,
the code generator cannot deal with such specialised recursion equations where
the second parameter of f ◦ is instantiated to g, so a new constant shift f̂ a ≡
f̂ f ◦ (λx. x + a) must be introduced for the code generator with the recursion
equations shift (K f b) a = (K f b) and shift f̂ (|a’ :=f b|) a = (shift f̂ a)(a’ − a :=f b).

3.2 FinFuns and Pairs


Apart from composing FinFuns one after another, one often has to “run” FinFuns
in parallel, i.e. evaluate both on the same argument and return both results as
a pair. For two functions f and g, this is done by the term λx. (f x, g x). For two
FinFuns f̂ and ĝ, λ abstraction is not executable, but an appropriate operator
(f̂ , ĝ)f is easily defined as
(f̂ , ĝ)f ≡ Abs-finfun (λx. (Rep-finfun f̂ x, Rep-finfun ĝ x)).

This operator is most useful when two FinFuns are to be combined pointwise
by some combinator h, which is then ◦f -composed with this diagonal operator:
Suppose, e.g., that f̂ and ĝ are two integer FinFuns and we need their pointwise
sum, which is (λ(x, y). x + y) ◦f (f̂ , ĝ)f , i.e. h is uncurried addition. The code
equations are straightforward again:
(K f b, K f c)f = K f (b, c) (15)
(K f b, ĝ(|a :=f c|))f = (K f b, ĝ)f (|a :=f (b, c)|) (16)
(f̂ (|a :=f b|), ĝ)f = (f̂ , ĝ)f (a :=f (b, ĝ f a)) (17)

3.3 Executable Quantifiers


Quantifiers in Isabelle/HOL are defined as higher-order functions. The universal
quantifier All is defined by All P ≡ P = (λx. True) where P is a predicate and the
binder notation ∀ x. P x is then just syntactic sugar for All (λx. P x). This also
explains the error message of the code generator from Sec. 1. However, with-
out λ-abstraction, there is no such nice notation for FinFuns, but the operator
finfun-All for universal quantification over FinFun predicates is straightforward:
finfun-All P̂ ≡ ∀ x. P̂ f x .
Clearly, reducing universal quantification over FinFuns to All does not help
with code generation, which was the main point in introducing FinFuns in the
first place. However, we can exploit the explicit representation of P̂. To that end,
a more general operator ff-All of type a list ⇒ a ⇒f bool ⇒ bool is necessary
which ignores all points of P̂ that are listed in the first argument:
ff-All as P̂ ≡ ∀ a. a ∈ set as ∨ P̂ f a

Clearly, finfun-All = ff-All [] holds. The extra list as keeps track of which points
have already been updated and can be ignored in recursive calls:
ff-All as (K f b) ←→ b ∨ set as = UNIV (18)
ff-All as P̂(|a :=f b|) ←→ (a ∈ set as ∨ b) ∧ ff-All (a·as) P̂ (19)
In the recursive case, the update a to b must either be overwritten by a previous
update (a ∈ set as ) or have b equal to True. Then, for the recursive call, a is
added to the list as of visited points. In the constant case, either the constant is
True itself or all points of the domain a have been updated (set as = UNIV ).
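Eqs. (18) and (19) translate almost literally to the code level; a Haskell sketch over the datatype of Sec. 2.2, where is_list_UNIV (a hypothetical rendering of the operator is-list-UNIV introduced below) decides set as = UNIV:

ff_All :: (Card_UNIV a, Eq a) => [a] -> Finfun a Bool -> Bool
ff_All as (Finfun_const b)           = b || is_list_UNIV as
ff_All as (Finfun_update_code p a b) = (elem a as || b) && ff_All (a : as) p

finfun_All :: (Card_UNIV a, Eq a) => Finfun a Bool -> Bool
finfun_All = ff_All []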
Via finfun-All = ff-All [], finfun-All is now executable, provided the test
set as = UNIV can be operationalised. Since as:: a list is a (finite) list, set as
is by construction always finite. Thus, for infinite domains a, this test always
fails. Otherwise, if a is finite, such a test can be easily implemented.
Note that this distinction can be directly made on the basis of type informa-
tion. Hence, we shift this subtle distinction into a type class such that the code
automatically picks the right implementation for set as = UNIV based on type
information. Axiomatic type classes [7] allow for HOL constants being safely
overloaded for different types and are correctly handled by Haftmann’s code
generator [6]. If the output language supports type classes like e.g. Haskell does,
this feature is directly employed. Otherwise, functions in generated code are
provided with an additional dictionary parameter that selects the appropriate
implementation for overloaded constants at runtime.
For our purpose, we introduce a new type class card-UNIV with one parameter
card-UNIV and the axiom that card-UNIV :: a itself ⇒ nat returns the cardinality
of a’s universe:
card-UNIV x = card UNIV (20)
By default, the cardinality of a type’s universe is just a natural number of type
nat, which itself is not related to a at all. Hence, card-UNIV takes an artificial
parameter of type a itself, where itself represents types at the level of values:
TYPE( a) is the value associated with the type a.

As every HOL type is inhabited, card-UNIV TYPE( a) can indeed be used to
discriminate between types with finite and infinite universes by testing against 0:
finite (UNIV :: a set) ←→ 0 < card-UNIV TYPE( a)
Moreover, the test set as = UNIV can now be written as is-list-UNIV as with
is-list-UNIV as ≡
let c = card-UNIV TYPE ( a) in if c = 0 then False else |remdups as| = c
where remdups as removes all duplicates from the list as.
Note that the constraint (20) on the type class parameter card-UNIV, which
is to be overloaded, is purely definitional. Thus, every type could be made mem-
ber of the type class card-UNIV by instantiating card-UNIV to λa. card UNIV.
However, for executability, it must be instantiated such that the code generator
can generate code for it. This has been done for the standard HOL types like
unit, bool, char, nat, int, and a list, for which it is straightforward if one remem-
bers that card A = 0 for all infinite sets A. For the type bool, e.g., card-UNIV
a ≡ 2 for all a::bool itself . The cardinality of the universe for polymorphic type
constructors like e.g. a × b is computed by recursion on the type parameters:
card-UNIV TYPE( a × b) = card-UNIV TYPE( a) · card-UNIV TYPE( b)
We have similarly instantiated card-UNIV for the type constructors a ⇒ b,

a option and a + b.
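As a sketch of what these instantiations amount to on the Haskell side (hypothetical rendering; we encode an infinite universe by cardinality 0 and pass a Proxy in place of the itself argument):

{-# LANGUAGE ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))

class Card_UNIV a where
  card_UNIV :: Proxy a -> Integer  -- 0 encodes an infinite universe

instance Card_UNIV Bool where
  card_UNIV _ = 2

-- the universe of a product is the product of the universes; 0 propagates
-- correctly because every HOL type is inhabited
instance (Card_UNIV a, Card_UNIV b) => Card_UNIV (a, b) where
  card_UNIV _ = card_UNIV (Proxy :: Proxy a) * card_UNIV (Proxy :: Proxy b)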
As we have the universal quantifier finfun-All, the executable existential
quantifier is straightforward by duality: finfun-Ex P̂ ≡ ¬ finfun-All (Not ◦f P̂). As
before, the pretty-print syntax ∃ x. P x for Ex (λx. P x) in HOL cannot be trans-
ferred to FinFuns because λ-abstraction is not suited for code generation.

3.4 Executable Equality on FinFuns


Our second main goal with FinFuns, besides executable quantifiers, is executable
equality tests on FinFuns. Extensionality – cf. (5) – reduces function equality to
equality on every argument. However, (5) does not directly yield an implemen-
tation because it uses the universal quantifier All for ordinary HOL predicates,
but some rewriting does the trick:
f̂ = ĝ ←→ finfun-All ((λ(x, y). x = y) ◦f (f̂ , ĝ)f ) (21)
By instantiating the HOL type class eq appropriately, the equality operator =
becomes executable and in the generated code, an appropriate equality relation
on the datatype is generated. In Haskell, e.g., the equality operator == on the
type Finfun a b then really denotes equality like on the logic level:
eq_finfun :: forall a b. (FinFun.Card_UNIV a, Eq a, Eq b) =>
    FinFun.Finfun a b -> FinFun.Finfun a b -> Bool;
eq_finfun f g = FinFun.finfun_All
  (FinFun.finfun_comp (\ (a @ (aa, b)) -> aa == b)
    (FinFun.finfun_Diag f g));
instance (FinFun.Card_UNIV a, Eq a, Eq b) =>
  Eq (FinFun.Finfun a b) where { (==) = FinFun.eq_finfun; };

3.5 Complexity
In this section, we briefly discuss the complexity of the above operators. We
assume that equality tests require constant time. For a FinFun f̂, let #f̂ denote
the number of updates in its code representation. For an ordinary function g, let
#g denote the complexity of evaluating g a for any a.
K f has constant complexity as it is a finfun constructor. Since ( :=f )
automatically removes redundant updates (11, 12), f̂ ( :=f ) is linear in #f̂, and
so is application f̂ f (4). For g ◦f f̂, eq. (13) is recursive in f̂ and each recursion
step involves ( :=f ) and evaluating g, so the complexity is O((#f̂)² + #f̂ · #g).
For the product (f̂ , ĝ)f , we get: The base case (K f b, ĝ)f (15, 16) is linear in
#ĝ and we have #(K f b, ĝ)f = #ĝ. An update in the first parameter (f̂ (|a :=f b|),
ĝ)f (17) executes ĝ f a (O(#ĝ)), the recursive call and the update (O(#(f̂ , ĝ)f )).
Since there are #f̂ recursive calls and #(f̂ , ĝ)f ≤ #f̂ + #ĝ, the total complexity
is bounded by O(#f̂ · (#f̂ + #ĝ)).
Since finfun-All is directly implemented in terms of ff-All, it is sufficient to
analyse the latter’s complexity: The base case (18) essentially executes is-list-UNIV.
If we assume that the cardinality of the type universe is computed in constant
time, is-list-UNIV as is bounded by O(|as|²) since remdups as takes O(|as|²) steps. In
case of an update (19), the updated point is checked against the list as (O(|as|))
and the recursive call is executed with the list as being one element longer, i.e.
|as| grows by one for each recursive call. As there are #P̂ many recursive calls,
ff-All as P̂ has complexity #P̂ · O(#P̂ + |as|) + O((#P̂ + |as|)²) = O((#P̂ + |as|)²).
Hence, finfun-All P̂ has complexity O((#P̂)²).
Equality on FinFuns f̂ and ĝ is then straightforward (21): (f̂ , ĝ)f is in
O(#f̂ · (#f̂ + #ĝ)). Composing this with λ(x, y). x = y takes O((#(f̂ , ĝ)f )²)
⊆ O((#f̂ + #ĝ)²). Finally, executing finfun-All is quadratic in #((λ(x, y). x = y)
◦f (f̂ , ĝ)f ) ≤ #(f̂ , ĝ)f . In total, f̂ = ĝ has complexity O((#f̂ + #ĝ)²).

4 A Recursion Combinator
In the previous section, we have presented several operators on FinFuns that
suffice for most purposes, cf. Sec. 5. However, we had to define function com-
position with FinFuns on either side and operations on products manually by
going back to the type’s carrier set finfun via Rep-finfun and Abs-finfun. This is
not only inconvenient, but also loses the abstraction from the details of the finite
set of updated points that FinFuns provide. In particular, one has to derive extra
recursion equations for the code generator and prove each of them correct.
Yet, the induction rule (9) states that the recursive equations uniquely deter-
mine any function that satisfies these. Operations on FinFuns could therefore
be defined by primitive recursion similarly to datatypes (cf. [2]). Alas, the two
FinFun constructors are not free, so not every pair of recursive equations does
indeed define a function. It might also well be the case that the equations are
contradictory: For example, suppose we want to define a function count that
counts the number of updates, i.e. count (K f c) = 0 and count f̂ (|a :=f b|) = count
f̂ + 1. Such a function does not exist for FinFuns in Isabelle, although it could

be defined in Haskell to, e.g., compute extra-logic data such as memory con-
sumption. Take, e.g., f̂ ≡ (K f 0)(|0 :=f 0|). Then, count f̂ = count (K f 0) + 1 = 1,
but f̂ = (K f 0) by (6) and thus count f̂ = 0 would equally have to hold, because
equality is congruent w.r.t. function application, a contradiction.

4.1 Lifting Recursion from Finite Sets to FinFuns


More abstractly, the right hand side of the recursive equations can be considered
as a function: For the constant case, such a function c:: b ⇒ c takes the constant
value of the FinFun and evaluates to the right hand side. In the recursive case,
u:: a ⇒ b ⇒ c ⇒ c takes the point of the update, the new value at that point and
the result of the recursive call. In this section, we define a combinator finfun-rec
that takes c and u and defines the corresponding operator on FinFuns, simi-
larly to the primitive recursion combinators that are automatically generated
for datatypes. That is, finfun-rec must satisfy (22) and (23), subject to certain
well-formedness conditions on c and u, which will be examined in Sec. 4.2.
finfun-rec c u (K f b) = c b (22)
finfun-rec c u f̂ (a :=f b) = u a b (finfun-rec c u f̂ ) (23)
The standard means in Isabelle for defining recursive functions, namely recdef
and the function package [10], are not suited for this task because both need a
termination proof, i.e. a well-founded relation in which all recursive calls always
decrease. Since K f and ( :=f ) are not free constructors, there is no such
termination order for (22) and (23). Hence, we define finfun-rec by recursion on
the finite set of updated points using the recursion operator fold for finite sets:
finfun-rec c u f̂ ≡
let b = finfun-default f̂ ;
    g = (ιg. f̂ = Abs-finfun (map-default b g) ∧ finite (dom g) ∧ b ∉ ran g)
in fold (λa. u a (map-default b g a)) (c b) (dom g)
In the let expression, f̂ is unpacked into its default value b (cf. Sec. 2) and a
partial function g :: a ⇀ b such that f̂ = Abs-finfun (map-default b g) and the
finite domain of g contains only points at which f̂ differs from its default value
b, i.e. g stores precisely the updates of f̂. Then, the update function u is folded
over the finite set of points dom g where f̂ does not take its default value b.
All FinFun operators that we have defined in Sec. 3 via Abs-finfun and
Rep-finfun can also be defined directly via finfun-rec. For example, the functions
for ◦f directly show up in the recursive equations from (13):
g ◦f f̂ ≡ finfun-rec (λb. K f (g b)) (λa b f̂ . f̂ (a :=f g b)) f̂.

4.2 Well-Formedness Conditions


Since all functions in HOL are total, finfun-rec c u is defined for every combination
of c and u. Any nontrivial property of finfun-rec is only provable if u is left-
commutative because fold is unspecified for other functions. Thus, the next step
is to establish conditions on the FinFun level that ensure (22) and (23). It turns
out that four are sufficient:

u a b (c b) = c b (24)
u a b’’ (u a b’ (c b)) = u a b’’ (c b) (25)
a ≠ a’ −→ u a b (u a’ b’ d) = u a’ b’ (u a b d) (26)
finite UNIV −→ fold (λa. u a b’) (c b) UNIV = c b’ (27)
Eq. (24), (25), and (26) naturally reflect the equalities between the constructors
from (6), (7), and (8), respectively. It is sufficient to restrict overwriting updates
(25) to constant FinFuns because the general case directly follows from this by
induction and (26). The last equation (27) arises from the identity
finite UNIV −→ fold (λa f̂ . f̂ (a :=f b’)) (K f b) UNIV = (K f b’). (28)
Eq. (24), (25), and (26) are sufficient for proving (23). For a FinFun operator
like ◦f , these constraints must be shown for specific c and u, which is usually
completely automatic. Even though (27), which is required to deduce (22), must
usually be proven by induction, this normally is also automatic, because for finite
types a, a ⇒ b and a ⇒f b are isomorphic via Abs-finfun and Rep-finfun.

5 Applications
In this section, we present two applications for FinFuns to demonstrate that the
operations from Sec. 3 form a reasonably complete set of abstract operations.
1. They can be used to represent sets as predicates with the standard opera-
tions all being executable: membership and subset test, union, intersection,
complement and bounded quantification.
2. FinFuns have been inspired by the needs of JinjaThreads [12], which is a
formal semantics of multithreaded Java in Isabelle. We show how FinFuns
prove essential on the way to generating an interpreter for concurrent Java.

5.1 Representing Sets with FinFuns


In Isabelle 2008, the proper type a set for sets has been removed in favour of
predicates of type a ⇒ bool to eliminate redundancies in the implementation
and in the library. As a consequence, Isabelle’s new code generator is no longer
able to generate code for sets as before: A finite set had been coded as the list
of its elements. Hence, e.g. the complement operator has not been executable
because the complement of a finite set might no longer be a finite set. Neither
are collections of the form {a | P a} suited for code generation.
Since FinFuns are designed for code generation, they can be used for repre-
senting sets in explicit form without explicitly introducing a set type of its own.
FinFun set operations like membership and inclusion test, union, intersection
and even complement are straightforward using ◦f . As before, these operators
are decorated with f subscripts to distinguish them from their analogues on sets:
f̂ ⊆f ĝ ≡ finfun-All ((λ(x, y). x −→ y) ◦f (f̂ , ĝ)f )
− f̂ ≡ (λb. ¬ b) ◦f f̂
f̂ ∪f ĝ ≡ (λ(x, y). x ∨ y) ◦f (f̂ , ĝ)f
f̂ ∩f ĝ ≡ (λ(x, y). x ∧ y) ◦f (f̂ , ĝ)f
Obviously, these equations can be directly translated into executable code.
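For example, union corresponds to the following Haskell function in terms of the combinators finfun_comp and finfun_Diag that appear in the generated code of Sec. 3.4 (a sketch; class constraints approximated):

finfun_union :: Eq a => Finfun a Bool -> Finfun a Bool -> Finfun a Bool
finfun_union f g = finfun_comp (\(x, y) -> x || y) (finfun_Diag f g)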

However, if we were to reason with them directly, most theorems about sets
(as predicates) would have to be replicated for FinFuns. Although this would be
straightforward, loads of redundancy would be reintroduced this way. Instead,
we propose to inject FinFun sets via f into ordinary sets and use the standard
operations on sets to work with them. The code generator is set up such that
it preprocesses all equations for code generation and automatically replaces set
operations with their FinFun equivalents by unfolding equations such as Âf ⊆ B̂f
←→ Â ⊆f B̂ and Âf ∪ B̂f = (Â ∪f B̂)f . This approach works for quickcheck , too.
Besides the above operations, bounded quantification is also straightforward:
finfun-Ball Â P ≡ ∀ x∈Âf . P x and finfun-Bex Â P ≡ ∃ x∈Âf . P x
Clearly, they are not executable right away. Take, e.g., Â = (K f True), i.e. the
universal set, then finfun-Ball Â P ←→ (∀ x. P x), which is undecidable if x ranges
over an infinite domain. However, if we go for partial correctness, correct code can
be generated: Like for the universal quantifier finfun-All for FinFun predicates
(cf. Sec. 3.3), ff-Ball is introduced which takes an additional parameter xs to
remember the list of points which have already been checked at previous calls.
ff-Ball xs  P ≡ ∀ a∈Âf . a ∈ set xs ∨ P a.

This now permits setting up recursive equations for the code generator:
ff-Ball xs (K f b) P ←→ ¬ b ∨ set xs = UNIV ∨ loop (λu. ff-Ball xs (K f b) P)
ff-Ball xs Â(|a :=f b|) P ←→ (a ∈ set xs ∨ (b −→ P a)) ∧ ff-Ball (a·xs) Â P

In the constant case, if b is false, i.e. the set is empty, ff-Ball holds; similarly, it
holds if all elements of the universe have been checked already, where the test
set xs = UNIV is implemented by the overloaded term is-list-UNIV xs (Sec. 3.3).
Otherwise, one would have to check whether P holds at all points except xs,
which is not computable for arbitrary P and a. Thus, instead of evaluating its
argument, the code for loop never terminates. In Isabelle, however, loop is simply
the unit-lifted identity function: loop f ≡ f (). Of course, an exception could
equally be raised in place of non-termination. The bounded existential quantifier
is implemented analogously.
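To illustrate (our own unfolding trace, for element type nat), consider the
cofinite set UNIV − {2}, represented as (K f True)(|2 :=f False|):
ff-Ball [] ((K f True)(|2 :=f False|)) P
←→ (2 ∈ set [] ∨ (False −→ P 2)) ∧ ff-Ball [2] (K f True) P
←→ ff-Ball [2] (K f True) P
←→ ¬ True ∨ set [2] = UNIV ∨ loop (λu. ff-Ball [2] (K f True) P)
Since set [2] ≠ UNIV for nat, evaluation reaches the loop branch: quantifying
over this infinite set is genuinely undecidable, and the generated code diverges,
exactly as the partial-correctness reading permits.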

5.2 JinjaThreads
Jinja [9] is an executable formal semantics for a large subset of Java source-
code and bytecode in Isabelle/HOL. JinjaThreads [11] extends Jinja with Java’s
thread features on both levels. It contains a framework semantics which inter-
leaves the individual threads whose small-step semantics is given to it as a pa-
rameter. This framework semantics takes care of all management issues related
to threads: The thread pool itself, the lock state, monitor wait sets, spawning
and joining a thread, etc. Individual threads communicate via the shared mem-
ory with each other and via thread actions like Lock, Unlock, Join, etc. with the
framework semantics. At every step, the thread specifies which locks to acquire
or release how many times, which thread to create or join on. In our previous
work [12], this communication was modelled as a list of such actions, and a lot

of pointless work went into identifying permutations of such lists which are se-
mantically equivalent. Therefore, this has been changed such that every lock of
type l now has its own list. Since only finitely many locks need to be changed
in any single step, these lists are stored in a FinFun such that checking whether
a step’s actions are feasible in a given state is executable.
Moreover, in developing JinjaThreads, we have found that most lemmas about
the framework semantics contain non-executable assumptions about the thread
pool or the lock state, in particular universal quantifiers or predicates defined in
terms of them. Therefore, we replaced the ordinary functions that model the lock
state (type l ⇒ t lock) and the thread pool (type t ⇀ ( x, l) thread) with
FinFuns. Rewriting the existing proofs took very little effort: mostly, only f s in
subscript or superscript had to be added to the proof texts, because Isabelle's
simplifier and classical reasoner are set up such that FinFuns indeed behave like
ordinary functions.
So as not to break the proofs, we did not remove the universal quantifiers in the
definitions of the predicates themselves, but provided simple lemmas to the code
generator. For example, locks-ok ls t las checks whether all lock requests las of
thread t can be met in the lock state ls and is defined as locks-ok ls t las ≡ ∀ l.
lock-ok (ls f l) t (las f l), whereas the equation for code generation is

locks-ok ls t las = finfun-All ((λ(l, la). lock-ok l t la) ◦f (ls, las)f ).

Unfortunately, JinjaThreads is not yet fully executable, because the semantics
of a single thread relies on inductive predicates. Once the code generator can
handle these, we will have a certified Jinja virtual machine with concurrency to
execute multithreaded Jinja programs, as has been done for sequential ones [9].

6 Related Work and Conclusion


Related work. Representing (partial) functions explicitly by a list of point-value
pairs is common knowledge in computer science; partial functions a ⇀ b with
finite domain have even been formalised as associative lists in the Isabelle/HOL
library. However, it is cumbersome to reason with them because one single function
has multiple representations, i.e. associative lists are not extensional. Coq
and HOL4, e.g., also come with formalisations of finite maps of their own, and
both of them fix their default value to None. Collins and Syme [4] have already
provided a theory of partial functions with finite domain in terms of the
everywhere undefined function and pointwise update. Similar to (4), (7), and (8),
they axiomatize a type ( a, b) fmap in terms of abstract operations Empty,
Update, Apply :: ( a, b) fmap ⇒ a ⇒ b, and Domain, and present two models: maps
a ⇀ b with finite domain, and associative lists where the order of the elements
is determined with Hilbert's choice operator; but neither of these supports code
generation. Moreover, their equality is not extensional like ours (5), but guarded by
the domains. Since these partial functions have an unspecified default value that
is implicitly fixed by the codomain type and the model, they cannot be used for
almost-everywhere constant functions, where the default value may differ from
function to function. Consequently, (28) is not expressible in their setting.

Recursion over non-free kernel functions is also a well-known concept: Nipkow
and Paulson [14], e.g., define a fold operator for finite sets which are built from
the empty set and insertion of one element. However, they do not introduce a
new type for finite sets, so all equations are guarded by the predicate finite, i.e.
they cannot be leveraged by the code generator.
Nominal Isabelle [16] is used to facilitate reasoning about α-equivalent terms
with binders, where the binders are non-free term constructors. The HOL type
for terms is obtained by quotienting the datatype with the (free) term construc-
tors w.r.t. α-equivalence classes. Primitive-recursive definitions must then be
shown compatible with α-equivalence using a notion of freshness [17]. It is tempt-
ing to define the FinFun type universe similarly as the quotient of the datatype
with constructors K f and (| :=f |) w.r.t. the identities (6), (7), (8), and (28),
because this would settle exhaustion, induction and recursion almost automati-
cally. However, this construction is not directly possible because (28) cannot be
expressed as an equality of kernel functions. Instead, we have defined the carrier
set finfun directly in terms of the function space and established easy, sufficient
(and almost necessary) conditions for recursive definitions being well-formed.

Conclusion. FinFuns generalise finite maps by continuing them with a default
value in the logic; for the code generator, however, they are implemented like
associative lists, which suffer from multiple representations of a single function.
Thus, FinFuns bridge the gap between easy reasoning and the implementation
issues arising from functions as data: they are as easy to use as ordinary functions.
By not fixing a default value (like None for maps), we have been able to easily
apply them to very diverse settings.
We have decided to restrict the FinFun carrier set finfun to functions that
are constant almost everywhere. Although everything from Sec. 3 would equally
work if that restriction was lifted, the induction rule (9) and recursion operator
(Sec. 4) would then no longer be available, i.e. the datatype generated by the
code generator would not exhaust the type in the logic. Thus, the user could
not be sure that every FinFun from his formalisation can be represented as data
in the generated code. Conversely, not every operator can be lifted to FinFuns:
The image operator ‘ on sets, e.g., has no analogue on FinFun sets.
Clearly, FinFuns are a very restricted set of functions, but we have demonstrated
that this lightweight formalisation is in fact useful and easy to use. In
Sec. 3, we have outlined the way to executing equality on FinFuns, but we
need not stop there: Other operators, like e.g. currying, λ-abstraction for FinFuns
a ⇒f b with a finite, and even the definite description operator ιx. P̂ f x,
can all be made executable via the code generator. In terms of usability, FinFuns
currently provide little support for defining new operators that cannot be
expressed by the existing ones: For example, recursive equations for the code
generator must be stated explicitly, even if the definition explicitly uses the
recursion combinator. But with some implementation effort, definitions and the
code generator setup could be automated in the future.
For quickcheck, our implementation with at most quadratic complexity is
sufficiently efficient because random FinFuns involve only a few updates. For
larger applications, however, one is interested in more efficient representations. If,
e.g., the domain of a FinFun is totally ordered, binary search trees are a natural
option, but this requires a considerable amount of work: (Balanced) binary trees
must be formalised and proven correct, which could be based e.g. on [15], and all
the operators that are recursive on a FinFun must be reimplemented. In practice,
the user should not care about which implementation the code generator chooses,
but such automation must overcome some technical restrictions, such as only one
type variable for type classes or only unconditional rewrite rules for the code
generator, perhaps by resorting to ad-hoc translations.

References
1. Berghofer, S., Nipkow, T.: Random testing in Isabelle/HOL. In: Proc. SEFM 2004,
pp. 230–239. IEEE Computer Society, Los Alamitos (2004)
2. Berghofer, S., Wenzel, M.: Inductive datatypes in HOL – lessons learned in formal-
logic engineering. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L.
(eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 19–36. Springer, Heidelberg (1999)
3. Berghofer, S., Nipkow, T.: Executing higher order logic. In: Callaghan, P., Luo, Z.,
McKinna, J., Pollack, R. (eds.) TYPES 2000. LNCS, vol. 2277, pp. 24–40. Springer,
Heidelberg (2002)
4. Collins, G., Syme, D.: A theory of finite maps. In: Schubert, E.T., Alves-Foss, J.,
Windley, P. (eds.) HUG 1995. LNCS, vol. 971, pp. 122–137. Springer, Heidelberg
(1995)
5. Dybjer, P., Haiyan, Q., Takeyama, M.: Combining testing and proving in dependent
type theory. In: Basin, D., Wolff, B. (eds.) TPHOLs 2003. LNCS, vol. 2758, pp.
188–203. Springer, Heidelberg (2003)
6. Haftmann, F., Nipkow, T.: A code generator framework for Isabelle/HOL. Techni-
cal Report 364/07, Dept. of Computer Science, University of Kaiserslautern (2007)
7. Haftmann, F., Wenzel, M.: Constructive type classes in Isabelle. In: Altenkirch,
T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 160–174. Springer,
Heidelberg (2007)
8. Harrison, J.: Metatheory and reflection in theorem proving: A survey and critique.
Technical Report CRC-053, SRI International Cambridge Computer Science Re-
search Centre (1995)
9. Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual
machine and compiler. ACM TOPLAS 28, 619–695 (2006)
10. Krauss, A.: Partial recursive functions in higher-order logic. In: Furbach, U.,
Shankar, N. (eds.) IJCAR 2006. LNCS, vol. 4130, pp. 589–603. Springer, Hei-
delberg (2006)
11. Lochbihler, A.: Jinja with threads. The Archive of Formal Proofs. Formal proof
development (2007), http://afp.sf.net/entries/JinjaThreads.shtml
12. Lochbihler, A.: Type safe nondeterminism - a formal semantics of Java threads. In:
FOOL 2008 (2008)
13. Lochbihler, A.: Code generation for functions as data. The Archive of Formal
Proofs. Formal proof development (2009),
http://afp.sf.net/entries/FinFun.shtml
14. Nipkow, T., Paulson, L.C.: Proof pearl: Defining functions over finite sets. In: Hurd,
J., Melham, T. (eds.) TPHOLs 2005. LNCS, vol. 3603, pp. 385–396. Springer,
Heidelberg (2005)

15. Nipkow, T., Pusch, C.: AVL trees. The Archive of Formal Proofs. Formal proof
development (2004), http://afp.sf.net/entries/AVL-Trees.shtml
16. Urban, C.: Nominal techniques in Isabelle/HOL. Journal of Automatic Reason-
ing 40(4), 327–356 (2008)
17. Urban, C., Berghofer, S.: A recursion combinator for nominal datatypes implemen-
ted in Isabelle/HOL. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS,
vol. 4130, pp. 498–512. Springer, Heidelberg (2006)

A Notation
Isabelle/HOL formulae and propositions are close to standard mathematical
notation. This subsection introduces non-standard notation, a few basic data
types and their primitive operations.
Types is the set of all types which contains, in particular, the type of truth
values bool, natural numbers nat, integers int, and the singleton type unit with its
only element (). The space of total functions is denoted by a ⇒ b. Type variables
are written a, b, etc. The notation t::τ means that the HOL term t has type τ .
Pairs come with two projection functions fst and snd. Tuples are identified
with pairs nested to the right: (a, b, c) is identical to (a, (b, c)) and a × b × c
to a × ( b × c). Dually, the disjoint union of a and b is written a + b.
Sets are represented as predicates (type a set is shorthand for a ⇒ bool ), but
follow the usual mathematical conventions. UNIV :: a set is the set of all elements
of type a. The image operator f ‘ A applies the function f to every element of
A, i.e. f ‘ A ≡ {y | ∃ x∈A. y = f x}. The predicate finite on sets characterises all
finite sets. card A denotes the cardinality of the finite set A, or 0 if A is infinite.
fold f z A folds a left-commutative³ function f :: a ⇒ b ⇒ b over a finite set A ::
a set with initial value z :: b.
Lists (type a list) come with the empty list [] and the infix constructor ·
for consing. Variable names ending in “s” usually stand for lists and |xs| is the
length of xs. The function set converts a list to the set of its elements.
Function update is defined as follows: Let f :: a ⇒ b, a:: a and b:: b. Then
f (a := b) ≡ λx. if x = a then b else f x.
The option data type a option adjoins a new element None to a type a. All
existing elements in type a are also in a option, but are prefixed by Some. For
succinctness, we write ⌊a⌋ for Some a. Hence, for example, bool option has the
values None, ⌊True⌋ and ⌊False⌋.
Partial functions are modelled as functions of type a ⇒ b option where None
represents undefined and f x = ⌊y⌋ means x is mapped to y. Instead of a ⇒ b
option, we write a ⇀ b and call such functions maps. f (x ↦ y) is shorthand for
f (x := ⌊y⌋). The domain of f (written dom f ) is the set of points at which f is
defined; ran f denotes the range of f. The function map-default b f takes a partial
function f and continues it at its undefined points with b.
The definite description ιx. Q x is known as Russell’s ι-operator. It denotes
the unique x such that Q x holds, provided exactly one exists.
³ f is left-commutative if it satisfies f x (f y z) = f y (f x z) for all x, y, and z.
Packaging Mathematical Structures

François Garillot1 , Georges Gonthier2 , Assia Mahboubi3 , and Laurence Rideau4


1
Microsoft Research - INRIA Joint Centre
[email protected]
2
Microsoft Research Cambridge
[email protected]
3
Inria Saclay and LIX, École Polytechnique
[email protected]
4
Inria Sophia-Antipolis – Méditerranée
[email protected]

Abstract. This paper proposes generic design patterns to define and


combine algebraic structures, using dependent records, coercions and
type inference, inside the Coq system. This alternative to telescopes in
particular supports multiple inheritance, maximal sharing of notations
and theories, and automated structure inference. Our methodology is
robust enough to handle a hierarchy comprising a broad variety of alge-
braic structures, from types with a choice operator to algebraically closed
fields. Interfaces for the structures enjoy the convenience of a classical
setting, without requiring any axiom. Finally, we present two applica-
tions of our proof techniques: a key lemma for characterising the discrete
logarithm, and a matrix decomposition problem.

Keywords: Formalization of Algebra, Coercive subtyping, Type infer-


ence, Coq, SSReflect.

1 Introduction
Large developments of formalized mathematics demand a careful organization.
Fortunately mathematical theories are quite organized, e.g., every algebra text-
book [1] describes a hierarchy of structures, from monoids and groups to rings
and fields. There is a substantial literature [2,3,4,5,6,7] devoted to their formal-
ization within formal proof systems.
In spite of this body of prior work, however, we have found it difficult to make
practical use of the algebraic hierarchy in our project to formalize the Feit-
Thompson Theorem in the Coq system; this paper describes some of the prob-
lems we have faced and how they were resolved. The proof of the Feit-Thompson
Theorem covers a broad range of mathematical theories, and organizing this for-
malization into modules is central to our research agenda. We’ve developed [8] an
extensive set of modules for the combinatorics and set and group theory required
for the “local analysis” part of the proof, which includes a rudimentary algebraic
hierarchy needed to support combinatorial summations [9].
Extending this hierarchy to accommodate the linear algebra, Galois theory
and representation theory needed for the “character theoretic” part of the proof
has proved problematic. Specifically, we have found that well-known encodings


of algebraic structures using dependent types and records [2] break down in the
face of complexity; we address this issue in section 2 of this paper.
Many of the cited works focused on the definition of the hierarchy rather than
its use, making simplifying assumptions that would have masked the problems
we encountered. For example some assume that only one or two structures are
involved at any time, or that all structures are explicitly specified. The examples
in section 4 show that such assumptions are impractical: they involve several
different structures, often within the same expression, and some of which need
to be synthesized for existing types.
We have come to realize that algebraic structures are not “modules” in the
software engineering sense, but rather “interfaces”. Indeed, the mathematical
theory of, say, an abstract ring, is fairly thin. However, abstract rings provide
an interface that allows “modules” with actual contents, such as polynomials
and matrices, to be defined and, crucially, composed. The main function of an
algebraic structure is to provide common notation for expressions and for proofs
(e.g., basic lemmas) to facilitate the composition and application of these generic
modules. Insisting that an interface be instantiated explicitly each time it is used
negates this function, so it is critical that structures be inferred on the fly; we’ll
see in the next section how this can be accomplished.
Similarly, we must ensure that our algebraic interfaces are consistent with
the other modules in our development: in particular they should integrate the
existing combinatoric interfaces [8], as algebra requires equality. As described in
section 3, we have therefore adapted classical algebra to our constructive com-
binatorics. In addition to philosophical motivations (viz., allowing constructive
proof of a finitary result like the Feit-Thompson Theorem), we have practical uses
for a constructive framework: it provides basic but quite useful proof automa-
tion, via the small-scale reflection methodology supported by the SSReflect
extension to Coq [10].
Due to space constraints, we will assume some familiarity with the Coq type
system [11] (dependent types and records, proof types, type inference with im-
plicit terms and higher-order resolution) in section 2, and with the basic design
choices in the Feit-Thompson Theorem development [8] (boolean reflection, con-
crete finite sets) in sections 3 and 4.

2 Encoding Structures
2.1 Mixins
An algebraic or combinatorial structure comprises representation types (usually
only one), constants and operations on the type(s), and axioms satisfied by the
operations. Within the propositions-as-types framework of Coq, the interface
for all of these components can be uniformly described by a collection of depen-
dent types: the type of operations depends on the representation type, and the
statement (also a “type”) of axioms depends on both the representation type and
the actual operations.
For example, a path in a combinatorial graph amounts to
– a representation type T for nodes
– an edge relation e : rel T

– an initial node x0 : T
– the sequence p : seq T of nodes that follow x0
– the axiom pP : path e x0 p asserting that e holds pairwise along x0 :: p.
The path “structure” is actually best left unbundled, with each component being
passed as a separate argument to definitions and theorems, as there is no one-to-
one relation between any of the components (there can be multiple paths with the
same starting point and relation, and conversely a given sequence can be a path
for different relations). Because it depends on all the other components, only the
axiom pP needs to be passed around explicitly; type inference can figure out T ,
e, x0 and p from the type of pP , so that in practice the entire path “structure”
can be assimilated to pP .
While this unbundling allows for maximal flexibility, it also induces a prolifer-
ation of arguments that is rapidly overwhelming. A typical algebraic structure,
such as a ring, involves half a dozen constants and even more axioms. More-
over such structures are often nested, e.g., for the Cayley-Hamilton theorem one
needs to consider the ring of polynomials over the ring of matrices over a gen-
eral commutative ring. The size of the terms involved grows as Cⁿ, where C is
the number of separate components of a structure, and n is the structure nest-
ing depth. For Cayley-Hamilton we would have C = 15 and n = 3, and thus
terms large enough to make theorem proving impractical, given that algorithms
in user-level tactics are more often than not nonlinear.
Thus, at the very least, related operations and axioms should be packed using
Coq’s dependent records (Σ-types); we call such records mixins. Here is, for
example, the mixin for a Z-module, i.e., the additive group of a vector space or
a ring:
Module Zmodule.
Record mixin_of (M : Type) : Type := Mixin {
zero : M; opp : M -> M; add : M -> M -> M;
_ : associative add; _ : commutative add;
_ : left_id zero add; _ : left_inverse zero opp add
}. ...
End Zmodule.

Here we are using a Coq Module solely to avoid name clashes with similar mixin
definitions.
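To make this concrete, here is a hedged sketch of how such a mixin could be
instantiated for a type int of signed integers; the operation and lemma names
(addz, oppz, addzA, ...) are placeholders of ours, not necessarily those of the
actual development:

(* Assumed lemmas about the integer operations:
   addzA : associative addz          addzC : commutative addz
   add0z : left_id 0 addz            addNz : left_inverse 0 oppz addz *)
Definition int_zmodMixin : Zmodule.mixin_of int :=
  Zmodule.Mixin 0 oppz addz addzA addzC add0z addNz.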
Note that mixins typically provide only part of a structure; for instance a ring
structure would actually comprise a representation type and three mixins: one
for equality, one for the additive group, and one for the multiplicative monoid
together with distributivity. A mixin can depend on another one: e.g., the ring
multiplicative mixin depends on the additive one for its distributivity axioms.
Since types don’t depend on mixins (it’s the converse), type inference usually
cannot fill in omitted mixin parameters; however, the type class mechanism of
Coq 8.2 [12] can do so by running ad hoc tactics after type inference.

2.2 Packed Structures


The geometric dependency of Cⁿ on n is rather treacherous: it is quite possible
to develop an extensive structure package in an abstract setting (when n = 1)

that will fail dramatically when used in practice for even moderate values of n.
The only case when this does not occur is with C = 1 — when each structure
is encapsulated into a single object. Thus, in addition to aesthetics, there is a
strong pragmatic rationale for achieving full encapsulation.
While mixins provide some degree of packaging, it falls short of C = 1.
However, mixins require one object per level in the structure hierarchy. This
is far from C = 1 because theorem proving requires deeper structure hierarchies
than programming, as structures with identical operations can differ by axioms;
indeed, despite our best efforts, our algebraic hierarchy is nine levels deep.
For the topmost structure in the hierarchy, encapsulation just amounts to
using a dependent record to package a mixin with its representation type. For
example, the top structure in our hierarchy, which describes a type with an
equality comparison operation (see [8]), could be defined as follows:

Module Equality.
Record mixin_of (T : Type) : Type :=
Mixin {op : rel T; _ : forall x y, reflect (x = y) (op x y)}.
Structure type : Type :=
Pack {sort :> Type; mixin : mixin_of sort}.
End Equality.
Notation eqType := Equality.type.
Notation EqType := Equality.Pack.
Definition eq_op T := Equality.op (Equality.mixin T).
Notation "x == y" := (@eq_op _ x y).

Coq provides two features that support this style of interface, Coercion and
Canonical Structure. The sort :> Type declaration above makes the sort pro-
jection into a coercion from type to Type. This form of explicit subtyping allows
any T : eqType to be used as a Type, e.g., the declaration x : T is understood
as x : sort T . This allows x == x to be understood as @eq_op T x x by simple
first-order unification in the Hindley-Milner type inference, as @eq_op α expects
arguments of type sort α.
Coercions are mostly useful for establishing generic theorems for abstract
structures. A different mechanism is needed to work with specific structures and
types, such as integers, permutations, polynomials, or matrices, as this calls for
construing a more specific Type as a structure object (e.g., an eqType): coercions
and more generally subtyping will not do, as they are constrained to work in the
opposite direction.
Coq solves this problem by using higher-order unification in combination
with Canonical Structure hints. For example, assuming int is the type of signed
integers, and given

Definition int_eqMixin := @Equality.Mixin int eqz ...


Canonical Structure int_eqType := EqType int_eqMixin.

Coq will interpret 2 == 2 as @eq_op int_eqType 2 2, which is convertible to
eqz 2 2. Thanks to the Canonical Structure hint, Coq finds the solution α =
int_eqType to the higher-order unification problem sort α ≡βιδ int that arises
during type inference.
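As a usage sketch (ours), the effect of the hint can be observed directly:

Check (2 == 2).
(* accepted at type bool: the implicit eqType argument of eq_op is
   resolved to int_eqType, so the term is convertible to eqz 2 2 *)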

2.3 Telescopes

The simplest way of packing deeper structures of a hierarchy consists in repeating
the design pattern above, substituting “the parent structure” for “representation
type”. For instance, we could end Module Zmodule with
Structure zmodType : Type := Pack {sort :> eqType; _ : mixin_of sort}.

This makes zmodType a subtype of eqType and (transitively) of Type, and allows
for the declaration of generic operator syntax (0, x + y, −x, x − y, x ∗ i), and
the declaration of canonical structures such as
Canonical Structure int_zmodType := Zmodule.Pack int_zmodMixin.

Many authors [2,13,7,5] have formalized an algebraic hierarchy using such nested
packed structures, which are sometimes referred to as telescopes [14], the term
we shall use henceforth.
As the coercion of a telescope to a representation Type is obtained by transitiv-
ity, it comprises a chain of elementary coercions: given T : zmodType, the declara-
tion x : T is understood as x : Equality.sort(Zmodule.sort T ). It is this explicit
chain that drives the resolution of higher-order unification problems and allows
structure inference for specific types. For example, the implicit α : zmodType in
the term 2 + 2 is resolved as follows: first Hindley-Milner type inference gener-
ates the constraint Equality.sort(Zmodule.sort α) ≡βιδ int. Coq then looks
up the Canonical Structure int_eqType declaration associated with the pair
(Equality.sort, int), reduces the constraint to Zmodule.sort α ≡βιδ int_eqType
which it solves using the Canonical Structure int_zmodType declaration asso-
ciated with the pair (Zmodule.sort, int_eqType). Note that int_eqType is an
eqType, not a Type: canonical projection values are not restricted to types.
Although this clever double use of coercion chains makes telescopes the sim-
plest way of packing structure hierarchies, it raises several theoretical and prac-
tical issues for deep or complex hierarchies.
Perhaps the most obvious one is that telescopes are restricted to single inher-
itance. While multiple inheritance is rare, it does occur in classical algebra, e.g.,
rings can be unitary and/or commutative. It is possible to fake multiple inheritance
by extending one base structure with the mixin of a second one (similarly to
what we do in Section 3.2), provided this mixin was not inlined in the definition
of the second base structure.
A more serious limitation is that the head constant of the representation type
of any structure in the hierarchy is always equal to the head of the coercion chain,
i.e., the Type projection of the topmost structure (Equality.sort here). This is
a problem because for both efficiency and robustness, coercions and canonical
projections for a type are determined by its head constant, and the topmost
projection says very little about the properties of the type (e.g., only that it has
equality, not that it is a ring or field).
There is also a severe efficiency issue: the complexity of Coq’s term compar-
ison algorithm is exponential in the length of the coercion chain. While this is
clearly a problem specific to the current Coq implementation, it is hard and
unlikely to be resolved soon, so it seems prudent to seek a design that does not
run into it.

2.4 Packed Classes


We now describe a design that achieves full encapsulation of structures, like
telescopes, but without the troublesome coercion chains. The key idea is to
introduce an intermediate record that bundles all the mixins of a structure,
but not the representation type; the latter is packed in a second stage, similarly
to the top structure of a telescope. We call this intermediate record a class, by
analogy with open-recursion models of objects, and Haskell type classes; hence
in our design structures are represented by packed classes.

type
Zmod
Class
Zmod

Mixin Mixin T Mixin Mixin


T Eq Zmod
Eq Zmod

Fig. 1. Telescopes for Equality and Fig. 2. Packed class for Zmodule
Zmodule

Here is the code for the packed class for a Z-module:


Module Zmodule.
Record mixin_of (T : Type) : Type := ...
Record class_of (T : Type) : Type :=
Class {base :> Equality.class_of T; ext :> mixin_of T}.
Structure type : Type :=
Pack {sort :> Type; class : class_of sort; _ : Type}.
Definition unpack K (k : forall T (c : class_of T), K T c) cT :=
let: Pack T c _ := cT return K _ (class cT) in k _ c.
Definition pack :=
let k T c m := Pack (Class c m) T in Equality.unpack k.
Coercion eqType cT := Equality.Pack (class cT) cT.
End Zmodule.
Notation zmodType := Zmodule.type.
Notation ZmodType := Zmodule.pack.
Canonical Structure Zmodule.eqType.

The definitions of the class_of and type records are straightforward; unpack
is a general dependent destructor for cT : type whose type is expressed in terms
of sort cT and class cT. Almost all of the code is fixed by the design pattern¹;
indeed the definitions of type and unpack are literally identical for all packed
classes, while usually only the name of the parent class module (here, Equality)
changes in the definitions of class_of and pack.
Indeed, the code assumes that Module Equality is similarly defined. Because
Equality is a top structure, the definitions of class_of and pack in Equality
reduce to
¹ It is nevertheless impractical to use the Coq Module construct to package these three
fixed definitions, because of its verbose syntax and technical limitations.

Notation class_of := mixin_of.


Definition pack T c := @Pack T c T.

While Pack is the primitive constructor for type, the usual constructor is pack,
whose only explicit argument is a Z-module mixin: it uses Equality.unpack to
break the packed eqType supplied by type inference into a type and class, which
it combines with the mixin to create the packed zmodType class. Note that pack
ensures that the canonical Type projections of the eqType and zmodType structure
are exactly equal.
The inconspicuous Canonical Structure Zmodule.eqType declaration is the
keystone of the packed class design, because it allows Coq’s higher order unifica-
tion to unify Equality.sort and Zmodule.sort. Note that, crucially, int_eqType
and Zmodule.eqType int_zmodType are convertible; this holds in general be-
cause Zmodule.eqType merely rearranges pieces of a zmodType. For a deeper struc-
ture, we will need to define one such conversion for each parent of the structure.
This is hardly inconvenient since each definition is one line, and the convertibility
property holds for any composition of such conversions.
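Concretely, declaring a specific type then mirrors the telescope case. A hedged
sketch for int, reusing the hypothetical int_zmodMixin of Sect. 2.1:

(* ZmodType, i.e. Zmodule.pack, finds int's canonical eqType by unification,
   splits it with Equality.unpack, and repackages class and mixin. *)
Canonical Structure int_zmodType := ZmodType int_zmodMixin.

Both Equality.sort int_eqType and Zmodule.sort int_zmodType are then convertible
to int, so generic lemmas over either structure apply to integer expressions
without annotations.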

3 Description of the Hierarchy

Figure 3 gives an account of the organization of the main structures defined in
our libraries. Starred blocks denote algebraic structures that would collapse onto
an unstarred one in either a classical or an untyped setting. The interface for each
structure supplies notation, definitions, basic theory, and generic connections
with other structures (like a field being a ring).
In the following, we comment on the main design choices governing the definition
of interfaces. For more details and the complete description of all the structures
and their related theory, see module ssralg at
http://coqfinitgroup.gforge.inria.fr/.
We do not package as interfaces all the possible combinations of the mixins
we define: a structure is only packaged when it will be populated in practice. For
instance integral domains and fields are defined on top of commutative rings as
in standard textbooks [1], and we do not develop a theory for non-commutative
algebra, which hardly shares results with its commutative counterpart.

3.1 Combinatorial Structures

SubType structures. To handle mathematical objects like “the units of Z/nZ”,
one needs to define new types in comprehension style, by giving a specification
over an existing type. The Coq system already provides a way to build such
new types, by means of Σ-types (dependent pairs). Unfortunately, in general,
to compare two inhabitants of such a Σ-type, one needs to compare both
components of the pairs, i.e. to compare the elements and to compare the
related proofs.
To take advantage of proof irrelevance on boolean predicates when defining
these new types, we use the following subType structure:

[Diagram not reproduced: the hierarchy relates Type, Equality*, Choice*, SubType*,
Zmodule, CountType, Ring, FinType, Commutative Ring, Unit Ring*, Commutative
Unit Ring*, IntegralDomain, Field, Decidable Field*, and Closed Field.]
Fig. 3. The algebraic hierarchy in the ssreflect libraries

Structure subType (T : Type)(P : pred T): Type := SubType {
sub_sort :> Type;
val : sub_sort -> T;
Sub : forall x, P x -> sub_sort;
_ : forall K (_ : forall x Px, K (@Sub x Px)) u, K u;
_ : forall x Px, val (@Sub x Px) = x}.

This interface gathers a new type sub_sort for the inhabitants of type T satisfying
the boolean predicate P, with a projection val to type T, a Sub constructor,
and an elimination scheme. Now, the val projection can be proved injective: to
compare two elements of a subType structure on type T, it is enough to compare
their projections on T. A simple example of a subType structure equips the type of
finite ordinals:
Inductive ordinal (n : nat) := Ordinal m of m < n.

where < stands for the boolean strict order on natural numbers. Crucially, replac-
ing a primitive Coq Σ-type by this encoding makes it possible to coerce ordinal

to nat. Moreover, in Coq, the definition of this inductive type automatically
generates the associated elimination scheme ord_rect. We can hence easily build
a (canonical) structure of subType on top of ordinal, by providing ord_rect to
the SubType constructor, as the other arguments are trivial.
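A hedged sketch of this canonical instance (argument order and implicit-argument
details may well differ from the actual library):

Definition nat_of_ord n (i : ordinal n) := let: Ordinal m _ := i in m.
Canonical Structure ordinal_subType n :=
  @SubType nat (fun m => m < n) (ordinal n)
    (@nat_of_ord n)        (* val : ordinal n -> nat *)
    (@Ordinal n)           (* Sub : forall m, m < n -> ordinal n *)
    (@ord_rect n)          (* the generated elimination scheme *)
    (fun m Pm => erefl m). (* val (Sub m Pm) = m holds by computation *)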

Types with a choice function. Our intensional, proof-irrelevant representation
of finite sets was sufficient to address quotients of finite objects like finite
groups [8]. However, this method does not apply to an infinite setting, where
arguments like the incomplete basis theorem are pervasive.
The construction of quotients and their use inside type-theory-based proof
assistants has been studied quite intensively. In classical systems like HOL, the
infrastructure needed to work with quotient types is now well understood [15].
Coq provides support for Setoids [16], which are a way to define quotients by
explicitly handling the equivalence relation involved and the contexts proved
substitutive. In our case, quotients have to be dependent types, the dependent
parameters often being themselves (dependent) quotients. This combination of
dependent types with setoids, which has proved successful in an extensional
setting like NuPRL [2], is not adapted to an intensional theory like that of
Coq. Crafting and implementing a Curry-Howard based system featuring the
appropriate balance between intensional and extensional type theories, as well
as an internalized quotient construction, is still work in progress [17].
To circumvent this limitation, we combine the structure of types with equality
with a choice operator on decidable predicates in a Choice structure. This structure,
at the top of the hierarchy, is embedded in every lower level algebraic structure.
To construct objects like linear bases, we need to choose sequences of elements.
Yet a choice operator on a given type does not canonically supply a choice
operator on sequences of elements of this type. This would indeed require a
canonical encoding of (seq T) into T which is in general not possible: for the
empty void type, (seq void) has a single inhabitant, while (seq (seq void)) is
isomorphic to nat. The solution is to require a choice operator on (seq (seq T)).
This leads to a canonical structure of choice for T and any (seq .. (seq T)),
using a Gödel-style encoding.
Thus we arrive at the following definition for the Choice mixin and class:
Module Choice.
Definition xfun (T : Type) := forall P : pred T, (exists x, P x) -> T.
Definition correct {T} (f : xfun T) := forall (P : pred T) xP, P (f P xP).
Definition extensional {T} (f : xfun T) := forall (P Q : pred T) xP xQ,
P =1 Q -> f P xP = f Q xQ.

Record mixin_of (T : Type) : Type := Mixin {
xchoose : xfun T;
xchooseP : correct xchoose;
eq_xchoose : extensional xchoose}.

Record class_of (T : Type) : Type := Class {
base :> Equality.class_of T; ext2 : mixin_of (seq (seq T)) }.
...
End Choice.

The xfun choice operator for boolean predicates should return a witness satis-
fying P, given a proof of the existence of such a witness. It is extensional with
respect to both the proofs of existence and the predicates.
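As a usage sketch of ours (assuming the library exposes xchoose and xchooseP
at the structure level with the types below), one can extract a canonical witness
from a mere existence proof:

Definition witness (T : choiceType) (P : pred T)
  (exP : exists x, P x) : T := xchoose exP.
Lemma witnessP T P exP : P (@witness T P exP).
Proof. exact: xchooseP. Qed.

Extensionality guarantees that the witness does not depend on the particular
existence proof supplied, which is what makes the operator usable in practice.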
Countable structures. A Choice structure is still not transmitted to every
desired construction (like products) over types that themselves carry a choice
structure. Types with countably many inhabitants, on the other hand, transmit
their countability more readily. This leads us to define a structure for
these countable types, by requiring an injection pickle : T -> nat on the
underlying type T.
Since the Calculus of Inductive Constructions [11] validates the axiom of
countable choice, it is possible to derive a Choice structure from any countable
type. However, since a generic choice construction on arbitrary countable
types would not always lead to the expected choice operator, we prefer to embed
a Choice structure as the base class of the Countable structure.
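A hedged sketch of what this mixin could look like; the field names are ours and
the library formulation may differ (e.g. it may state injectivity directly rather
than via a partial inverse):

Module Countable.
Record mixin_of (T : Type) : Type := Mixin {
  pickle : T -> nat;
  unpickle : nat -> option T;
  (* unpickle inverts pickle, which makes pickle injective *)
  pickleK : forall x, unpickle (pickle x) = Some x
}.
End Countable.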
Finite types structures. The structure of types with a finite number of inhabitants
is at the heart of our formalization of finite quotients [8]. The Finite mixin
still corresponds to the description given in this reference, but the FinType
structure now packs this mixin with a Countable base instead of an eqType. Proofs
like the one for the cardinality of the cartesian product of finite types make the
most of this computational content for the enumeration. Indeed, the use of
(computations of) list iterators shrinks the size of such proofs by a factor of five
compared to the abstract case.

3.2 Advanced Algebraic Structures


Commutative rings, rings with units, commutative rings with units.
We package two different structures for both commutative and plain rings, as
well as rings enjoying a decidable discrimination of their units. The latter structure
is for instance the minimum required on a ring for the number of roots of a
polynomial over that ring to be bounded by its degree (see lemma
max_ring_poly_roots in module poly). For the ring Z/nZ, this unit predicate
selects the numbers coprime to n. For matrices, it selects those having a non-zero
determinant. Its semantic and computational content can prove very effective
when developing proofs.
Yet we also want to package a structure combining the ComRing structure of
commutative ring and the UnitRing deciding units, equipping for instance Z/nZ.
This ComUnitRing structure has no mixin of its own:
Module ComUnitRing.
Record class_of (R : Type) : Type := Class {
base1 :> ComRing.class_of R;
ext :> UnitRing.mixin_of (Ring.Pack base1 R)}.
Coercion base2 R m := UnitRing.Class (@ext R m).
...
End ComUnitRing.

Its class packages the class of a ComRing structure with the mixin of a UnitRing
(which reflects a natural order for further instantiation). The base1 projection

coerces the ComUnitRing class to its ComRing base class. Note that this definition
does not provide the required coercion path from a ComUnitRing class to its
underlying UnitRing class, which is only provided by base2. Now the canonical
structures of ComRing and UnitRing for a ComUnitRing structure will let the latter
enjoy both theories with a correct treatment of type constraints.
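These canonical structures follow the one-line conversion pattern of Sect. 2.4;
a hedged sketch (the definition names are ours):

(* inside Module ComUnitRing, after class_of and type have been defined *)
Coercion comRingType cT := ComRing.Pack (class cT) cT.
Coercion unitRingType cT := UnitRing.Pack (base2 (class cT)) cT.
Canonical Structure comRingType.
Canonical Structure unitRingType.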

Decidable fields. The DecidableField structure models fields with a decidable
first-order theory. One motivation for defining such a structure is our need for
the decidability of the irreducibility of representations of finite groups, which is
a valid but highly non-trivial [18] property, pervasive in representation theory.
For this purpose, we define a reflected representation of first-order formulas.
The structure requires the decidability of satisfiability of atoms and their
negations. Proving quantifier elimination leads to decidability of the full
first-order theory.

Closed fields. Algebraically closed fields are defined by requiring that any
non-constant monic polynomial has a root. Since such a structure enjoys quantifier
elimination, any closed field canonically enjoys a structure of decidable field.
4 Population of the Hierarchy


The objective of this section is to give a hint of how well we meet the challenge
presented in section 1: defining a concrete datatype and extending it externally
with several algebraic structures that can be used in reasoning on this type. We
aim at showing that this method works smoothly by going through the proofs of
easy lemmas that reach across our algebraic hierarchy and manipulate a variety
of structures.

4.1 Multiplicative Finite Subgroups of Fields


Motivation, notations and framework. Our first example is the well-known
property that a finite multiplicative subgroup of a field is cyclic. When applied
to F ∗ , the multiplicative group of non-null elements of a finite field F , it is
instrumental in defining the discrete logarithm, a crucial tool for cryptography.
Various textbook proofs of this result exist [19,1], prompting us to state it as:
1 Lemma field_mul_group_cyclic : forall (gT: finGroupType)
2 (G : {group gT}) (F : fieldType) (f : gT -> F),
3 {in G & G, {morph f : u v / u * v >-> (u * v)%R}} ->
4 {in G, forall x, f x = 1%R <-> x = 1} ->
5 cyclic G.

The correspondence of this lemma with its natural-language counterpart becomes
straightforward once we explain a few notations:
%R : is a scope notation for ring operations.
{group gT} :
The types we defined in section 3 are convenient for framing their elements in
a precise algebraic setting. However, since a large proportion of the properties

we have to consider deal with relations between sets of elements sharing


such an algebraic setting, we have chosen to define the corresponding set-
theoretic notions, for instance sets and groups, as a selection of elements of
their underlying type, as covered in [8].
{morph f : u v / u * v >-> (u * v)%R} :
This reads as: ∀ u v, f (u ∗ v) = (f u ∗ f v)%R.
{in G, P} :
If P is of the form ∀x, Q(x) this means ∀ x ∈ G, Q(x). Additional & symbols
(as in line 3 above) extend the notation to relativize multiple quantifiers.
The type of f, along with the scope notation %R, allows Coq to infer the correct
interpretation for 1 and the product operator on line 3. field_mul_group_cyclic
therefore states that any finite group G mapped to a field F by f, an injective
group morphism for the multiplicative law of F, is cyclic.²
Fun with polynomials. Our proof progresses as follows: if a is an element of
any such group C of order n, we already know that aⁿ = 1. C thus provides at
least n distinct solutions to Xⁿ = 1 in any group G it is contained in. Moreover,
reading the two last lines of our goal above, it is clear that f maps the roots of
that equation in G injectively to roots of Xⁿ = 1 in F. Since the polynomial
Xⁿ − 1 has at most n roots in F, the arbitrarily chosen C is exactly the collection
of roots of the equation in G.
This suffices to show that for a given n, G contains at most one cyclic group
of order n. Thanks to a classic lemma ([19], 2.17), this means that G is cyclic.
An extensive development of polynomial theory on a unitary ring allows us
to simply state the following definition in our proof script:
pose P : {poly F} := (’X^n - 1)%R.
The construction of the ring of polynomials with coefficients in a unitary ring
(with F canonically unifying with such a ring) is triggered by the type anno-
tation, and allows us to transparently use properties based on the datatype of
polynomials, such as degree and root lemmas, and properties of the Ring struc-
ture built on the aforementioned datatype, such as having an additive inverse.
This part of the proof can therefore be quickly dispatched in Coq. The final
lemma on cyclicity works with the cardinal of a partition of G, and is a good use
case for the methods developed in [9]; we complete its proof in a manner similar
to the provided reference.
Importing various proof contexts inside a proof script is therefore a manageable
transaction: here, we only had to provide Coq with the type of a mapping
to an appropriate unitary ring for it to infer the correct polynomial theory.

4.2 Practical Linear Algebra


Motivations. Reasoning on an algorithm that aims at solving systems of linear
equations seems a good benchmark of our formalization of matrices. Indeed, the
issue of representing fixed-size arrays using dependent types has pretty much
become the effigy of the benefits of dependent type-checking, at least for its
programmatically-minded proponents.
² Unlike in [8], cyclic is a boolean predicate that corresponds to the usual meaning
of the adjective.

However, writing functions that deal with those types implies some challenges,
among which is dealing with size arguments. We want our library to simplify
this task, while sharing operator symbols, and exposing structural properties of
objects as soon as their shape ensures they are valid.

LUP decomposition. The LUP decomposition is a recursive function that
returns, for any non-singular matrix A, three matrices P, L, U such that L is a
lower-triangular matrix, U is an upper-triangular matrix, P is a permutation
matrix, and PA = LU.
We invite the reader to refer to ([20], 28.3) for more details about this no-
torious algorithm. Our implementation is strikingly similar to a tail-recursive
version of its textbook parent. Its first line features a type annotation that does
all of the work of dealing with matrix dimensions:
1 Fixpoint cormen_lup n : let M := ’M_n.+1 in M -> M * M * M :=
2 match n return let M := ’M_(1 + n) in M -> M * M * M with
3 | 0 => fun A => (1%:M, 1%:M, A)
4 | n’.+1 => fun A =>
5 let k := odflt 0 (pick [pred k | A k 0 != 0]) in
6 let A’ := rswap A 0 k in
7 let Q := tperm_mx F 0 k in
8 let Schur := ((A k 0)^-1 *m: llsubmx A’) *m ursubmx A’ in
9 let: (P’, L’, U’) := cormen_lup (lrsubmx A’ - Schur) in
10 let P := block_mx 1 0 0 P’ * Q in
11 let L := block_mx 1 0 ((A k 0)^-1 *m: (P’ *m llsubmx A’)) L’ in
12 let U := block_mx (ulsubmx A’) (ursubmx A’) 0 U’ in
13 (P, L, U)
14 end.

Here, in a fashion congruent with the philosophy of our archive, we return a
value for any square matrix A, rather than just for non-singular matrices, and
use the following shorthand:

odflt 0 (pick [pred k:fT | P k]) :
returns k, an inhabitant of the finType fT such that P k, if such a k exists, and
returns 0 otherwise.
block_mx Aul Aur All Alr :
reads as the block matrix $\begin{pmatrix} A_{ul} & A_{ur} \\ A_{ll} & A_{lr} \end{pmatrix}$.
ulsubmx, llsubmx, ursubmx, lrsubmx :
are auxiliary functions that use the shape³ of the dependent parameter of
their argument A to return respectively Aul, All, Aur, Alr when A is as represented
above. Notice we will from now on denote their application using the same
subscript pattern.

The rest of our notations can be readily interpreted, with 1 and 0 coercing
respectively to identity and null matrices of the right dimension, and (A i j)
returning the appropriate ai,j coefficient of A through coercion.
³ As crafted in line 2 above.

Correctness of the algorithm. We will omit in this article some of the steps
involved in proving that the LUP decomposition is correct: showing that P is a
permutation matrix, for instance, involved building a theory about those ma-
trices that correspond to a permutation map over a finite vector. But while
studying the behavior of this subclass with respect to matrix operations gave
some hint of the usability of our matrix library⁴, it is not the part where our
infrastructure shines the most.
The core of the correction lies in the following equation:
Lemma cormen_lup_correct : forall n A,
let: (P, L, U) := @cormen_lup F n A in P * A = L * U.

Its proof proceeds by induction on the size of the matrix. Once we make sure
that A’ and Q (line 7) are defined coherently, it is not hard to see that what we
are proving is⁵:
\[
\begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} * A'
=
\begin{pmatrix} 1 & 0 \\ a_{k,0}^{-1} \cdot (P' *_m A'_{ll}) & L' \end{pmatrix}
*
\begin{pmatrix} A'_{ul} & A'_{ur} \\ 0 & U' \end{pmatrix}
\tag{1}
\]

where P', L', U' are provided by the induction hypothesis

\[
P' * \left( A'_{lr} - a_{k,0}^{-1} \cdot A'_{ll} *_m A'_{ur} \right) = L' * U'
\tag{2}
\]

Notice that we transcribe the distinction Coq makes between the three product
operations involved: scalar multiplication (·), the square matrix product (*), and
the matrix product accepting arbitrarily sized matrices (*m). Using block product
expansion and a few easy lemmas allows us to transform (1) into:
\[
\begin{pmatrix} A'_{ul} & A'_{ur} \\ P' *_m A'_{ll} & P' *_m A'_{lr} \end{pmatrix}
=
\begin{pmatrix} A'_{ul} & A'_{ur} \\ a_{k,0}^{-1} \cdot P' *_m A'_{ll} *_m A'_{ul} & a_{k,0}^{-1} \cdot P' *_m A'_{ll} *_m A'_{ur} + L' *_m U' \end{pmatrix}
\tag{3}
\]

At this stage, we would like to rewrite our goal with (2), named IHn in our
script, even though its right-hand side does not occur exactly in the equation.
However, SSReflect has no trouble expanding the definition of the ring
multiplication provided in (2) to see that it exactly matches the pattern⁶ -[L' *m U']IHn.
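For reference, the block product expansion used to pass from (1) to (3) is the
standard rule (for compatible block dimensions)

\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
\begin{pmatrix} E & F \\ G & H \end{pmatrix}
=
\begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix}
\]

where, on both sides of (1), the 1 and 0 blocks make most summands collapse.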
We conclude by identifying the blocks of (3) one by one. The most tedious
step consists in treating the lower left block, which depends on whether we have
been able to choose a non-null pivot in creating A’ from A. Each alternative is
resolved by case analysis on the coefficients of that block, and it is only in that part that
we use the fact that the matrix coefficients belong to a field. The complete proof
is fourteen lines long.
⁴ The theory, while expressed in a general manner, is less than ninety lines long.
⁵ We will write block expressions modulo associativity and commutativity, to reduce
parenthesis clutter.
⁶ See [10] for details on the involved notation for the rewrite tactic.

5 Related Work

The need for packaging algebraic structures and formalizing their relative inheritance
and sharing inside proof assistants has been reported in the literature as soon as
these tools proved mature enough to allow the formalisation of significant pieces
of algebra [2]. The set-theoretic Mizar Mathematical Library (MML) certainly
features the largest corpus of formalized mathematics, yet covers rather different
theories than the algebraic ones we presented here. Little report is available
on the organization and revision of this collection of structures, apart from
comments [7] on the difficulty of maintaining it. The Isabelle/HOL system provides
foundations for developing abstract algebra in a classical framework containing
algebraic structures as first-class citizens of the logic and using a type-class-like
mechanism [6]. This library proves Sylow theorems on groups and the basic
theory of rings of polynomials.
Two main algebraic hierarchies have been built using the Coq system: the
seminal abstract Algebra repository [4], covering algebraic structures from
monoids to modules, and the CCorn hierarchy [5], mainly devoted to a constructive
formalisation of real numbers and including a proof of the fundamental theorem
of algebra. Both are axiomatic, constructive, and setoid-based. They have proved
rather difficult to extend with theories like linear or multilinear algebra, and to
populate with more concrete instances. In both cases, limitations mainly come
from the pervasive use of setoids and the drawbacks of telescope-based hierarchies
pointed out in section 2.
The closest work to ours is certainly the hierarchy built in Matita [21], using
telescopes and a more liberal system of coercions. This hierarchy, despite
including a large development in constructive analysis [22], is currently less
populated than ours. For example, no counterpart of the treatment of polynomials
presented in section 4 is described for the Matita system.
We are currently working on extending our hierarchy and its infrastructure to
the generic theory of vector spaces and modules.

References

1. Lang, S.: Algebra. Springer, Heidelberg (2002)


2. Jackson, P.: Enhancing the Nuprl proof-development system and applying it to
computational abstract algebra. PhD thesis, Cornell University (1995)
3. Betarte, G., Tasistro, A.: Formalisation of systems of algebras using dependent
record types and subtyping: An example. In: Proc. 7th Nordic workshop on Pro-
gramming Theory (1995)
4. Pottier, L.: User contributions in Coq, Algebra (1999),
http://coq.inria.fr/contribs/Algebra.html
5. Geuvers, H., Pollack, R., Wiedijk, F., Zwanenburg, J.: A constructive algebraic
hierarchy in Coq. Journal of Symbolic Computation 34(4), 271–286 (2002)
6. Haftmann, F., Wenzel, M.: Local theory specifications in Isabelle/Isar. In:
Berardi, S., Damiani, F., de’Liguoro, U. (eds.) TYPES 2008. LNCS, vol. 5497,
pp. 153–168. Springer, Heidelberg (2009)
7. Rudnicki, P., Schwarzweller, C., Trybulec, A.: Commutative algebra in the Mizar
system. J. Symb. Comput. 32(1), 143–169 (2001)

8. Gonthier, G., Mahboubi, A., Rideau, L., Tassi, E., Théry, L.: A Modular Formali-
sation of Finite Group Theory. In: Schneider, K., Brandt, J. (eds.) TPHOLs 2007.
LNCS, vol. 4732, pp. 86–101. Springer, Heidelberg (2007)
9. Bertot, Y., Gonthier, G., Ould Biha, S., Pasca, I.: Canonical big operators. In:
Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp.
86–101. Springer, Heidelberg (2008)
10. Gonthier, G., Mahboubi, A.: A small scale reflection extension for the Coq system.
INRIA Technical report, http://hal.inria.fr/inria-00258384
11. Paulin-Mohring, C.: Définitions Inductives en Théorie des Types d’Ordre
Supérieur. Habilitation à diriger les recherches, Université Claude Bernard Lyon I
(1996)
12. Sozeau, M., Oury, N.: First-Class Type Classes. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 278–293. Springer, Heidelberg
(2008)
13. Pollack, R.: Dependently typed records in type theory. Formal Aspects of Com-
puting 13, 386–402 (2002)
14. de Bruijn, N.G.: Telescopic mappings in typed lambda calculus. Information and
Computation 91, 189–204 (1991)
15. Paulson, L.C.: Defining Functions on Equivalence Classes. ACM Transactions on
Computational Logic 7(4), 658–675 (2006)
16. Barthe, G., Capretta, V., Pons, O.: Setoids in type theory. Journal of Functional
Programming 13(2), 261–293 (2003)
17. Altenkirch, T., McBride, C., Swierstra, W.: Observational equality, now! In: Pro-
ceedings of the PLPV 2007 workshop, pp. 57–68. ACM, New York (2007)
18. Olteanu, G.: Computing the Wedderburn decomposition of group algebras by the
Brauer-Witt theorem. Mathematics of Computation 76(258), 1073–1087 (2007)
19. Rotman, J.J.: An Introduction to the Theory of Groups. Springer, Heidelberg
(1994)
20. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms,
2nd edn. McGraw-Hill, New York (2003)
21. Sacerdoti Coen, C., Tassi, E.: Working with Mathematical Structures in Type
Theory. In: Miculan, M., Scagnetto, I., Honsell, F. (eds.) TYPES 2007. LNCS,
vol. 4941, pp. 157–172. Springer, Heidelberg (2008)
22. Sacerdoti Coen, C., Tassi, E.: A constructive and formal proof of Lebesgue Domi-
nated Convergence Theorem in the interactive theorem prover Matita. Journal of
Formalized Reasoning 1, 51–89 (2008)
Practical Tactics for Separation Logic

Andrew McCreight

Portland State University


[email protected]

Abstract. We present a comprehensive set of tactics that make it practical
to use separation logic in a proof assistant. These tactics enable the
verification of partial correctness properties of complex pointer-intensive
programs. Our goal is to make separation logic as easy to use as the
standard logic of a proof assistant. We have developed tactics for the
simplification, rearranging, splitting, matching and rewriting of separa-
tion logic assertions as well as the discharging of a program verification
condition using a separation logic description of the machine state. We
have implemented our tactics in the Coq proof assistant, applying them
to a deep embedding of Cminor, a C-like intermediate language used by
Leroy’s verified CompCert compiler. We have used our tactics to verify
the safety and completeness of a Cheney copying garbage collector writ-
ten in Cminor. Our ideas should be applicable to other substructural
logics and imperative languages.

1 Introduction

Separation logic [1] is an extension of Hoare logic for reasoning about shared mu-
table data structures. Separation logic specifies the contents of individual cells of
memory in a manner similar to linear logic [2], avoiding problems with reasoning
about aliasing in a very natural fashion. For this reason, it has been successfully
applied to the verification of a number of pointer-intensive applications such as
garbage collectors [3,4].
However, most work on separation logic has involved paper, rather than
machine-checkable, proofs. Mechanizing a proof can increase our confidence in
the proof and potentially automate away some of the tedium in its construction.
We would like to use separation logic in a proof assistant to verify deep proper-
ties of programs that may be hard to check fully automatically. This is difficult
because the standard tactics of proof assistants such as Coq [5] cannot effectively
deal with the linearity properties of separation logic. In contrast, work such as
Smallfoot [6] focuses on the automated verification of lightweight specifications.
We discuss other related work in Sect. 7.
In this paper, we address this problem with a suite of tools for separation-
logic-based program verification of complex pointer-intensive programs. These
tools are intended for the interactive verification of Cminor programs [7] in the
Coq proof assistant, but should be readily adaptable to similar settings. We
have chosen Cminor because it can be compiled using the CompCert verified


compiler [7], allowing for some properties of source programs to be carried down
to executable code. We have tested the applicability of these tools by using them
to verify the safety of a Cheney garbage collector [8], as well as a number of
smaller examples.
The main contributions of this paper are a comprehensive set of tactics for
reasoning about separation logic assertions (including simplification, rearrang-
ing, splitting, matching and rewriting) and a program logic and accompanying
set of tactics for program verification using separation logic that strongly sep-
arate reasoning about memory from more standard reasoning. Together these
tactics essentially transform Coq into a proof assistant for separation logic. The
tactics are implemented in a combination of direct and reflective styles. The Coq
implementation is available online from http://cs.pdx.edu/~mccreigh/ptsl/
Our tool suite has two major components. First, we have tactics for reason-
ing about separation logic assertions. These are focused on easing the difficulty
of working with a linear-style logic within a more conventional proof assistant.
These tools enable the simplification and manipulation of separation logic hy-
potheses and goals, as well as the discharging of goals that on paper would
be trivial. These tactics are fairly modular and should be readily adaptable to
other settings, from separation logic with other memory models to embeddings
of linear logic in proof assistants.
The second component of our tool set is a program logic and related tactics.
The program logic relates the dynamic semantics of the program to its specifica-
tion. The tactics step through a procedure one statement at a time, enabling the
“programmer’s intuition” to guide the “logician’s intuition”. At each program
step, there is a separation logic-based description of the current program state. A
verified verification condition generator produces a precondition given the post-
condition of the statement. The tactics are able to automatically solve many
such steps, and update the description of the state once the current statement
has been verified. Loop and branch join point annotations must be manually
specified.

Organization of the Paper. We discuss the Cminor abstract machine in Sect. 2.


In Sect. 3, we discuss the standard separation logic assertions we use, and the
series of tactics we have created for reasoning with them. Then we discuss our
program logic and its associated tactics in Sect. 4. In Sect. 5 we show how all
of these pieces come together to verify the loop body of an in-place linked list
reversal program. Finally, we briefly discuss our use of the previously described
tactics to verify a garbage collector in Sect. 6, then discuss related work in more
detail and conclude in Sect. 7.

2 Cminor

Our program tools verify programs written in Cminor, a C-like imperative lan-
guage. Cminor is an intermediate language of the CompCert [7] compiler, which
is a semantics preserving compiler from C to PowerPC assembly, giving us a

v ::= Vundef | Vword(w) | Vptr(a)


e ::= v | x | [e] | e + e | e!=e | ...
s ::= x := e | [e1 ] := e2 | s1 ; s2 | while(e) s | return e
σ ::= (m, V )

Fig. 1. Cminor syntax

y := NULL; // y is the part of list that has been reversed


while(x != NULL) (
t := [x+4]; [x+4] := y; // t is next x, store prev x
y := x; x := t // advance y and x
);
return y

Fig. 2. In-place linked list reversal

potential path to verified machine code. We use a simplified variant of Cminor


that only supports 32-bit integer and pointer values. Fig. 1 gives the syntax. We
write w for 32-bit integers and a for addresses, which are always 32-bit aligned. A
value v is either undefined, a 32-bit word value or a pointer. We write NULL for
Vword(0). An expression e is either a value, a program variable, a memory load
(i.e., a dereference), or a standard arithmetic or logical operation such as addi-
tion or comparison. In this paper, a statement s is either a variable assignment,
a store to memory, a sequence of statements, a while loop or a return. In the
actual implementation, while loops are implemented in terms of more primitive
control structures (loops, branches, blocks and exits to enclosing blocks) capable
of encoding all structured control flow. Our implementation also has procedure
calls. We omit discussion of these constructs, which are supported by all of the
logics and tactics presented in this paper, for reasons of space.
A memory m of type Mem is a partial mapping from addresses to values, while
a variable environment V is a partial mapping from Cminor variables x to values.
A state σ is a memory plus a variable environment. In our implementation, a
state also includes a stack pointer. We define two projections mem(σ) and venv(σ)
to extract the memory and variable environment components of a state.
We have formally defined a standard small-step semantics for Cminor [9],
omitted here for reasons of space. Expression evaluation eval(σ, e) evaluates ex-
pression e in the context of state σ and either returns Some(v) if execution
succeeds or None if it fails. Execution of an expression will only fail if it accesses
an undefined variable or an invalid memory location. A valid memory location
has been allocated by some means, such as a function call, but not freed. All
other degenerate cases (such as Vword(3) + Vundef) evaluate to Some(Vundef).
Unlike in C, all address arithmetic is done in terms of bytes, so the second field
of an object located at address a is located at address a + 4.
We will use the program fragment shown in Fig. 2 in the remainder of this paper
to demonstrate our suite of tactics. In-place list reversal takes an argument x

that is a linked list and reverses the linked list in place, returning a pointer to
the new head of the list. A linked list structure at address a has two fields, at a
and a + 4.

3 Separation Logic Assertions

Imperative programs often have complex data structures. To reason about these
data structures, separation logic assertions [1] describe memory by treating mem-
ory cells as a linear resource. In this section, we will describe separation logic
assertions and associated tactics for Cminor, but they should be applicable to
other imperative languages.
Fig. 3 gives the standard definitions of the separation logic assertions we use
in this paper. We write P for propositions and T for types in the underlying
logic, which in our case is the Calculus of Inductive Constructions [10] (CIC).
Propositions have type Prop. We write A and B for separation logic assertions,
implemented using a shallow embedding [11]. Each separation logic assertion is a
memory predicate with type Mem → Prop, so we write A m for the proposition
that memory m can be described by separation logic predicate A.
The separation logic assertion contains, written v ↦ v′, holds on a memory m
if v is an address that is the only element of the domain of m and m(v) = v′. The
empty assertion emp only holds on empty memory. The trivial assertion true
holds on every memory. The modal operator !P from linear logic (also adapted
to separation logic by Appel [12]) holds on a memory m if the proposition P
is true and m is empty. The existential ∃x : T. A is analogous to the standard
existential operator. We omit the type T when it is clear from context, and follow
common practice and write a ↦ − for ∃x. a ↦ x.
The final and most crucial separation logic operator we will be using in this
paper is separating conjunction, written A ∗ B. This holds on a memory m if m
can be split into two non-overlapping memories m1 and m2 such that A holds
on m1 and B holds on m2 . (m = m1 ⊎ m2 holds if m is equal to m1 ∪ m2
and the domains of m1 and m2 are disjoint.) This operator is associative and
commutative, and we write (A ∗ B ∗ C) for (A ∗ (B ∗ C)). This operator is used to
specify the frame rule, which is written as follows in conventional Hoare logic:

      {A} s {A′}
  ──────────────────
  {A ∗ B} s {A′ ∗ B}

(v ↦ v′) m ::= (m = {v ↦ v′})                  emp m ::= (m = ∅)
true m ::= True                                (!P) m ::= P ∧ emp m
(∃x : T. A) m ::= (∃x : T. A m)
(A ∗ B) m ::= ∃m1 , m2 . (m = m1 ⊎ m2 ) ∧ A m1 ∧ B m2

Fig. 3. Definition of separation logic assertions



B describes parts of memory that s does not interact with. The frame rule
is most commonly applied at procedure call sites. We have found that we do
not need to manually instantiate the frame rule, thanks to our tactics and
program logic.
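To make the shallow embedding concrete, the following Coq sketch models the
assertions of Fig. 3 over a simplified memory type (partial maps from numeric
addresses to numeric values). The names and the memory model here are
illustrative only; they are not the definitions used in our development.

  Definition addr := nat.
  Definition val := nat.
  Definition mem := addr -> option val.

  (* assertions are shallowly embedded memory predicates *)
  Definition assertion := mem -> Prop.

  Definition empty_mem : mem := fun _ => None.

  Definition disjoint (m1 m2 : mem) : Prop :=
    forall a, m1 a = None \/ m2 a = None.

  Definition union (m1 m2 : mem) : mem :=
    fun a => match m1 a with Some v => Some v | None => m2 a end.

  (* emp holds only on the empty memory *)
  Definition emp : assertion := fun m => m = empty_mem.

  (* contains: the singleton memory mapping address a to value v *)
  Definition contains (a : addr) (v : val) : assertion :=
    fun m => m a = Some v /\ (forall a', a' <> a -> m a' = None).

  (* !P holds when P is true and the memory is empty *)
  Definition pure (P : Prop) : assertion :=
    fun m => P /\ emp m.

  (* A * B: the memory splits into disjoint parts satisfying A and B *)
  Definition star (A B : assertion) : assertion :=
    fun m => exists m1 m2,
      disjoint m1 m2 /\ m = union m1 m2 /\ A m1 /\ B m2.

Under this reading, (A ∗ B) m is an ordinary proposition of type Prop, which is
what allows the non-spatial parts of a goal to be attacked with Coq's standard
tactics.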
We can use these basic operators in conjunction with Coq’s standard facilities
for inductive and recursive definitions to build assertions for more complex data
structures. For instance, we can inductively define a separation logic assertion
llist(v, l) that holds on a memory that consists entirely of a linked list with its
head at v containing the values in the list l. A list l (at the logical level) is either
empty (written nil) or contains an element X appended to the front of another
list l′ (written X :: l′).

llist(a, nil) ::= !(a = NULL)
llist(a, v :: l′) ::= ∃a′. a ↦ v ∗ (a + 4) ↦ a′ ∗ llist(a′, l′)
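In the Coq sketch given above, the same definition becomes a fixpoint over the
list of values; here NULL is approximated by address 0 and list elements by nat,
so this is again an illustration rather than our actual definition.

  Require Import List. Import ListNotations.

  (* a linked list with head a holding the values in l *)
  Fixpoint llist (a : addr) (l : list val) : assertion :=
    match l with
    | [] => pure (a = 0)   (* nil: a is NULL; NULL modelled as 0 here *)
    | v :: l' => fun m => exists a',
        star (contains a v)
             (star (contains (a + 4) a') (llist a' l')) m
    end.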

From this definition and basic lemmas about separation logic assertions, we
can prove a number of useful properties of linked lists. For instance, if llist(v, l)
holds on a memory and v is not NULL then the linked list is non-empty.
We can use this predicate to define part of the loop invariant for the list
reversal example given in Section 2. If the variables x and y have the values v1 and
v2 , then memory m must contain two separate, non-overlapping linked lists with
values l1 and l2 . In separation logic, this is written (llist(v1 , l1 ) ∗ llist(v2 , l2 )) m.

3.1 Tactics

Defining the basic separation logic predicates and verifying their basic properties
is not difficult, even in a mechanized setting. What can be difficult is actually
constructing proofs in a proof assistant such as Coq because we are attempting
to carry out linear-style reasoning in a proof assistant with a native logic that is
not linear.
If A, B, C and D are regular propositions, then the proposition that (A ∧
B ∧ C ∧ D) implies (B ∧ (A ∧ D) ∧ C) can be easily proved in a proof assistant.
The assumption and goal can be automatically decomposed into their respective
components, which can in turn be easily solved.
If A, B, C and D are separation logic assertions, proving the equivalent goal,
that for all m, (A ∗ B ∗ C ∗ D) m implies (B ∗ (A ∗ D) ∗ C) m, is more difficult.
Unfolding the definition of * from Fig. 3 and breaking down the assumption in a
similar way will involve large numbers of side conditions about memory equality
and disjointedness. While Marti et al. [13] have used this approach, it throws
away the abstract reasoning of separation logic.
Instead, we follow the approach of Reynolds [1] and others and reason about
separation logic assertions using basic laws like associativity and commutativity.
However this is not the end of our troubles. Proving the above implication re-
quires about four applications of associativity and commutativity lemmas. This
can be done manually, but becomes tedious as assertions grow larger. In real
proofs, these assertions can contain more than a dozen components.

To mitigate these problems we have constructed a variety of separation logic
tactics. Our goal is to make constructing proofs about separation logic predi-
cates no more difficult than reasoning about Coq’s standard logic by making the
application of basic laws about separation logic assertions as simple as possible.
Now we give an example of the combined usage of our tactics. In the initial
state, we have a hypothesis H that describes the current memory, and a goal
on that same memory that we wish to prove. First the rewriting tactic uses a
previously proved lemma about empty linked lists to simplify the assertion:
H : (B ∗ A) m                          H : (B ∗ A) m
(A ∗ llist(v, nil) ∗ B′) m      −→     (A ∗ !(v = NULL) ∗ B′) m

Next the simplification tactic extracts !(v = NULL), creating a new subgoal
v = NULL (not shown). Finally, a matching tactic cancels out the common parts
of the hypothesis and goal, leaving a smaller proof obligation.
H : (B ∗ A) m                  H′ : B m′
(A ∗ B′) m              −→     B′ m′

The m′ in the final proof state is fresh: it must be shown that B m′ implies
B′ m′ for all m′.¹ This example shows most of our tactics for separation logic
assertions: simplification, rearranging, splitting, matching, and rewriting. We
discuss general implementation issues, then discuss each type of tactic.
Implementation. Efficiency matters for these tools. If they are slow or produce
gigantic proofs they will not be useful. Our goal is to enable the interactive
development of proofs, so tactics should run quickly. Our tactics are implemented
entirely in Coq’s tactic language Ltac. To improve efficiency and reliability, some
tactics are implemented reflectively [14]. Implementing a tactic reflectively means
implementing it mostly in a strongly typed functional language (CIC) instead
of Coq’s untyped imperative tactic language. Reflective tactics can be efficient
because the heavy work is done at the level of propositions, not proofs. Strong
typing allows many kinds of errors in the tactics to be statically detected.
Simplification. The simplification tactic ssimpl puts a separation logic asser-
tion into a normal form to make further manipulation simpler and to clean up
assertions after other tactics have been applied. ssimpl in H simplifies the hy-
pothesis H and ssimpl simplifies the goal. Here is an example of the simplification
of a hypothesis (if a goal was being simplified, the simplification would produce
three goals instead of three hypotheses):
H : (A ∗ ∃x. x ↦ v ∗ emp ∗ true ∗ (B ∗ true) ∗ !P) m
...
      −→
x : addr
H′ : (A ∗ x ↦ v ∗ B ∗ true) m
H′′ : P
...

¹ The final step could be more specific (for instance, we know that m′ must be a
subset of m), but this is rarely useful in practice.

The basic simplifications are as follows:


1. All separation logic existentials are eliminated or introduced with Coq’s
meta-level existential variables (which must eventually be instantiated).
2. All modal predicates !P are removed from the assertion, and either turned
into new goals or hypotheses. The tactic attempts to solve any new goals.
3. All instances of true are combined and moved to the right.
4. For all A, (A ∗ emp) and (emp ∗ A) are replaced with A.
5. For all A, B and C, ((A ∗ B) ∗ C) is replaced with (A ∗ (B ∗ C)).
In addition, there are a few common cases where the simplifier can make more
radical simplifications:
1. If Vundef ↦ v or 0 ↦ v are present anywhere in the assertion (for any value
v), the entire assertion is equivalent to False, because only addresses can be
in the domain of ↦.
2. If an entire goal assertion can be simplified to true the simplifier tactic
immediately solves the goal.
3. The tactic attempts to solve a newly simplified goal by assumption.
While most of this simplification could be implemented using rewriting, we
have implemented it reflectively. The reflective tactic examines the assertion and
simplifies it in a single pass, instead of the multiple passes that may be required
for rewriting. In informal testing, this approach was faster than rewriting.
Rearranging. Our rearranging tactic srearr allows the separating conjunction’s
commutativity and associativity properties to be applied in a concise and declar-
ative fashion. This is useful because the order and association of the components
of a separation logic assertion affect the behavior of the splitting and rewriting
tactics, which are described later. The rearranging tactic is invoked by srearr T ,
where T is a tree describing the final shape of the assertion. There is an equiv-
alent tactic for hypotheses. The shape of the tree gives the desired association
of the assertion while the numbering of the nodes give the desired order. De-
scribing the desired term without explicitly writing it in the proof script makes
the tactic less brittle. As with other tactics, srearr assumes that ssimpl has been
used already, and thus that the goal is of the form (A1 ∗ A2 ∗ ... ∗ An ) m. The tactic fails if the
ordering given is not a valid permutation.
Here is an example, invoked² with srearr [7, 6, [[5, 4], 3], 1, 2]:
...                                       ...
(A ∗ B ∗ C ∗ D ∗ E ∗ F ∗ G) m      −→     (G ∗ F ∗ ((E ∗ D) ∗ C) ∗ A ∗ B) m
Permutation and reassociation are implemented as separate passes. Permu-
tation is implemented reflectively: the tree describing the rearrangement is flat-
tened into a list, and the assertion is also transformed into a list. For instance, the
² The actual syntax of the command is slightly different, to avoid colliding with
existing syntactic definitions: srearr .[7, 6, .[ .[5, 4], 3], 1, 2]%SL. Coq’s
ability to define new syntax along with its implicit coercions makes this lighter
weight than it would be otherwise.

tree [[3, 1], 2] would become the list [3, 1, 2] and the assertion (A ∗ B ∗ C) would
become the list [A, B, C]. The permutation list [3, 1, 2] is used to reorder the
assertion list [A, B, C]. In this example, the resulting assertion list is [C, A, B].
The initial assertion list is logically equivalent to the final one if the permuta-
tion list is a permutation of the indices of the assertion list. This requirement
is dynamically checked by the tactic. Reassociation is implemented directly by
examining the shape of the tree.
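As an illustration of the reflective core of this permutation pass, the following
Coq function (hypothetical names; the real tactic works over a reified syntax of
assertions) selects the i-th element for each entry of a 1-based permutation list:

  Require Import List. Import ListNotations.

  (* reorder xs according to a 1-based list of indices; an out-of-range
     index yields None, corresponding to the dynamic check failing *)
  Definition reorder {A : Type} (perm : list nat) (xs : list A)
    : list (option A) :=
    map (fun i => nth_error xs (i - 1)) perm.

  (* For example, reorder [3; 1; 2] [10; 20; 30] computes
     [Some 30; Some 10; Some 20]. *)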
Splitting. The tactic ssplit subdivides a separation logic proof by creating a new
subgoal for each corresponding part of the hypothesis and goal. This uses the
standard separation logic property that if (∀m. A m → A′ m) and (∀m. B m →
B′ m), then (∀m. (A ∗ B) m → (A′ ∗ B′) m). Here is an example of the basic
use of this tactic:
H : (A ∗ B ∗ C) m            H1 : A m1      H2 : B m2      H3 : C m3
(E ∗ F ∗ G) m          −→    E m1           F m2           G m3
Initially there is a goal and a hypothesis H describing memory. Afterward,
there are three goals, each with a hypothesis. Memories m1 , m2 and m3 are
freshly created, and represent disjoint subsets covering the original memory m.
Splitting must be done with care as it can lead to a proof state where no further
progress is possible.
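The property of ∗ used above (monotonicity with respect to implication) is only
a few lines of Coq in the model sketched in Sect. 3; the lemma below is stated
over those illustrative definitions, not over the assertions of our development.

  Lemma star_mono : forall A A' B B' : assertion,
    (forall m, A m -> A' m) ->
    (forall m, B m -> B' m) ->
    forall m, star A B m -> star A' B' m.
  Proof.
    intros A A' B B' HA HB m [m1 [m2 [Hd [Hu [H1 H2]]]]].
    (* reuse the same splitting of m and strengthen each side *)
    exists m1, m2. repeat split; auto.
  Qed.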
The splitting tactic also has a number of special cases to solve subgoals involv-
ing ↦, and applies heuristics to try to solve address equalities that are generated.
Here are two examples of this:
H : (a ↦ v) m                         H : (a ↦ v) m
(a ↦ v′) m     −→     v = v′          (b ↦ v) m     −→     a = b

Matching. The matching tactic searchMatch cancels out matching parts of the
hypothesis and goal. This matching is just syntactic equality, with the addition
of some special cases for ↦. Here is an example of this tactic:
E : v1 = v1′                                       E : v1 = v1′
H : (D ∗ A ∗ a ↦ v1 ∗ B ∗ b ↦ v2 ∗ true) m         H : (B ∗ true) m
(B′ ∗ b ↦ − ∗ D ∗ a ↦ v1′ ∗ A ∗ true) m      −→    (B′ ∗ true) m

The assertion for address a is cancelled due to the equality v1 = v1′. Notice that
the predicate true, present in both the hypothesis and goal, is not cancelled. If B
implies (B′ ∗ true) then cancelling out true from goal and hypothesis will cause
a provable goal to become unprovable. This is the same problem presented by
the additive unit ⊤ in linear logic, which can consume any set of linear resources.
We do not have this problem for matching other separation logic predicates as
they generally do not have this sort of slack.
Matching is implemented by iterating over the assertions in the goal and
hypothesis, looking for matches. Any matches that are found are placed in cor-
responding positions in the assertions, allowing the splitting tactic to carry out
the actual cancellation.

Rewriting. In Coq, supporting rewriting of logically equivalent assertions must


be implemented using the setoid rewriting facility. We have done this for the as-
sertions described in this paper. By adding rewrite rules to a particular database,
the user can extend our tools to support simplification of their own assertions.
At the beginning of Sect. 3.1, we gave an example of the use of the rewriting
tactic. In the first proof state, the goal contains an empty linked list that
must be eliminated. Assume we have proved a theorem arrayEmpty having type
∀x. array(x, nil) ⇒ emp. The tactic srewrite arrayEmpty will change the proof
state to the second proof state.

4 Program Logic

Our Cminor program logic is a verified verification condition generator (VCG).


We discuss our VCG then the related tactics.

4.1 Verification Condition Generator

The VCG, stmPre, is a weakest precondition generator defined as a recursive


function in Coq’s logic. It takes as arguments the statement to be verified along
with specifications for the various ways to exit the statement, and returns a
state predicate that is the precondition for the statement. Verification requires
showing that a user-specified precondition is at least as strong as the VC.
The design of the VCG is based on Appel and Blazy’s program logic for Cmi-
nor [9]. Their program logic is structured as a traditional Hoare triple (though
with more than three components), directly incorporates separation logic, and
has more side conditions for some rules. On the other hand, our VCG is defined
in terms of the operational semantics of the machine.
For the subset of Cminor we have defined in Sect. 2, the VCG only needs
one specification argument, a state predicate q that is the postcondition of the
current statement. The full version of the VCG that we have implemented takes
other arguments giving the specifications of function calls, the precondition of
program points that can be jumped to, and the postcondition of the current
procedure.
The definition of some of the cases of the VCG is given in Fig. 4. To simplify
the presentation, only arguments required for these cases are included, leaving
three arguments to stmPre: the statement, its postcondition q, and the current
state σ. The precondition of a sequence of statements is simply the composition
of VCs for the two statements: we generate a precondition for s′, then use that
as the postcondition of s.
For more complex statements, we have to handle the possibility of execution
failing. To do this in a readable way, we use a Haskell-style do-notation to encode
a sort of error monad. Operations that can fail, such as expression evaluation,
return Some(R) if they succeed with result R, and None if they fail. Syntactically,
the term “do x ← M ; N ” is similar to a let expression “let x = M in N ”: the
variable x is given the value of M , and is bound in the body N . However, the

stmPre (s; s′) q σ ::= stmPre s (stmPre s′ q) σ

stmPre (x := e) q σ ::=            stmPre ([e1 ] := e2 ) q σ ::=
  do v ← eval(σ, e);                 do v1 ← eval(σ, e1 );
  do σ′ ← setVar(σ, x, v);           do v2 ← eval(σ, e2 );
  q σ′                               do σ′ ← storeVal σ v1 v2 ;
                                     q σ′

stmPre (while(e) s) q σ ::=
  ∃I. I σ ∧ ∀σ′. I σ′ →
    do v ← eval(σ′, e);
    (v ≠ Vundef) ∧ (trueVal(v) → stmPre s I σ′) ∧ (v = NULL → q σ′)

(do x ← Some(v); P ) ::= P [v/x]        (do x ← None; P ) ::= False

Fig. 4. Verification condition generator

reduction of this term differs, as shown in Fig. 4. In the case where M = None,
evaluation has failed, so the entire VC becomes equivalent to False, because
failure is not allowed.
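This reduction behaviour can be mimicked in Coq by a bind over option whose
result is a proposition; the following sketch (with an illustrative notation
declaration) satisfies exactly the two equations of Fig. 4 by computation.

  (* do x <- M; N over propositions: a failed computation gives False *)
  Definition do_prop {A : Type} (M : option A) (P : A -> Prop) : Prop :=
    match M with
    | Some v => P v
    | None => False
    end.

  Notation "'do' x <- M ; N" := (do_prop M (fun x => N))
    (at level 200, x ident, M at level 100, N at level 200).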
The cases for variable assignment and storing to memory follow the dynamic
semantics of the machine. Variable assignment attempts to evaluate the expres-
sion e, then attempts to update the value of the variable x. If it succeeds, then
the postcondition q must hold on the resulting state σ′. Store works in the same
way: evaluation of the two expressions and a store are attempted. For the store
to succeed, v1 must be a valid address in memory. As with assignment, the post-
condition q must hold on the resulting state. With both of these statements, if
any of the intermediate evaluations fail then the entire VC will end up being
False, and thus impossible to prove.
The case for while loops is fairly standard. A state predicate I must be selected
as a loop invariant. I must hold on the initial state σ. Furthermore, for any other
states σ′ such that I σ′ holds, it must be possible to evaluate the expression e to
a value v, which cannot be Vundef. If the value v is a “true” value (i.e., is either
a pointer or a non-zero word value) then the precondition of the loop body s
must hold, where the postcondition of the body is I. If the value is false (equal
to Vword(0)) then the postcondition of the entire loop must hold.
We have mechanically verified the soundness of the verification condition gen-
erator as part of the safety of the program logic: if the program is well-formed,
then we can either take another step or we have reached a valid termination
state for the program.
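To make the recursive structure of Fig. 4 concrete, here is a Coq sketch of the
sequence, assignment and store cases, reusing do_prop from the sketch above.
States, expressions and the helper functions are left abstract, values are
simplified to nat, and the while case (which quantifies over an invariant and is
therefore not a plain recursive call) is omitted; all names are illustrative.

  Parameter state var expr : Type.
  Parameter eval : state -> expr -> option nat.
  Parameter setVar : state -> var -> nat -> option state.
  Parameter storeVal : state -> nat -> nat -> option state.

  Inductive stm : Type :=
  | Assign : var -> expr -> stm       (* x := e *)
  | Store : expr -> expr -> stm       (* [e1] := e2 *)
  | Seq : stm -> stm -> stm.          (* s; s' *)

  (* weakest precondition of s with postcondition q in state st *)
  Fixpoint stmPre (s : stm) (q : state -> Prop) (st : state) : Prop :=
    match s with
    | Seq s1 s2 => stmPre s1 (stmPre s2 q) st
    | Assign x e =>
        do_prop (eval st e) (fun v =>
          do_prop (setVar st x v) q)
    | Store e1 e2 =>
        do_prop (eval st e1) (fun v1 =>
          do_prop (eval st e2) (fun v2 =>
            do_prop (storeVal st v1 v2) q))
    end.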

4.2 Variable Environment Reasoning


We use separation logic to reason about memory, but define a new predicate to
reason about variable environments. At any given point in a program, the contents
of the variable environment V can be described by a predicate veEqv S V′ V. S is a
set of variables that are valid in V, and V′ gives the values of some of the variables
in V.
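A sketch in the same illustrative style as before, with variable environments
modelled as partial maps from names to values (the development's actual types
differ):

  Require Import String.

  Definition venv := string -> option nat.

  (* every variable in S is defined in V, and V agrees with the
     partial listing V' wherever V' specifies a value *)
  Definition veEqv (S : string -> Prop) (V' V : venv) : Prop :=
    (forall x, S x -> V x <> None) /\
    (forall x v, V' x = Some v -> V x = Some v).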

4.3 Tactics

The tactic vcSteps lazily unfolds the VC and attempts to use the separation logic
description of the state to perform symbolic execution to step through the VC.
Fig. 5 shows the rough sequence of steps that vcSteps carries out automatically
at an assignment. Each numbered line below the horizontal line gives an inter-
mediate goal state as vcSteps is running. The two hypotheses V and H above
the line describe the variable environment and memory of the initial state σ,
and are the precondition of this statement. The first stmPre below the line is
the initial VC that must be verified. The tactic unfolds the definition of the VC
for a sequence (line 2), then for an assignment (line 3), then determines that
the value of the variable x is a by examining the hypothesis V (line 4). The
load expression now has a value as an argument, so the tactic will examine the
hypothesis H to determine that address a contains value v (line 5). The relevant
binding a ↦ v can occur as any subtree of the separation logic assertion. Now
the do-notation can be reduced away (line 6).
Once this is done, all that remains is to actually perform the assignment. The
hypothesis V proves that y is in the domain the variable file of σ, so setting the
value of y to v will succeed, producing a new state σ  . The tactic simplifies the
goal to step through this update, and uses V to produce a new V  that describes
σ  . H can still be used as the memory of σ  is the same as the memory of σ. This
results in the proof state shown in Fig. 6. The tactic can now begin to analyze
the statement s in a similar manner.
This may seem like a lengthy series of steps, but it is largely invisible to
the user. Breaking down statements in this manner allows the tactics to easily
handle a wide variety of expressions. vcSteps will get “stuck” at various points
that require user intervention, such as loops and branches where invariants must
be supplied, and where the tactic cannot easily show that a memory or variable

V : veEqv {x, y} {(x, a)} (venv(σ))
H : (A ∗ a ↦ v ∗ B) (mem(σ))

1) stmPre (y := [x]; s) P σ
2) stmPre (y := [x]) (stmPre s P) σ
3) do v′ ← eval(σ, [x]); do σ′ ← setVar(σ, y, v′); stmPre s P σ′
4) do v′ ← eval(σ, [a]); do σ′ ← setVar(σ, y, v′); stmPre s P σ′
5) do v′ ← Some(v); do σ′ ← setVar(σ, y, v′); stmPre s P σ′
6) do σ′ ← setVar(σ, y, v); stmPre s P σ′

Fig. 5. Program logic tactics: examining the state

V′ : veEqv {x, y} {(x, a), (y, v)} (venv(σ′))
H : (A ∗ a ↦ v ∗ B) (mem(σ′))

stmPre s P σ′

Fig. 6. Program logic tactics: updating the state



operation is safe. In the latter case, the tactics described in the previous section
can be applied to manipulate the assertion to a form the program logic tactics
can understand, then vcSteps can be invoked again to pick up where it left off.
In addition to the tactic for reasoning about VCs, there is a tactic veEqvSolver
to automatically solve goals involving veEqv. This is straightforward, as it only
needs to reason about concrete finite sets.

5 Example of Tactic Use

In this section, we demonstrate the use of our tactics by verifying a fragment of


an in-place linked list reversal, given in Fig. 2. Before we can verify this program,
we need to define a loop invariant inv l0 σ, where l0 is the list of values in the
initial linked list and σ is the state at a loop entry:

inv l0 σ ::= ∃v1 , v2 . veEqv {x, y, t} {(x, v1 ), (y, v2 )} (venv(σ)) ∧


∃l1 , l2 . (llist(v1 , l1 ) ∗ llist(v2 , l2 )) (mem(σ)) ∧
rev(l1 ) ++ l2 = rev(l0 )

In the first line, the predicate veEqv requires that in the current state
at least the program variables x, y, t are valid, and that the variables x and y
are equal to some values v1 and v2 , respectively. The second line is a separation
logic predicate specifying that memory contains two disjoint linked lists as seen in
Sect. 3. From these two descriptions, we can deduce that the variable x contains
a pointer to a linked list containing the values l1 . Finally, the invariant requires
that reversing l1 and appending l2 results in the reversal of the original list l0 .
We write rev(l) for the reversal of list l and l ++ l′ for appending list l′ to the end of l.
To save space, we will only go over the verification of the loop body and not
describe in detail the invocation of standard Coq tactics. In the loop body, we
know that the loop invariant inv holds on the current state and that the value
of x is not Vword(0) (i.e., NULL). We must show that after the loop body has
executed, the loop invariant is reestablished.
Our initial proof state is thus:

NE : v1 ≠ Vword(0)
L : rev(l1 ) ++ l2 = rev(l0 )
H : (llist(v1 , l1 ) ∗ llist(v2 , l2 )) (mem(σ))
V : veEqv {x, y, t} {(x, v1 ), (y, v2 )} (venv(σ))
stmPre (t := [x + 4]; [x + 4] := y; y := x; x := t) (inv l0 ) σ

Our database of rewriting rules for separation logic data structures includes
the following rule: if v is not 0, then a linked list llist(v, l) must have at least one
element. Thus, applying our rewriting tactic to the hypothesis H triggers this
rule for v = v1 . After applying a standard substitution tactic, we have this proof
state (where everything that is unchanged is left as ...):

...
L : rev(v :: l1′) ++ l2 = rev(l0 )
H′ : ((v1 ↦ v) ∗ (v1 +4 ↦ v1′) ∗ llist(v1′, l1′) ∗ llist(v2 , l2 )) (mem(σ))
...

Now that we know that the address v1 + 4 contains the value v1′, we can
show that it is safe to execute the loop body. The tactic vcSteps, described in
Sect. 4.3, is able to automatically step through the entire loop body, leaving an
updated state description and the goal of showing that the loop invariant holds
on the final state of the loop σ  :

...
H′ : ((v1 ↦ v) ∗ (v1 +4 ↦ v2 ) ∗ llist(v1′, l1′) ∗ llist(v2 , l2 )) (mem(σ′))
V′ : veEqv {x, y, t} {(x, v1′), (y, v1 ), (t, v1′)} (venv(σ′))
inv l0 σ′
Now we must instantiate two existential variables and show that they are
the values of the variables x and y (and that x, y and t are valid variables).
These existentials can be automatically instantiated, and the part of the goal
using veEqv solved, using a few standard Coq tactics along with the veEqvSolver
described in Sect. 4.3.
The remaining goal is

...
∃l3 , l4 . (llist(v1′, l3 ) ∗ llist(v1 , l4 )) (mem(σ′)) ∧ rev(l3 ) ++ l4 = rev(l0 )

We manually instantiate the existentials with l1′ and (v :: l2 ) and split the
resulting conjunction using standard Coq tactics. This produces two subgoals.
The second subgoal is rev(l1′) ++ (v :: l2 ) = rev(l0 ) and can be solved using
standard tactics. The first subgoal is an assertion containing llist(v1 , v :: l2 ),
which can be simplified using standard tactics, leaving the proof state

...
(llist(v1′, l1′) ∗ (∃x′. (v1 ↦ v) ∗ (v1 +4 ↦ x′) ∗ llist(x′, l2 ))) (mem(σ′))

Invoking our simplification tactic ssimpl replaces the existential with a Coq
meta-level existential variable “?100”, leaving the goal³

...
H′ : ((v1 ↦ v) ∗ (v1 +4 ↦ v2 ) ∗ llist(v1′, l1′) ∗ llist(v2 , l2 )) (mem(σ′))
(llist(v1′, l1′) ∗ (v1 ↦ v) ∗ (v1 +4 ↦ ?100) ∗ llist(?100, l2 )) (mem(σ′))

This goal is immediately solved by our tactic searchMatch. The hypothesis
contains v1 +4 ↦ v2 , and the goal contains v1 +4 ↦ ?100, so ?100 must be equal
to v2 . Once ?100 is instantiated, the rest is easy to match up. Thus we have
verified that the body of the loop preserves the loop invariant.
³ H′ has not changed, but we include it here for convenience.

6 Implementation and Application


Our tactics are implemented entirely in the Coq tactic language Ltac. The tool
suite includes about 5200 lines of libraries, such as theorems about modular arith-
metic and data structures such as finite sets. Our definition of Cminor (discussed
in Sect. 2) is about 4700 lines. The definition of separation logic assertions and
associated lemmas (discussed in Sect. 3) are about 1100 lines, while the tactics
are about 3000 lines. Finally, the program logic (discussed in Sect. 4), which
includes its definition, proofs, and associated tactics, is about 2000 lines.
We have used the tactics described in this paper to verify the safety and com-
pleteness of a Cheney copying garbage collector [8] implemented in Cminor. It is
safe because the final object heap is isomorphic to the reachable objects in the
initial state and complete because it only copies reachable objects. This collector
supports features such as scanning roots stored in stack frames, objects with an
arbitrary number of fields, and precise collection via information stored in object
headers. The verification took around 4700 lines of proof scripts (including the
definition of the collector and all specifications), compared to 7800 lines for our
previous work [4] which used more primitive tactics. The reduction in line count
is despite the fact that our earlier collector did not support any of the features
we listed earlier in this paragraph and did not use modular arithmetic.
There has been other work on mechanized garbage collector verification, such
as Myreen et al. [15] who verify a Cheney collector in 2000 lines using a decom-
pilation-based approach. That publication unfortunately does not have enough
detail of the collector verification to explain the difference in proof size, though it
is likely due in part to greater automation. Hawblitzel et al. [16] used a theorem
prover to automatically verify a collector that is realistic enough to be used for
real C# benchmarks.

7 Related Work and Conclusion


Appel’s unpublished note [12] describes Coq tactics that are very similar to ours.
These tactics do not support as many direct manipulations of assertions, and his
rewriting tactic appears to require manual instantiation of quantifiers. The paper
describes a tactic for inversion of inductively defined separation logic predicates,
which we do not support. While Appel also applies a “two-level approach” that
attempts to pull things out of separation logic assertions to leverage existing
proof assistant infrastructure, our approach is more aggressive about this, lifting
out expression evaluation. This allows our approach to avoid reasoning about
whether expressions in assertions involve memory.
We can give a rough comparison of proof sizes, using the in-place list reversal
procedure. Ignoring comments and the definition of the program, by the count
of wc Appel uses 200 lines and 795 words to verify this program [12]. With our
tactics, ignoring comments, blank lines and the definition of the program, our
verification takes 68 lines and less than 400 words.
Affeldt and Marti [13] use separation logic in a proof assistant, but unfold the
definitions of the separation logic assertions to allow the use of more conventional

tactics. Tuch et al. [17] define a mechanized program logic for reasoning about
C-like memory models. They are able to verify programs using separation logic,
but do not have any complex tactics for separation logic connectives.
Other work, such as Smallfoot [6], has focused on automated verification of
lightweight separation logic specifications. This approach has been used as the
basis for certified separation logic decisions procedures in Coq [18] and HOL [19].
Calcagno et al. [20] use separation logic for an efficient compositional shape
analysis that is able to infer some specifications.
Still other work has focused on mechanized reasoning about imperative pointer
programs outside of the context of separation logic [11,21,22] using either deep
or shallow embeddings. Expressing assertions via more conventional propositions
enables the use of powerful preexisting theorem provers. Another approach to
program verification decompiles imperative programs into functional programs
that are more amenable to analysis in a proof assistant [23,15].
The tactics we have described in this paper provide a solid foundation for
the use of separation logic in a proof assistant but there is room for further au-
tomation. Integrating a Smallfoot-like decision procedure into our tactics would
automate reasoning about standard data structures.
We have presented a set of separation logic tactics that allows the verification
of programs using separation logic in a proof assistant. These tactics allow Coq
to be used as a proof assistant for separation logic by allowing the assertions
to be easily manipulated via simplification, rearranging, splitting, matching and
rewriting. They also provide tactics for proving a verification condition by means
of a separation logic based description of the program state. These tactics are
powerful enough to verify a garbage collector.

Acknowledgments. I would like to thank Andrew Tolmach for providing extensive


feedback about the tactics, and Andrew Tolmach, Jim Hook and the anonymous
reviewers for providing helpful comments on this paper.

References
1. Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In:
LICS 2002, Washington, DC, USA, pp. 55–74. IEEE Computer Society, Los Alami-
tos (2002)
2. Girard, J.Y.: Linear logic. Theoretical Computer Science 50, 1–102 (1987)
3. Birkedal, L., Torp-Smith, N., Reynolds, J.C.: Local reasoning about a copying
garbage collector. In: POPL 2005, pp. 220–231. ACM Press, New York (2005)
4. McCreight, A., Shao, Z., Lin, C., Li, L.: A general framework for certifying gcs and
their mutators. In: PLDI 2007, pp. 468–479. ACM, New York (2007)
5. The Coq Development Team: The Coq proof assistant, http://coq.inria.fr
6. Berdine, J., Calcagno, C., O’Hearn, P.W.: Smallfoot: Modular automatic asser-
tion checking with separation logic. In: de Boer, F.S., Bonsangue, M.M., Graf,
S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 115–137. Springer,
Heidelberg (2006)
7. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler
with a proof assistant. In: POPL 2006, pp. 42–54. ACM Press, New York (2006)

8. Cheney, C.J.: A nonrecursive list compacting algorithm. Communications of the


ACM 13(11), 677–678 (1970)
9. Appel, A.W., Blazy, S.: Separation logic for small-step Cminor. In: Schneider, K.,
Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 5–21. Springer, Heidelberg
(2007)
10. Paulin-Mohring, C.: Inductive definitions in the system Coq—rules and proper-
ties. In: Bezem, M., Groote, J.F. (eds.) TLCA 1993. LNCS, vol. 664. Springer,
Heidelberg (1993)
11. Wildmoser, M., Nipkow, T.: Certifying machine code safety: Shallow versus deep
embedding. In: Slind, K., Bunker, A., Gopalakrishnan, G.C. (eds.) TPHOLs 2004.
LNCS, vol. 3223, pp. 305–320. Springer, Heidelberg (2004)
12. Appel, A.W.: Tactics for separation logic (January 2006),
http://www.cs.princeton.edu/~appel/papers/septacs.pdf
13. Marti, N., Affeldt, R., Yonezawa, A.: Formal verification of the heap manager of an
os using separation logic. In: Liu, Z., He, J. (eds.) ICFEM 2006. LNCS, vol. 4260,
pp. 400–419. Springer, Heidelberg (2006)
14. Boutin, S.: Using reflection to build efficient and certified decision procedures. In:
Ito, T., Abadi, M. (eds.) TACS 1997. LNCS, vol. 1281, pp. 515–529. Springer,
Heidelberg (1997)
15. Myreen, M.O., Slind, K., Gordon, M.J.C.: Machine-code verification for multiple
architectures - an application of decompilation into logic. In: Proceedings of Formal
Methods in Computer-Aided Design (FMCAD) (2008)
16. Hawblitzel, C., Petrank, E.: Automated verification of practical garbage collectors.
In: POPL 2009, pp. 441–453. ACM, New York (2009)
17. Tuch, H., Klein, G., Norrish, M.: Types, bytes, and separation logic. In: POPL
2007, pp. 97–108. ACM, New York (2007)
18. Marti, N., Affeldt, R.: A certified verifier for a fragment of separation logic. In: 9th
JSSST Workshop on Programming and Prog. Langs, PPL 2007 (2007)
19. Tuerk, T.: A separation logic framework in HOL. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008, August 2008, pp. 116–122 (2008)
20. Calcagno, C., Distefano, D., O’Hearn, P., Yang, H.: Compositional shape analysis
by means of bi-abduction. In: POPL 2009, pp. 289–300. ACM, New York (2009)
21. Mehta, F., Nipkow, T.: Proving pointer programs in higher-order logic. Inf. Com-
put. 199(1-2), 200–227 (2005)
22. Bulwahn, L., Krauss, A., Haftmann, F., Erkök, L., Matthews, J.: Imperative func-
tional programming with Isabelle/HOL. In: Mohamed, O.A., Muñoz, C., Tahar, S.
(eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 134–149. Springer, Heidelberg (2008)
23. Filliâtre, J.C., Marché, C.: The Why/Krakatoa/Caduceus platform for deduc-
tive program verification. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS,
vol. 4590, pp. 173–177. Springer, Heidelberg (2007)
Verified LISP Implementations on
ARM, x86 and PowerPC

Magnus O. Myreen and Michael J.C. Gordon

Computer Laboratory, University of Cambridge, UK

Abstract. This paper reports on a case study, which we believe is the
first to produce a formally verified end-to-end implementation of a func-
tional programming language running on commercial processors. Inter-
preters for the core of McCarthy’s LISP 1.5 were implemented in ARM,
x86 and PowerPC machine code, and proved to correctly parse, evaluate
and print LISP s-expressions. The proof of evaluation required working
on top of verified implementations of memory allocation and garbage
collection. All proofs are mechanised in the HOL4 theorem prover.

1 Introduction
Explicit pointer manipulation is an endless source of errors in low-level programs.
Functional programming languages hide pointers and thereby achieve a more
abstract programming environment. The downside with functional programming
(and Java/C# programming) is that the programmer has to trust automatic
memory management routines built into run-time environments.
In this paper we report on a case study, which we believe is the first to
produce a formally verified end-to-end implementation of a functional program-
ming language. We have implemented, in ARM, x86 and PowerPC machine code,
a program which parses, evaluates and prints LISP; and furthermore formally
proved that our implementation respects a semantics of the core of LISP 1.5 [6].
Instead of assuming correctness of run-time routines, we build on a verified im-
plementation of allocation and garbage collection.
For a flavour of what we have implemented and proved consider an example:
if our implementation is supplied with the following call to pascal-triangle,
(pascal-triangle ’((1)) ’6)

it parses the string, evaluates the expression and prints a string,


((1 6 15 20 15 6 1)
(1 5 10 10 5 1)
(1 4 6 4 1)
(1 3 3 1)
(1 2 1)
(1 1)
(1))

where pascal-triangle had been supplied to it as


(label pascal-triangle
(lambda (rest n)
(cond ((equal n ’0) rest)
(’t (pascal-triangle
(cons (pascal-next ’0 (car rest)) rest) (- n ’1))))))

with auxiliary function:


(label pascal-next
(lambda (p xs)
(cond ((atomp xs) (cons p ’nil))
(’t (cons (+ p (car xs)) (pascal-next (car xs) (cdr xs)))))))

The theorem we have proved about our LISP implementation can be used to
show e.g. that running pascal-triangle will terminate and print the first n + 1
rows of Pascal’s triangle, without a premature exit due to lack of heap space. One
can use our theorem to derive sufficient conditions on the inputs to guarantee
that there will be enough heap space.
We envision that our verified LISP interpreter will provide a platform on top
of which formally verified software can be produced with much greater ease than
at lower levels of abstraction, i.e. in languages where pointers are made explicit.
Why LISP? We chose to implement and verify a LISP interpreter since LISP
has a neat definition of both syntax and semantics [12] and is still a very powerful
language as one can see, for example, in the success of ACL2 [8]. By choosing
LISP we avoided verifying machine code which performs static type checking.
Our proofs [14] are mechanised in the HOL4 theorem prover [19].

2 Methodology
Instead of delving into the many detailed invariants developed for our proofs,
this paper will concentrate on describing the methodology we used:
" First, machine code for various LISP primitives, such as car, cdr, cons, was
written and verified (Section 3);
• The correctness of each code snippets is expressed as a machine-code
Hoare triple [15]: { pre ∗ pc p } p : code { post ∗ pc (p + exit) }.
• For cons and equal we used previously developed proof automation [15],
which allows for proof reuse in between different machine languages.
" Second, the verified LISP primitives were input into a proof-producing com-
piler in such a way that the compiler can view the processors as a machine
with six registers containing LISP s-expressions (Section 4);
• The compiler [16] we use maps tail-recursive functions, defined in the
logic of HOL4, down to machine code and proves that the generated
code executes the original HOL4 functions.
• Theorems describing the LISP primitives were input into the compiler,
which can use them as building blocks when deriving new code/proofs.

" Third, LISP evaluation was defined as a (partially-specified) tail-recursive


function lisp eval, and then compiled into machine code using the compiler
mentioned above (Section 5).
• LISP evaluation was defined as a tail-recursive function which only uses
expressions/names for which the compiler has verified building blocks.
• lisp eval maintains a stack and a symbol-value list.
" Fourth, to gain confidence that lisp eval implements ‘LISP evaluation’, we
proved that lisp eval implements a semantics of LISP 1.5 [12] (Section 6).
• Our relational semantics of LISP [6] is a formalisation of a subset of
McCarthy’s original LISP 1.5 [12], with dynamic binding.
• The semantics abstracts the stack and certain evaluation orders.
" Finally, the verified LISP interpreters were sandwiched between a verified
parser and printer to produce string-to-string theorems describing the be-
haviour of the entire implementation (Section 7).
• The parser and printer code, respectively, sets up and tears down an
appropriate heap for s-expressions.

Sections 8 and 9 give quantitative data on the effort and discuss related work,
respectively. Some definitions and proofs are presented in the Appendixes.

3 LISP Primitives

LISP programs are expressed in and operate over s-expressions, expressions that
are either a (natural) number, a symbol or a pair of s-expressions. In HOL,
s-expressions are readily modelled using a data-type with constructors:

Num : N → SExp
Sym : string → SExp
Dot : SExp → SExp → SExp

LISP programs and s-expressions are conventionally written in an abbreviated


string form. A few examples will illustrate the correspondence, which is given a
formal definition in Appendix D.

(car x) means Dot (Sym "car") (Dot (Sym "x") (Sym "nil"))
(1 2 3) means Dot (Num 1) (Dot (Num 2) (Dot (Num 3) (Sym "nil")))
’f means Dot (Sym "quote") (Dot (Sym "f") (Sym "nil"))
(4 . 5) means Dot (Num 4) (Num 5)
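For readers who prefer Coq syntax, the HOL data type and the first abbreviation
above transliterate directly (a transliteration for illustration only; the
development itself is in HOL4):

  Require Import String. Open Scope string_scope.

  Inductive SExp : Type :=
  | Num : nat -> SExp               (* natural number *)
  | Sym : string -> SExp            (* symbol *)
  | Dot : SExp -> SExp -> SExp.     (* pair of s-expressions *)

  (* the s-expression written (car x) *)
  Definition example_car_x : SExp :=
    Dot (Sym "car") (Dot (Sym "x") (Sym "nil")).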

Some basic LISP primitives are defined over SExp as follows:

car (Dot x y) = x
cdr (Dot x y) = y

cons x y = Dot x y

plus (Num m) (Num n) = Num (m + n)


minus (Num m) (Num n) = Num (m − n)
times (Num m) (Num n) = Num (m × n)
division (Num m) (Num n) = Num (m div n)
modulus (Num m) (Num n) = Num (m mod n)

equal x y = if x = y then Sym "t" else Sym "nil"


less (Num m) (Num n) = if m < n then Sym "t" else Sym "nil"

In the definition of equal, expression x = y tests standard structural equality.

3.1 Specification of Primitive Operations

Before writing and verifying the machine code implementing primitive LISP
operations, a decision had to be made how to represent Num, Sym and Dot on a
real machine. To keep memory usage to a minimum each Dot-pair is represented
as a block of two pointers stored consecutively on the heap, each Num n is
represented as a 32-bit word containing 4 × n + 2 (i.e. only natural numbers
0 ≤ n < 2³⁰ are representable), and each Sym s is represented as a 32-bit word
containing 4×i+3, where i is the row number of symbol s in a symbol table which,
in our implementation, is a linked-list kept outside of the garbage-collected heap.
Here ‘+2’ and ‘+3’ are used as tags to make sure that the garbage collector
can distinguish Num and Sym values from proper pointers. Pointers to Dot-pairs
are word-aligned, i.e. a mod 4 = 0, a condition the collector tests by computing
a & 3 = 0, where & is bitwise-and.
This simple and small representation of SExp allows most LISP primitives
from the previous section to be implemented in one or two machine instruc-
tions. For example, taking car of register 3 and storing the result in register 4 is
implemented on ARM as a load instruction:

E5934000 ldr r4,[r3] (* load into reg 4, memory at address reg 3 *)

Similarly, ARM code for performing LISP operation plus of register 3 and 4, and
storing the result into register 3 is implemented by:

E0833004 add r3,r3,r4 (* reg 3 is assigned value reg 3 + reg 4 *)


E2433002 sub r3,r3,#2 (* reg 3 is assigned value reg 3 - 2 *)

The intuition here is: (4 × m + 2) + (4 × n + 2) − 2 = 4 × (m + n) + 2.
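This tagging arithmetic can be checked over idealised (unbounded) natural
numbers with a few lines of Coq; the actual proofs are over HOL4's 32-bit words,
so the sketch below captures only the arithmetic intuition, and the names are
illustrative.

  Require Import Lia.

  Definition tag_num (n : nat) : nat := 4 * n + 2.    (* Num n *)
  Definition tag_sym (i : nat) : nat := 4 * i + 3.    (* Sym, row i *)

  (* the add/sub instruction pair for plus computes exactly this *)
  Lemma tag_plus : forall m n,
    tag_num m + tag_num n - 2 = tag_num (m + n).
  Proof. intros; unfold tag_num; lia. Qed.

  (* tagged values are never word-aligned, so the collector can
     distinguish them from pointers to Dot-pairs *)
  Lemma tag_num_not_aligned : forall n a, tag_num n <> 4 * a.
  Proof. intros; unfold tag_num; lia. Qed.

  Lemma tag_sym_not_aligned : forall i a, tag_sym i <> 4 * a.
  Proof. intros; unfold tag_sym; lia. Qed.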


The correctness of the above implementations of car and plus is expressed for-
mally by the two ARM Hoare triples [15] below. Here lisp (v1 , v2 , v3 , v4 , v5 , v6 , l)
is an assertion, defined below, which asserts that a heap with room for l Dot-
pairs is located in memory and that s-expressions v1 ...v6 (each of type SExp) are
stored in machine registers. This lisp assertion should be understood as lifting

the level of abstraction to a level where specific machine instructions make the
processor seem as if it has six¹ registers containing s-expressions, of type SExp.
(∃x y. Dot x y = v1 ) ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p }
p : E5934000
{ lisp (v1 , car v1 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 4) }

(∃m n. Num m = v1 ∧ Num n = v2 ∧ m+n < 2³⁰) ⇒


{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p }
p : E0833004 E2433002
{ lisp (plus v1 v2 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 8) }
The new assertion is defined for ARM (lisp), x86 (lisp’), and PowerPC (lisp”)
as maintaining a relation lisp inv between the abstract state v1 ...v6 (each of type
SExp) and the concrete state x1 ...x6 (each of type 32-bit word). The details of
lisp inv (defined in Appendix A) and the separating conjunction ∗ (explained in
Myreen [14]) are unimportant for this presentation.
lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) =
∃x1 x2 x3 x4 x5 x6 m1 m2 m3 a temp. m m1 ∗ m m2 ∗ m m3 ∗
r2 temp ∗ r3 x1 ∗ r4 x2 ∗ r5 x3 ∗ r6 x4 ∗ r7 x5 ∗ r8 x6 ∗ r10 a ∗
lisp inv (v1 , v2 , v3 , v4 , v5 , v6 , l) (x1 , x2 , x3 , x4 , x5 , x6 , a, m1 , m2 , m3 )

lisp’ (v1 , v2 , v3 , v4 , v5 , v6 , l) =
∃x1 x2 x3 x4 x5 x6 m1 m2 m3 a. m m1 ∗ m m2 ∗ m m3 ∗
eax x1 ∗ ecx x2 ∗ edx x3 ∗ ebx x4 ∗ esi x5 ∗ edi x6 ∗ ebp a ∗
lisp inv (v1 , v2 , v3 , v4 , v5 , v6 , l) (x1 , x2 , x3 , x4 , x5 , x6 , a, m1 , m2 , m3 )

lisp” (v1 , v2 , v3 , v4 , v5 , v6 , l) =
∃x1 x2 x3 x4 x5 x6 m1 m2 m3 a temp. m m1 ∗ m m2 ∗ m m3 ∗
r2 temp ∗ r3 x1 ∗ r4 x2 ∗ r5 x3 ∗ r6 x4 ∗ r7 x5 ∗ r8 x6 ∗ r10 a ∗
lisp inv (v1 , v2 , v3 , v4 , v5 , v6 , l) (x1 , x2 , x3 , x4 , x5 , x6 , a, m1 , m2 , m3 )
The following examples will use only lisp defined for ARM.

3.2 Memory Layout and Specification of ‘Cons’ and ‘Equal’


Two LISP primitives required code longer than one or two machine instructions,
namely cons and equal. Memory allocation, i.e. cons, requires an allocation pro-
cedure combined with a garbage collector. However, the top-level specification,
which is explained next, hides these facts. Let size count the number of Dot-pairs
in an expression.
size (Num w) = 0
size (Sym s) = 0
size (Dot x y) = 1 + size x + size y
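As an aside, the datatype and the size function transcribe directly into Coq (a
sketch under our naming; the paper's definitions live in HOL4, and the string
payload of Sym is kept abstract here):

Parameter symbol : Type.

Inductive SExp : Type :=
  | Num : nat -> SExp
  | Sym : symbol -> SExp
  | Dot : SExp -> SExp -> SExp.

(* size counts the Dot-pairs in an s-expression, counting shared
   subexpressions repeatedly. *)
Fixpoint size (s : SExp) : nat :=
  match s with
  | Num _ => 0
  | Sym _ => 0
  | Dot x y => 1 + size x + size y
  end.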
¹ Number six was chosen since six is sufficient and suits the x86 implementation best.

The specification of cons guarantees that its implementation will always succeed
as long as the number of reachable Dot-pairs is less than the capacity of the
heap, i.e. less than l. This precondition under-approximates pointer aliasing: size
counts shared Dot-pairs repeatedly, so the sum is a safe over-estimate of the
number of distinct reachable Dot-pairs.
size v1 + size v2 + size v3 + size v4 + size v5 + size v6 < l ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p }
p : E50A3018 E50A4014 E50A5010 E50A600C ... E51A8004 E51A7008
{ lisp (cons v1 v2 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 332) }
The implementation of cons includes a copying collector which implements
Cheney’s algorithm [2]. This copying collector requires the heap to be split into
two heap halves of equal size; only one of which is used for heap data at any
one point in time. When a collection request is issued, all live elements from the
currently used heap half are copied over to the currently unused heap half. The
proof of cons is outlined in the first author’s PhD thesis [14].
The fact that one half of the heap is left empty might seem to be a waste
of space. However, the other heap half need not be left completely unused, as
the implementation of equal can make use of it. The LISP primitive equal tests
whether two s-expressions are structurally identical by traversing the expression
tree as a normal recursive procedure. This recursive traversal requires a stack,
but the stack can in this case be built inside the unused heap half as the garbage
collector will not be called during the execution of equal. Thus, the implementa-
tion of equal uses no external stack and requires no conditions on the size of the
expressions v1 and v2 , as their depths cannot exceed the length of a heap half.
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p }
p : E1530004 03A0300F 0A000025 E50A4014 ... E51A7008 E51A8004
{ lisp (equal v1 v2 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 164) }

4 Compiling s-Expression Functions to Machine Code


The previous sections described the theorems which state that certain machine
instructions execute LISP primitives. These theorems can be used to augment
the input-language understood by a proof-producing compiler that we have de-
veloped [16]. The theorems mentioned above allow the compiler to accept:
let v2 = car v1 in ...
let v1 = plus v1 v2 in ...
let v1 = cons v1 v2 in ...
let v1 = equal v1 v2 in ...
Theorems for basic tests have also been proved in a similar manner, and can be
provided to the compiler. For example, the following theorem shows that ARM
instruction E3330003 assigns boolean value (v1 = Sym "nil") to status bit z.
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E3330003
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 4) ∗ sz (v1 = Sym "nil") ∗
∃n c v. sn n ∗ sc c ∗ sv v }

The compiler can use such theorems to create branches on the expression as-
signed to status bits. The above theorem adds support for the if-statement:

if v1 = Sym "nil" then ... else ...

Once the compiler was given sufficient Hoare-triple theorems it could be used
to compile functions operating over s-expressions into machine code. An example
will illustrate the process. From the following function
sumlist(v1 , v2 , v3 ) = if v1 = Sym "nil" then (v1 , v2 , v3 ) else
let v3 = car v1 in
let v1 = cdr v1 in
let v2 = plus v2 v3 in
sumlist(v1 , v2 , v3 )

the compiler produces the theorem below, containing the generated ARM ma-
chine code and a precondition sumlist pre(v1 , v2 , v3 ).

sumlist pre(v1 , v2 , v3 ) ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E3330003 0A000004 E5935000 E5934004 E0844005 E2444002 EAFFFFF8
{ let (v1 , v2 , v3 ) = sumlist(v1 , v2 , v3 ) in
lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 28) ∗ s }

The proof performed by the compiler is outlined in Appendix C, where the


precondition sumlist pre(v1 , v2 , v3 ) is also defined. The automatically generated
pre-functions collect side conditions that must be true for proper execution of
the code, e.g. when cons is used the pre-functions collect the requirements on
not exceeding the heap limit l.

5 Assembling the LISP Evaluator


LISP evaluation was defined as a large tail-recursive function lisp eval and then
compiled, to ARM, PowerPC and x86, to produce theorems of the following
form. The theorem below states that the generated ARM code executes lisp eval
for inputs that do not violate any of the side conditions gathered in lisp eval pre.
lisp eval pre(v1 , v2 , v3 , v4 , v5 , v6 , l) ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E3360003 1A0001D1 E3A0600F E3130001 0A000009 ... EAFFF85D
{ lisp (lisp eval(v1 , v2 , v3 , v4 , v5 , v6 , l)) ∗ pc (p + 7816) ∗ s }

lisp eval evaluates the expression stored in v1; input v6 is a list of symbol-value
pairs against which symbols in v1 are evaluated; inputs v2, v3, v4 and v5 are
used as temporaries that are to be initialised with Sym "nil". The heap limit l
had to be passed into lisp eval due to an implementation restriction which re-
quires lisp eval pre to input the same variables as lisp eval. The side condition
lisp eval pre uses l to state restrictions on applications of cons.

6 Evaluator Implements McCarthy’s LISP 1.5


The previous sections, and Appendix C, described how a function lisp eval was
compiled down to machine code. The compiler generated some code and derived
a theorem which states that the generated code correctly implements lisp eval.
However, the compiler does not (and cannot) give any evidence that lisp eval in
fact implements ‘LISP evaluation’. The definition of lisp eval is long and full of
tedious details of how the intermediate stack is maintained and used, and thus
it is far from obvious that lisp eval corresponds to ‘LISP evaluation’.
In order to gain confidence that the generated machine code actually imple-
ments LISP evaluation, we proved that lisp eval implements a clean relational
semantics of LISP 1.5 [6]. Our relational semantics of LISP 1.5 is defined in
terms of three mutually recursive relations →eval, →eval list and →app. Here
(fn, [arg1; · · · ; argn], ρ) →app s means that fn[arg1; · · · ; argn] = s if the free
variables in fn have values specified by an environment ρ; similarly (e, ρ) →eval s
holds if term e evaluates to s-expression s with respect to environment ρ; and
(el, ρ) →eval list sl holds if list el of expressions evaluates to list sl of expressions
with respect to ρ. Here k denotes built-in function names and c constants. For
details refer to Gordon [6] and Appendix A in Myreen [14].
    ok name v
─────────────────        ───────────────        ──────────────────
(v, ρ) →eval ρ(v)         (c, ρ) →eval c         ([ ], ρ) →eval nil

 (p, ρ) →eval nil    ([gl], ρ) →eval s        (p, ρ) →eval x    x ≠ nil    (e, ρ) →eval s
────────────────────────────────────        ───────────────────────────────────────────
      ([p → e; gl], ρ) →eval s                       ([p → e; gl], ρ) →eval s

    can apply k args                (ρ(f ), args, ρ) →app s    ok name f
──────────────────────────        ─────────────────────────────────────
 (k, args, ρ) →app k args                 (f, args, ρ) →app s

 (e, ρ[args/vars]) →eval s            (fn, args, ρ[fn/x]) →app s
────────────────────────────        ──────────────────────────────────
(λ[[vars]; e], args, ρ) →app s       (label[[x]; fn], args, ρ) →app s

                               (e, ρ) →eval s    ([el], ρ) →eval list sl
───────────────────────        ─────────────────────────────────────────
([ ], ρ) →eval list [ ]             ([e; el], ρ) →eval list [s; sl]

We have proved that whenever the relation for LISP 1.5 evaluation →eval
relates expression s under environment ρ to expression r, then lisp eval will do
the same. Here t and u are translation functions, from one form of s-expressions
to another. Let nil = Sym "nil" and fst (x, y, . . .) = x.

∀s ρ r. (s, ρ) →eval r ⇒ fst (lisp eval (t s, nil, nil, nil, u ρ, nil, l)) = t r

7 Verified Parser and Printer


Sections 4 and 5 explained how machine code was generated and proved to
implement a function called lisp eval. The precondition of the certificate theorem
requires the initial state to satisfy a complex heap invariant lisp. How do we know
that this precondition is not accidentally equivalent to false, making the theorem

vacuously true? To remedy this shortcoming, we have verified machine code that
will set-up an appropriate state from scratch.
The set-up and tear-down code includes a parser and printer that will, re-
spectively, read in an input s-expression and print out the resulting s-expression.
The development of the parser and printer started by first defining a function
sexp2string which lays down how s-expressions are to be represented in string
form (Appendix D). Then a function string2sexp was defined for which we proved:

∀s. sexp ok s ⇒ string2sexp (sexp2string s) = s

Here sexp ok s makes sure that s does not contain symbols that print ambigu-
ously, e.g. Sym "", Sym "(" and Sym "2". The parsing function was defined as a
composition of a lexer sexp lex and a token parser sexp parse (Appendix D).

string2sexp str = car (sexp parse (reverse (sexp lex str)) (Sym "nil") [])
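For instance, unfolding the definitions in Appendix D gives
sexp2string (Dot (Num 1) (Dot (Num 2) (Sym "nil"))) = "(1 2)"; since this
s-expression satisfies sexp ok, the round-trip theorem above guarantees that
string2sexp parses the string back to the original s-expression.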

Machine code was written and verified based on the high-level functions sexp lex,
sexp parse and sexp2string. Writing these high-level definitions first was a great
help when constructing the machine code (using the compiler from [16]).
The overall theorems about our LISP implementations are of the following
form. If →eval relates s with r under the empty environment (i.e. (s, []) →eval r),
no illegal symbols are used (i.e. sexp ok (t s)), running lisp eval on t s will not run
out of memory (i.e. lisp eval pre(t s, nil, nil, nil, nil, nil, l)), the string representation
of t s is in memory (i.e. string a (sexp2string (t s))), and there is enough space to
parse t s and set up a heap of size l (i.e. enough space (t s) l), then the code will
execute successfully and terminate with the string representation of t r stored
in memory (i.e. string a (sexp2string (t r))). The ARM code expects the address
of the input string to be in register 3, i.e. r3 a.

∀s r l p.
(s, []) →eval r ∧ sexp ok (t s) ∧ lisp eval pre(t s, nil, nil, nil, nil, nil, l) ⇒
{ ∃a. r3 a ∗ string a (sexp2string (t s)) ∗ enough space (t s) l ∗ pc p }
p : ... code not shown ...
{ ∃a. r3 a ∗ string a (sexp2string (t r)) ∗ enough space’ (t s) l ∗ pc (p+10404) }

The input needs to be in register 3 for PowerPC and the eax register for x86.

8 Quantitative Data

The idea for this project first arose approximately two years ago. Since then
a decompiler [15] and a compiler [16] have been developed to aid this project;
in total this produced some 4,580 lines of proof automation and 16,130 lines
of interactive proofs and definitions, excluding the definitions of the instruction
set models [5,9,18]. Running through all of the proofs takes approximately 2.5
hours in HOL4 using PolyML.

The verified LISP implementations seem to have reasonable execution times:


the pascal-triangle example, from Section 1, executes on a 2.4 GHz x86 pro-
cessor in less than 1 millisecond and on a 67 MHz ARM processor in approxi-
mately 90 milliseconds. The PowerPC implementations have not yet been tested
on real hardware. The ARM implementation is 2,601 instructions long (10,404
bytes), x86 is 3,135 instructions (9,054 bytes) and the PowerPC implementation
consists of 2,929 instructions (11,716 bytes).

9 Discussion of Related Work

This project has produced trustworthy implementations of LISP. The VLISP


project by Guttman et al. [7] shared our goal, but differed in many other aspects.
For example, the VLISP project implemented a larger LISP dialect, namely
Scheme, and emphasised rigour, not full formality:

“The verification was intended to be rigorous, but not completely formal,
much in the style of ordinary mathematical discourse. Our goal was to
verify the algorithms and data types used in the implementation, not
their embodiment in the code.”

The VLISP project developed an implementation which translates Scheme pro-


grams into byte code that is then run on a rigorously verified interpreter. Much
like our project, the VLISP project developed their interpreter in a subset of
the source language: for them PreScheme, and for us, the input language of our
augmented compiler, Section 4.
Work that aims to implement functional languages, in a formally verified man-
ner, include Pike et al. [17] on a certifying compiler from Cryptol (a dialect of
Haskell) to AAMP7 code; Dargaye and Leroy [4] on a certified compiler from
mini-ML to PowerPC assembly; Li and Slind’s work [10] on a certifying com-
piler from a subset of HOL4 to ARM assembly; and also Chlipala’s certified
compiler [3] from the lambda calculus to an invented assembly language. The
above work either assumes that the environment implements run-time memory
management correctly [3,4] or restricts the input language to a degree where
no run-time memory management is needed [10,17]. It seems that none of the
above have made use of (the now large number of) verified garbage collectors
(e.g. McCreight et al. [13] have been performing correctness proofs for increas-
ingly sophisticated garbage collectors).
The parser and printer proofs, in Section 7, involved verifying implementations
of string-copy, -length, -compare etc., bearing some resemblance to pioneering
work by Boyer and Yu [1] on verification of machine code. They verified Motorola
MC68020 code implementing a library of string functions.

Acknowledgements. We thank Anthony Fox, Xavier Leroy and Susmit Sarkar


et al. for allowing us to use their processor models for this work [5,9,18]. We also
thank Thomas Tuerk, Joe Hurd, Konrad Slind and John Matthews for comments
and discussions. We are grateful for funding from EPSRC, UK.

References

1. Boyer, R.S., Yu, Y.: Automated proofs of object code for a widely used micropro-
cessor. J. ACM 43(1), 166–192 (1996)
2. Cheney, C.J.: A non-recursive list compacting algorithm. Commun. ACM 13(11),
677–678 (1970)
3. Chlipala, A.J.: A certified type-preserving compiler from lambda calculus to as-
sembly language. In: Programming Language Design and Implementation (PLDI),
pp. 54–65. ACM, New York (2007)
4. Dargaye, Z., Leroy, X.: Mechanized verification of CPS transformations. In: Der-
showitz, N., Voronkov, A. (eds.) LPAR 2007. LNCS, vol. 4790, pp. 211–225.
Springer, Heidelberg (2007)
5. Fox, A.: Formal specification and verification of ARM6. In: Basin, D., Wolff, B.
(eds.) TPHOLs 2003. LNCS, vol. 2758, pp. 25–40. Springer, Heidelberg (2003)
6. Gordon, M.: Defining a LISP interpreter in a logic of total functions. In: The ACL2
Theorem Prover and Its Applications, ACL2 (2007)
7. Guttman, J., Ramsdell, J., Wand, M.: VLISP: A verified implementation of Scheme.
Lisp and Symbolic Computation 8(1/2), 5–32 (1995)
8. Kaufmann, M., Moore, J.S.: An ACL2 tutorial. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 17–21. Springer, Heidelberg
(2008)
9. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler
with a proof assistant. In: Principles of Programming Languages (POPL), pp. 42–
54. ACM Press, New York (2006)
10. Li, G., Owens, S., Slind, K.: A proof-producing software compiler for a subset of
higher order logic. In: European Symposium on Programming (ESOP). LNCS, pp.
205–219. Springer, Heidelberg (2007)
11. Manolios, P., Strother Moore, J.: Partial functions in ACL2. J. Autom. Reason-
ing 31(2), 107–127 (2003)
12. McCarthy, J., Abrahams, P.W., Edwards, D.J., Hart, T.P., Levin, M.I.: LISP 1.5
Programmer’s Manual. The MIT Press, Cambridge (1966)
13. McCreight, A., Shao, Z., Lin, C., Li, L.: A general framework for certifying garbage
collectors and their mutators. In: Ferrante, J., McKinley, K.S. (eds.) Proceedings
of the Conference on Programming Language Design and Implementation (PLDI),
pp. 468–479. ACM, New York (2007)
14. Myreen, M.O.: Formal verification of machine-code programs. PhD thesis, Univer-
sity of Cambridge (2009)
15. Myreen, M.O., Slind, K., Gordon, M.J.C.: Machine-code verification for multiple
architectures – An application of decompilation into logic. In: Formal Methods in
Computer Aided Design (FMCAD). IEEE, Los Alamitos (2008)
16. Myreen, M.O., Slind, K., Gordon, M.J.C.: Extensible proof-producing compilation.
In: Compiler Construction (CC). LNCS. Springer, Heidelberg (2009)
17. Pike, L., Shields, M., Matthews, J.: A verifying core for a cryptographic language
compiler. In: Manolios, P., Wilding, M. (eds.) Proceedings of the Sixth Interna-
tional Workshop on the ACL2 Theorem Prover and its Applications. HappyJack
Books (2006)
18. Sarkar, S., Sewell, P., Nardelli, F.Z., Owens, S., Ridge, T., Braibant, T., Myreen,
M.O., Alglave, J.: The semantics of x86-CC multiprocessor machine code. In: Prin-
ciples of Programming Languages (POPL). ACM, New York (2009)

19. Slind, K., Norrish, M.: A brief overview of HOL4. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 28–32. Springer, Heidelberg
(2008)

A Definition of lisp inv in HOL4

The definition of the main invariant of the LISP state.

ALIGNED a = (a && 3w = 0w)

string_mem "" (a,m,dm) = T
string_mem (STRING c s) (a,m,dm) = a ∈ dm ∧
  (m a = n2w (ORD c)) ∧ string_mem s (a+1w,m,dm)

symbol_table [] x (a,dm,m,dg,g) = (m a = 0w) ∧ a ∈ dm ∧ (x = {})
symbol_table (s::xs) x (a,dm,m,dg,g) = ¬(s = "") ∧ ¬ MEM s xs ∧
  (m a = n2w (string_size s)) ∧ {a; a+4w} ⊆ dm ∧ ((a,s) ∈ x) ∧
  let a' = a + n2w (8 + (string_size s + 3) DIV 4 * 4) in
    a < a' ∧ (m (a+4w) = a') ∧ string_mem s (a+8w,g,dg) ∧
    symbol_table xs (x - {(a,s)}) (a',dm,m,dg,g)

builtin =
  ["nil"; "t"; "quote"; "+"; "-"; "*"; "div"; "mod"; "<"; "car"; "cdr";
   "cons"; "equal"; "cond"; "atomp"; "consp"; "numberp"; "symbolp"; "lambda"]

lisp_symbol_table sym (a,dm,m,dg,g) =
  ∃syms. symbol_table (builtin ++ syms) { (b,s) | (b-a,s) ∈ sym } (a,dm,m,dg,g)

lisp_x (Num k) (a,dm,m) sym = (a = n2w (k * 4 + 2)) ∧ k < 2 ** 30
lisp_x (Sym s) (a,dm,m) sym = ALIGNED (a - 3w) ∧ (a - 3w,s) ∈ sym
lisp_x (Dot x y) (a,dm,m) sym = lisp_x x (m a,dm,m) sym ∧ a ∈ dm ∧ ALIGNED a ∧
  lisp_x y (m (a+4w),dm,m) sym

ref_set a f = {a + 4w * n2w i | i < 2 * f + 4} ∪ {a - 4w * n2w i | i ≤ 8}

ch_active_set (a,i,e) = { a + 8w * n2w j | i ≤ j ∧ j < e }

ok_data w d = if ALIGNED w then w ∈ d else ¬(ALIGNED (w - 1w))

lisp_inv (t1,t2,t3,t4,t5,t6,l) (w1,w2,w3,w4,w5,w6,a,(dm,m),sym,(dh,h),(dg,g)) =
  ∃i u.
    let v = if u then 1 + l else 1 in
    let d = ch_active_set (a,v,i) in
      32 ≤ w2n a ∧ w2n a + 2 * 8 * l + 20 < 2 ** 32 ∧ ¬(l = 0) ∧
      (m a = a + n2w (8 * i)) ∧ ALIGNED a ∧ v ≤ i ∧ i ≤ v + l ∧
      (m (a + 4w) = a + n2w (8 * (v + l))) ∧
      (m (a - 28w) = if u then 0w else 1w) ∧
      (m (a - 32w) = n2w (8 * l)) ∧ (dm = ref_set a (l + l + 1)) ∧
      lisp_symbol_table sym (a + 16w * n2w l + 24w,dh,h,dg,g) ∧
      lisp_x t1 (w1,d,m) sym ∧ lisp_x t2 (w2,d,m) sym ∧ lisp_x t3 (w3,d,m) sym ∧
      lisp_x t4 (w4,d,m) sym ∧ lisp_x t5 (w5,d,m) sym ∧
      lisp_x t6 (w6,d,m) sym ∧
      ∀w. w ∈ d ⇒ ok_data (m w) d ∧ ok_data (m (w + 4w)) d

B Sample Verification Proof of ‘Car’ Primitive

The verification proofs of the primitive LISP operations build on lemmas about
lisp inv. The following lemma is used in the proof of the theorem about car
described in Section 3.1. This lemma can be read as saying that, if lisp inv relates
x1 to Dot-pair v1 , then x1 is a word-aligned address into memory segment m,

and an assignment of car v1 to v2 corresponds to replacing x2 with the value of


memory m at address x1 , i.e. m(x1 ).

(∃x y. Dot x y = v1 ) ∧
lisp inv (v1 , v2 , v3 , v4 , v5 , v6 , l) (x1 , x2 , x3 , x4 , x5 , x6 , a, m, m2 , m3 ) ⇒
(x1 & 3 = 0) ∧ x1 ∈ domain m ∧
lisp inv (v1 , car v1 , v3 , v4 , v5 , v6 , l) (x1 , m(x1 ), x3 , x4 , x5 , x6 , a, m, m2 , m3 )

One of our tools derives the following Hoare triple theorem for the ARM instruc-
tion that is to be verified: ldr r4,[r3] (encoded as E5934000).

{ r3 r3 ∗ r4 r4 ∗ m m ∗ pc p ∗ ⟨(r3 & 3 = 0) ∧ r3 ∈ domain m⟩ }
p : E5934000
{ r3 r3 ∗ r4 m(r3) ∗ m m ∗ pc (p+4) }

Application of the frame rule (shown in Appendix C) produces:

{ r3 r3 ∗ r4 r4 ∗ m m ∗ pc p ∗ ⟨(r3 & 3 = 0) ∧ r3 ∈ domain m⟩ ∗
  r5 x3 ∗ r6 x4 ∗ r7 x5 ∗ r8 x6 ∗ r10 a ∗ m m2 ∗ m m3 ∗
  lisp inv (v1, v2, v3, v4, v5, v6, l) (r3, r4, x3, x4, x5, x6, a, m, m2, m3) }
p : E5934000
{ r3 r3 ∗ r4 m(r3) ∗ m m ∗ pc (p+4) ∗ ⟨(r3 & 3 = 0) ∧ r3 ∈ domain m⟩ ∗
  r5 x3 ∗ r6 x4 ∗ r7 x5 ∗ r8 x6 ∗ r10 a ∗ m m2 ∗ m m3 ∗
  lisp inv (v1, v2, v3, v4, v5, v6, l) (r3, r4, x3, x4, x5, x6, a, m, m2, m3) }

Now the postcondition can be weakened to the desired expression:


{ r3 r3 ∗ r4 r4 ∗ m m ∗ pc p ∗ ⟨(r3 & 3 = 0) ∧ r3 ∈ domain m⟩ ∗
  r5 x3 ∗ r6 x4 ∗ r7 x5 ∗ r8 x6 ∗ r10 a ∗ m m2 ∗ m m3 ∗
  lisp inv (v1, v2, v3, v4, v5, v6, l) (r3, r4, x3, x4, x5, x6, a, m, m2, m3) }
p : E5934000
{ lisp (v1, car v1, v3, v4, v5, v6, l) ∗ pc (p + 4) }

Since variables r3, r4, x3, x4, x5, x6, m, m2, m3 do not appear in the post-
condition, they can be existentially quantified in the precondition, which can
then be replaced by the lisp assertion:

{ lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc p ∗ ⟨∃x y. Dot x y = v1⟩ }
p : E5934000
{ lisp (v1, car v1, v3, v4, v5, v6, l) ∗ pc (p + 4) }

The specification for car follows by moving the boolean condition:


(∃x y. Dot x y = v1 ) ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p }
p : E5934000
{ lisp (v1 , car v1 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 4) }

All of the primitive LISP operations were verified in the same manner. For the
HOL4 implementation, a 50-line ML program was written to automate these
proofs given the appropriate lemmas about lisp inv.

C Proof Performed by Compiler

Internally the compiler runs through a short proof when constructing the the-
orem presented in Section 4. This proof makes use of the following five proof
rules derived from the definition of our machine-code Hoare triple, developed in
previous work [15]. Formal definitions and detailed explanations are given in the
first author’s PhD thesis [14]. Here ∪ is simply set union.

frame: {p} c {q} ⇒ ∀r. {p ∗ r} c {q ∗ r}


code extension: {p} c {q} ⇒ ∀d. {p} c ∪ d {q}
composition: {p} c {q} ∧ {q} d {r} ⇒ {p} c ∪ d {r}
move pure: {p ∗ ⟨b⟩} c {q} = (b ⇒ {p} c {q})
tail recursion: (∀x. P (x) ∧ G(x) ⇒ {p(x)} c {p(F (x))}) ∧
(∀x. P (x) ∧ ¬G(x) ⇒ {p(x)} c {q(D(x))}) ⇒
(∀x. pre(G, F, P )(x) ⇒ {p(x)} c {q(tailrec(G, F, D)(x))})

The last rule mentions tailrec and pre, which are functions that satisfy:

∀x. tailrec(G, F, D)(x) = if G(x) then tailrec(G, F, D)(F (x)) else D(x)
∀x. pre(G, F, P )(x) = if G(x) then pre(G, F, P )(F (x)) ∧ P (x) else P (x)

Note that any tail-recursive function can be defined as an instance of tailrec, in-
troduced using a trick by Manolios and Moore [11]. Another noteworthy feature:
if pre(G, F, P )(x) is true then tailrec(G, F, D) terminates for input x.
The compiler starts its proof from the following theorems describing the test
v1 = Sym "nil" as well as operations car, cdr and plus.

1. { lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E3330003
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 4) ∗ sz (v1 = Sym "nil") ∗
∃n c v. sn n ∗ sc c ∗ sv v }
2. (∃x y. Dot x y = v1 ) ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E5935000
{ lisp (v1 , v2 , car v1 , v4 , v5 , v6 , l) ∗ pc (p + 4) }
3. (∃x y. Dot x y = v1 ) ⇒
{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E5933004
{ lisp (cdr v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 4) }
4. (∃m n. Num m = v2 ∧ Num n = v3 ∧ m+n < 2³⁰) ⇒
{ lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc p ∗ s }
p : E0844005 E2444002
{ lisp (v1, plus v2 v3, v3, v4, v5, v6, l) ∗ pc (p + 8) }

The compiler next generates two branches to glue the code together; the branch
instructions have the following specifications:

5. { pc p ∗ sz z ∗ ⟨z⟩ } p : 0A000004 { pc (p + 24) ∗ sz z }
6. { pc p ∗ sz z ∗ ⟨¬z⟩ } p : 0A000004 { pc (p + 4) ∗ sz z }
7. { pc p } p : EAFFFFF8 { pc (p − 24) }

The specifications above are collapsed into theorems describing one pass through
the code by composing 1,5 and 1,6,2,3,4,7, which results in:

8. { lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc p ∗ s ∗ ⟨v1 = Sym "nil"⟩ }
p : E3330003 0A000004
{ lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc (p + 28) ∗ s }
9. (∃x y. Dot x y = v1) ∧
(∃m n. Num m = v2 ∧ Num n = car v1 ∧ m+n < 2³⁰) ⇒
{ lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc p ∗ s ∗ ⟨¬(v1 = Sym "nil")⟩ }
p : E3330003 0A000004 E5935000 E5934004 E0844005 E2444002 EAFFFFF8
{ lisp (cdr v1, plus v2 (car v1), car v1, v4, v5, v6, l) ∗ pc p ∗ s }

Code extension is applied to theorem 8, and then the rule for introducing a
tail-recursive function is applied. The compiler produces the following total-
correctness specification.

10. sumlist pre(v1 , v2 , v3 ) ⇒


{ lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc p ∗ s }
p : E3330003 0A000004 E5935000 E5934004 E0844005 E2444002 EAFFFFF8
{ let (v1 , v2 , v3 ) = sumlist(v1 , v2 , v3 ) in
lisp (v1 , v2 , v3 , v4 , v5 , v6 , l) ∗ pc (p + 28) ∗ s }

Here sumlist is defined as an instance of tailrec, and sumlist pre is an instance of


pre. The compiler exports sumlist pre as the following recursive function which
collects all of the side conditions that must hold for proper execution of the code:

sumlist pre(v1 , v2 , v3 ) =
if v1 = Sym "nil" then true else
let cond = (∃x y. Dot x y = v1 ) in
let v3 = car v1 in
let cond = cond ∧ (∃x y. Dot x y = v1 ) in
let v1 = cdr v1 in
let cond = cond ∧ (∃m n. Num m = v2 ∧ Num n = v3 ∧ m+n < 2³⁰) in
let v2 = plus v2 v3 in
sumlist pre(v1 , v2 , v3 ) ∧ cond

When the loop rule is applied above, its parameters are assigned values:
p = λ(v1, v2, v3). lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc p ∗ s
q = λ(v1, v2, v3). lisp (v1, v2, v3, v4, v5, v6, l) ∗ pc (p + 28) ∗ s
G = λ(v1, v2, v3). ¬(v1 = Sym "nil")
F = λ(v1, v2, v3). (cdr v1, plus v2 (car v1), car v1)
D = λ(v1, v2, v3). (v1, v2, v3)
P = λ(v1, v2, v3). ¬(v1 = Sym "nil") ⇒
(∃x y. Dot x y = v1) ∧
(∃m n. Num m = v2 ∧ Num n = car v1 ∧ m+n < 2³⁰)

D Definition of s-Expression Printing and Parsing


Our machine code for printing LISP s-expressions implements sexp2string.
sexp2string x = aux (x, T)
aux (Num n, b) = num2str n
aux (Sym s, b) = s
aux (Dot x y, b) = if isQuote (Dot x y) ∧ b then "’" ++ aux (car y, T) else
let (a, e) = (if b then ("(", ")") else ("", "")) in
if y = Sym "nil" then a ++ aux (x, T) ++ e else
if isDot y then a ++ aux (x, T) ++ " " ++ aux (y, F) ++ e
else a ++ aux (x, T) ++ " . " ++ aux (y, F) ++ e
isDot x = ∃y z. x = Dot y z
isQuote x = ∃y. x = Dot (Sym "quote") (Dot y (Sym "nil"))

Parsing is defined as follows. Here reverse is ordinary list reversal.


string2sexp str = car (sexp parse (reverse (sexp lex str)) (Sym "nil") [])

The lexing function sexp lex splits a string into a list of strings, e.g.
sexp lex "(car (’23 . y))" = ["(", "car", "(", "’", "23", ".", "y", ")", ")"]

Token parsing is defined as:


sexp parse [] exp stack = exp
sexp parse (")" :: ts) exp stack = sexp parse ts (Sym "nil") (exp :: stack)
sexp parse ("(" :: ts) exp stack = sexp parse ts (Dot exp (head stack)) (tail stack)
sexp parse ("." :: ts) exp stack = sexp parse ts (car exp) stack
sexp parse ("’" :: ts) exp stack = sexp parse ts (Dot (Dot (Sym "quote")
    (Dot (car exp) (Sym "nil"))) (cdr exp)) stack
sexp parse (t :: ts) exp stack = sexp parse ts (Dot (if is num t then
    Num (str2num t) else Sym t) exp) stack
Trace-Based Coinductive Operational Semantics
for While
Big-Step and Small-Step, Relational and Functional Styles

Keiko Nakata and Tarmo Uustalu

Institute of Cybernetics at Tallinn University of Technology,


Akadeemia tee 21, EE-12618 Tallinn, Estonia
{keiko,tarmo}@cs.ioc.ee

Abstract. We present four coinductive operational semantics for the


While language accounting for both terminating and non-terminating
program runs: big-step and small-step relational semantics and big-step
and small-step functional semantics. The semantics employ traces (possi-
bly infinite sequences of states) to record the states that program runs go
through. The relational semantics relate statement-state pairs to traces,
whereas the functional semantics return traces for statement-state pairs.
All four semantics are equivalent. We formalize the semantics and their
equivalence proofs in the constructive setting of Coq.

1 Introduction
Now and then we must program a partially recursive function whose domain
of definedness we cannot decide or is undecidable, e.g., an interpreter. Reactive
programs such as operating systems and data base systems are not supposed
to terminate. To reason about such programs properly, we need semantics that
account for both terminating and non-terminating program runs. Compilers, for
example, should preserve both terminating and non-terminating behaviors of
source programs [10,13]. Standard operational semantics ignore (or say too little
about) non-terminating runs, so finer semantic accounts are necessary.
In this paper, we present four coinductive semantics for the While language
that we claim to be both adequate for reasoning about non-terminating runs as
well as well-designed. They represent four different styles of operational seman-
tics: big-step and small-step relational and big-step and small-step functional
semantics. Our semantics are based on traces, defined coinductively as possibly
infinite non-empty sequences of states. What is more, the evaluation and normal-
ization relations and functions are also coinductive/corecursive. The functional
semantics are constructively possible thanks to the fact that in the trace-based
setting, While becomes a total rather than partial language (every run defines
a trace, even if it may be infinite). All four semantics are constructively equiva-
lent. We have formalized our development in the Coq proof assistant, using the
Ssreflect syntax extension, see http://cs.ioc.ee/~keiko/majas.tar.gz.
It might be objected against this paper that the results are unsurprising, since
the semantics appear simple and enjoy all expected properties. They are simple


indeed, but in the case of the two big-step semantics, this is a consequence of
very careful design decisions. As a matter of fact, getting coinductive big-step
semantics right is tricky, and in this situation it is really fortunate that simple
solutions are available. In the paper, we discuss some of the design considerations
and also show some design options that we rejected deliberately. Previous work
in the literature [6,9,14] also contains some designs that are more complicated
than ours or fail to have some clearly desirable properties or both. A skeptical
reader may also worry that While is a toy language. We argue that While is
sufficient for highlighting all important issues. In fact, our designs scale without
pain to procedures and language constructs for effects such as exceptions, non-
determinism and interactive input-output.
Programming and reasoning with coinductive types in type theory require
taking special care about productivity. Here the type checker of Coq helps us avoid
mistakes by ruling out non-productive definitions. But some limitations are imposed by the
implementation. For instance, 15 years ago a type-based approach for ensuring
productivity of corecursive definitions was developed [8]. This approach is more
flexible than the syntactic guardedness approach [7] of Coq, but it has not been
implemented. Several coding techniques have been proposed to circumvent the
limitations [2,14]. In our development, we rely on syntactic productivity.
The remainder of the paper is organized as follows. We introduce traces in
Section 2. We present the big-step relational semantics in Section 3, the small-
step relational semantics in Section 4, and the big-step and small-step functional
semantics in Sections 5 and 6, proving them equivalent along the way. We discuss
related work in Section 7 to conclude in Section 8.
The language we consider is the While language, defined inductively by the
following productions:

stmt s ::= skip | s0 ; s1 | x := e | if e then st else sf | while e do st

We assume given a supply of variables and a set of (pure) expressions, whose


elements are ranged over by metavariables x and e respectively. We assume the
set of values to be the integers, non-zero integers counting as truth and zero as
falsity. The metavariable v ranges over values. A state, ranged over by σ, maps
variables to values. The notation σ[x → v] denotes the update of σ with v at x.
We assume given an evaluation function ⟦e⟧σ, which evaluates e in the state σ.
We write σ |= e and σ ⊭ e to denote that e is true, resp. false, in σ.

2 Traces
We describe the semantics of statements in terms of traces. A trace is a possibly
infinite non-empty sequence of states, the sequence of all states that the run
of the statement passes through, including the given initial state. We enforce
non-emptiness by having the nil constructor to also take a state as an argument.
Formally traces are defined coinductively by the following productions:

trace τ ::= ⟨σ⟩ | σ :: τ
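In Coq, traces can be rendered as follows (a minimal sketch with our constructor
names, keeping the type of states abstract):

Parameter state : Type.

CoInductive trace : Type :=
  | Tnil  : state -> trace            (* the singleton trace ⟨σ⟩ *)
  | Tcons : state -> trace -> trace.  (* σ :: τ *)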



We define bisimilarity of two traces τ, τ′, written τ ≈ τ′, by the coinductive
interpretation of the following inference rules¹:

                        τ ≈ τ′
═══════════        ═════════════════
⟨σ⟩ ≈ ⟨σ⟩           σ :: τ ≈ σ :: τ′
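Over the trace type sketched above, bisimilarity reads in Coq (again with our
names):

CoInductive bisim : trace -> trace -> Prop :=
  | bisim_nil  : forall s, bisim (Tnil s) (Tnil s)
  | bisim_cons : forall s t t', bisim t t' -> bisim (Tcons s t) (Tcons s t').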

Bisimilarity is reflexive, symmetric and transitive, i.e., an equivalence. The proofs


are straightforward. The reader can find a gentle introduction to coinduction in
Coq in [1, Ch. 13].
We want to think of bisimilar traces as equal, corresponding to quotienting the
set of traces as defined above by bisimilarity. In our type-theoretic implementa-
tion, we do not “compute” the quotient. Instead, we view trace with bisimilarity
as a setoid, i.e., a set with an equivalence relation. Accordingly, we have to
make sure that all functions and predicates we define on trace are in fact setoid
functions and predicates, i.e., insensitive to bisimilarity.

3 Big-Step Relational Semantics


The main contribution of the paper is the big-step relational semantics, presented
in Fig. 1. The semantics is given by two relations ⇒ and ⇒∗, defined mutually
coinductively. The ⇒ relation relates a statement-state pair to a trace and is
the evaluation relation of our interest: the proposition (s, σ) ⇒ τ expresses that
running s from an initial state σ results in trace τ. It is defined by case distinction
on the statement.

The auxiliary ⇒∗ relation relates a statement-trace pair to a trace. Roughly,
the proposition (s, τ) ⇒∗ τ′ states that running s from the last state of an al-
ready accumulated trace τ results in trace τ′. The rules literally define ⇒∗ as
the coinductive prefix closure of ⇒. A more precise description of (s, τ) ⇒∗ τ′ is
therefore as follows. If τ is finite, then s is run from the last state of τ and τ′
is obtained from τ by appending the trace produced by s. If τ is infinite (so it
does not have a last state), then (s, τ) ⇒∗ τ′ is derivable for any τ′ bisimilar to
τ, in particular for τ. This design has the desirable consequence that, if a run
of the first statement of a sequence diverges, the second statement is not run at
all. Indeed, if (s0, σ) ⇒ τ and τ is infinite, then we can derive (s1, τ) ⇒∗ τ and
further (s0; s1, σ) ⇒ τ. Similarly, if a run of the body of a while-loop diverges,
we do not get around to retesting the guard and continuing.

A remarkable feature of the definition of ⇒∗ is that it does not hinge on
deciding whether the trace is finite or not, which is constructively impossible. A
proof of (s, τ) ⇒∗ τ′ is simply a traversal of the already accumulated trace τ: if the
last element is hit, the statement is run, otherwise the traversal goes on forever.
Evaluation is a setoid predicate and it is deterministic (up to bisimilarity,
which is appropriate since we think of bisimilarity as equality):
Lemma 1. For any σ, s, τ, τ′, if (s, σ) ⇒ τ and τ ≈ τ′ then (s, σ) ⇒ τ′.
¹ Following X. Leroy [14], we use double horizontal lines in sets of inference rules that
are to be interpreted coinductively and single horizontal lines in inductive definitions.


═══════════════         ═══════════════════════════════
(skip, σ) ⇒ ⟨σ⟩         (x := e, σ) ⇒ σ :: ⟨σ[x → ⟦e⟧σ]⟩

(s0, σ) ⇒ τ    (s1, τ) ⇒∗ τ′
════════════════════════════
      (s0; s1, σ) ⇒ τ′

σ |= e    (st, σ :: ⟨σ⟩) ⇒∗ τ         σ ⊭ e    (sf, σ :: ⟨σ⟩) ⇒∗ τ
═════════════════════════════         ═════════════════════════════
(if e then st else sf, σ) ⇒ τ         (if e then st else sf, σ) ⇒ τ

σ |= e    (st, σ :: ⟨σ⟩) ⇒∗ τ    (while e do st, τ) ⇒∗ τ′               σ ⊭ e
═════════════════════════════════════════════════════    ═══════════════════════════════
              (while e do st, σ) ⇒ τ′                     (while e do st, σ) ⇒ σ :: ⟨σ⟩

  (s, σ) ⇒ τ              (s, τ) ⇒∗ τ′
═══════════════    ═══════════════════════════
(s, ⟨σ⟩) ⇒∗ τ      (s, σ :: τ) ⇒∗ σ :: τ′

Fig. 1. Big-step relational semantics
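The mutual coinduction of Fig. 1 can be rendered in Coq roughly as follows (a
sketch with our names, reusing the trace type from Section 2, with an abstract
statement type and only the skip and sequence cases shown):

Parameter stmt : Type.
Parameter Sskip : stmt.
Parameter Sseq : stmt -> stmt -> stmt.

CoInductive exec : stmt -> state -> trace -> Prop :=
  | exec_skip : forall sigma,
      exec Sskip sigma (Tnil sigma)
  | exec_seq : forall s0 s1 sigma tau tau',
      exec s0 sigma tau -> execseq s1 tau tau' ->
      exec (Sseq s0 s1) sigma tau'
(* execseq is the coinductive prefix closure: it traverses the accumulated
   trace and runs the statement from its last state, if there is one. *)
with execseq : stmt -> trace -> trace -> Prop :=
  | execseq_nil : forall s sigma tau,
      exec s sigma tau -> execseq s (Tnil sigma) tau
  | execseq_cons : forall s sigma tau tau',
      execseq s tau tau' -> execseq s (Tcons sigma tau) (Tcons sigma tau').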

Lemma 2. For any σ, s, τ and τ′, if (s, σ) ⇒ τ and (s, σ) ⇒ τ′ then τ ≈ τ′.


Some design decisions we have made are that skip does not grow a trace, so
we have (skip, σ) ⇒ ⟨σ⟩. But an assignment and testing the guard of an if-
or while-statement contribute a state, i.e. constitute a small step; e.g., we
have (x := 17, σ) ⇒ σ :: ⟨σ[x → 17]⟩, (while false do skip, σ) ⇒ σ :: ⟨σ⟩ and
(while true do skip, σ) ⇒ σ :: σ :: σ :: . . . This is good for several reasons. First,
we have that skip is the identity of sequential composition, i.e., the semantics
does not distinguish s, skip; s and s; skip. Second, we get a notion of small steps
that fully agrees with the textbook-style small-step semantics given in the next
section. The third and most important outcome is that any while-loop always
progresses, because testing of the guard is a small step. Another option would
be to regard testing of the guard as instantaneous, but take leaving the loop
body, or a backward jump in terms of low-level compiled code, to constitute a
small step. But then we would not agree with the textbook small-step semantics.
It is not mandatory to record full states in a trace as we are doing in this
paper. It would make perfect sense to record just some observable part of the
intermediate states, or to only record that some states were passed through (to
track ticks of the clock). Neither is (strong) bisimilarity the only interesting
notion of equality of traces. Viable alternatives are various weak versions of
bisimilarity (allowing collapsing finite sequences of ticks).

Discussions on alternative designs. In the rest of this section we reveal some sub-
tleties in designing coinductive big-step semantics, by looking at several seem-
ingly not so different but problematic alternatives that we reject².
Since progress of loops is not required for wellformedness of the definitions of
⇒ and ⇒∗, one might be tempted to regard guard testing as instantaneous and
modify the rules for the while-loop to take the form

σ |= e    (st, σ) ⇒ τ    (while e do st, τ) ⇒∗ τ′               σ ⊭ e
═════════════════════════════════════════════    ═══════════════════════════
          (while e do st, σ) ⇒ τ′                 (while e do st, σ) ⇒ ⟨σ⟩

² Our Coq development includes complete definitions of these alternative semantics.

This leads to undesirable outcomes. We can derive (while true do skip, σ) ⇒ ⟨σ⟩,
which means that the non-terminating while true do skip is considered semanti-
cally equivalent to the terminal (immediately terminating) skip. Worse, we can
also derive (while true do skip; x := 17, σ) ⇒ σ :: ⟨σ[x → 17]⟩, which is even
more inadequate: a sequence can continue to run after the non-termination of
the first statement. Yet worse, inspecting the rules closer we discover we are
also able to derive (while true do skip, σ) ⇒ τ for any τ ! Mathematically, giving
up insisting on progress in terms of growing the trace has also the consequence
that the relational semantics cannot be turned into a functional one, although
While should intuitively be total and deterministic. In a functional semantics,
evaluation must be a trace-valued function and in a constructive setting such a
function must be productive.
Another option, where assignments and tests of guards are properly taken to
constitute steps, could be to define ⇒∗ by case distinction on the statement by
rules such as

τ |=∗ e    (st, duplast τ) ⇒∗ τ′    (while e do st, τ′) ⇒∗ τ″             τ ⊭∗ e
═════════════════════════════════════════════════    ═══════════════════════════════════
          (while e do st, τ) ⇒∗ τ″                    (while e do st, τ) ⇒∗ duplast τ

Here, duplast τ, defined corecursively, traverses τ and duplicates its last state,
if it is finite. Similarly, τ |=∗ e and τ ⊭∗ e traverse τ and evaluate e in the last
state, if it is finite:

 σ |= e            τ |=∗ e            σ ⊭ e            τ ⊭∗ e
═════════    ═══════════════    ═════════    ═══════════════
⟨σ⟩ |=∗ e     σ :: τ |=∗ e       ⟨σ⟩ ⊭∗ e     σ :: τ ⊭∗ e

(The rules for skip and sequence are very simple and appealing in this design.)
The relation ⇒ would then be defined uniformly by the rule

(s, ⟨σ⟩) ⇒∗ τ
═════════════
 (s, σ) ⇒ τ

It turns out that we can still derive (while true do skip, σ) ⇒ τ for any τ . We can
even derive (while true do x := x + 1, σ) ⇒ τ for any τ !
The third alternative (Leroy and Grall use this technique in [14]) is the closest
to ours. It introduces, instead of our ⇒∗ relation, an auxiliary relation split,
defined coinductively by

                                                                 split τ τ0 σ′ τ1
═══════════════════    ══════════════════════════    ═══════════════════════════════
split ⟨σ⟩ ⟨σ⟩ σ ⟨σ⟩     split (σ :: τ) ⟨σ⟩ σ (σ :: τ)    split (σ :: τ) (σ :: τ0) σ′ τ1

so that split τ′ τ0 σ′ τ1 expresses that the trace τ′ can be split into a concate-
nation of traces τ0 and τ1 glued together at a mid-state σ′. Then the evaluation
relation is defined by replacing the uses of ⇒∗ with split, e.g., the rule for the
sequence statement would be:

split τ′ τ0 σ′ τ1    (s0, σ) ⇒ τ0    (s1, σ′) ⇒ τ1
═══════════════════════════════════════════════════
                (s0; s1, σ) ⇒ τ′

This third alternative does not cause any outright anomalies for While. But
alarmingly s1 has to be run from some (underdetermined) state within a run
of s0 ; s1 even if the run of s0 does not terminate. In a richer language with
abnormal terminations, we get a serious problem: no evaluation is derived for
(while true do skip); abort although the abort statement should not be reached.

4 Small-Step Relational Semantics

Devising an adequate small-step relational semantics is an easy problem com-
pared to the one of the previous section. We can adapt the textbook inductive
small-step semantics, which only accounts for terminating runs. Our semantics,
given in Fig. 2, is based on a terminality predicate and a one-step reduction rela-
tion. The proposition s√ states that s is terminal (terminates in no steps), which
is possible for a sequence of skips. The proposition (s, σ) → (s′, σ′) states that in
state σ the statement s one-step reduces to s′ with the next state being σ′. These
are exactly the same as one would use for an inductive semantics. The normaliza-
tion relation ⇝ is the terminal many-step reduction relation, defined coinductively
to allow for the possibility of infinitely many steps. The proposition (s, σ) ⇝ τ
expresses that running s from σ results in the trace τ.
Normalization is a setoid predicate and it is deterministic:

Lemma 3. For any s, σ, τ, τ′, if (s, σ) ⇝ τ and τ ≈ τ′ then (s, σ) ⇝ τ′.

Lemma 4. For any s, σ, τ, τ′, if (s, σ) ⇝ τ and (s, σ) ⇝ τ′ then τ ≈ τ′.
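The normalization relation has a direct coinductive rendering in Coq (a sketch
with our names, over the trace and stmt types of the earlier sketches and
abstract terminality and one-step reduction relations):

Parameter terminal : stmt -> Prop.
Parameter step : stmt * state -> stmt * state -> Prop.

CoInductive norm_rel : stmt -> state -> trace -> Prop :=
  | norm_done : forall s sigma,
      terminal s -> norm_rel s sigma (Tnil sigma)
  | norm_step : forall s s' sigma sigma' tau,
      step (s, sigma) (s', sigma') -> norm_rel s' sigma' tau ->
      norm_rel s sigma (Tcons sigma tau).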

Equivalence to big-step relational semantics. Of course we expect the big-


step and small-step semantics to be equivalent. We will show this in two ways:
the first approach, presented in this section, directly proves the equivalence; the
second approach, given in Section 6, proves the equivalence by going through

                 s0√    s1√
─────        ──────────────
skip√          (s0; s1)√

─────────────────────────────────
(x := e, σ) → (skip, σ[x → ⟦e⟧σ])

s0√    (s1, σ) → (s1′, σ′)         (s0, σ) → (s0′, σ′)
──────────────────────────    ─────────────────────────────
 (s0; s1, σ) → (s1′, σ′)       (s0; s1, σ) → (s0′; s1, σ′)

            σ |= e                                σ ⊭ e
───────────────────────────────────    ───────────────────────────────────
(if e then st else sf, σ) → (st, σ)    (if e then st else sf, σ) → (sf, σ)

                σ |= e                                    σ ⊭ e
───────────────────────────────────────────    ─────────────────────────────
(while e do st, σ) → (st; while e do st, σ)    (while e do st, σ) → (skip, σ)

     s√               (s, σ) → (s′, σ′)    (s′, σ′) ⇝ τ
══════════════    ═════════════════════════════════════
(s, σ) ⇝ ⟨σ⟩               (s, σ) ⇝ σ :: τ

Fig. 2. Small-step relational semantics



functional semantics. The first approach is stronger in that it does not rely on the
determinism of the semantics, thus prepares a better avenue for generalization to
a language with non-determinism. (Our functional semantics deals with single-
valued functions and thus the second approach relies on the determinism.)
The following lemma connects the big-step semantics with the terminality
predicate and one-step reduction relation and is proved by induction.
Lemma 5. For any s, σ and τ, if (s, σ) ⇒ τ then either s√ and τ = ⟨σ⟩, or else
there are s′, σ′, τ′ such that (s, σ) → (s′, σ′) and τ ≈ σ :: τ′ and (s′, σ′) ⇒ τ′.
Then correctness of the big-step semantics relative to the small-step semantics
follows by coinduction:
Proposition 1. For any s, σ and τ, if (s, σ) ⇒ τ then (s, σ) ⇝ τ.
The opposite direction, that the small-step semantics is correct relative to the
big-step semantics, is more interesting. The proof proceeds by coinduction. At
the crux is the case of the sequence statement: we are given a normalization
(s0; s1, σ) ⇝ τ and the coinduction hypotheses for s0 (resp. s1) that enable us to
deduce (s0, σ′) ⇒ τ′ (resp. (s1, σ′) ⇒ τ′) from (s0, σ′) ⇝ τ′ (resp. (s1, σ′) ⇝ τ′)
for any σ′, τ′. Naively, we have what we need to close the case. The assumption
(s0; s1, σ) ⇝ τ ensures that τ can be split into two parts τ0 and τ1 such that τ0
corresponds to running s0 and τ1 to running s1. If τ0 is finite, we can traverse
τ0 until we hit its last state, to then invoke the coinduction hypothesis on s1. If
τ0 is infinite, we can deduce τ ≈ τ0 and (s1, τ0) ⇒∗ τ0 by coinduction.
The actual proof is more involved. First we have to explicitly construct τ0 and
τ1. This is possible by examining the proof of (s0; s1, σ) ⇝ τ. Our proof defines
an auxiliary function midp (s0 s1 : stmt) (σ : state) (τ : trace) (h : (s0; s1, σ) ⇝
τ) : trace by corecursion as follows. We look at the last inference in the proof h
of (s0; s1, σ) ⇝ τ. If s0; s1 is terminal, we return ⟨σ⟩. Otherwise we have a proof
h0 of (s0; s1, σ) → (s′, σ′) and a proof h′ of (s′, σ′) ⇝ τ′ for some σ′, τ′ such
that τ = σ :: τ′. We look at the last inference in h0. If s0 is terminal, we also
return ⟨σ⟩. Else it must be the case that (s0, σ) → (s0′, σ′) for some s0′ such that
s′ = s0′; s1 and we return σ :: midp s0′ s1 σ′ τ′ h′. The corecursive call is guarded
by consing σ. The following lemma is proved by coinduction.
Lemma 6. For any s0, s1, σ, τ and h : (s0; s1, σ) ⇝ τ, (s0, σ) ⇝ midp s0 s1 σ τ h.
Second, we cannot decide whether τ0 is finite, as this would amount to deciding
whether running s0 from σ terminates. Our big-step semantics was carefully
crafted to avoid stumbling upon this problem, by introduction of the coinductive
prefix closure ⇒∗ of ⇒ to uniformly handle the cases of both the finite and infinite
already accumulated trace. We need a small-step counterpart to it:

  (s, σ) ⇝ τ              (s, τ) ⇝∗ τ′
═══════════════    ═══════════════════════════
(s, ⟨σ⟩) ⇝∗ τ      (s, σ :: τ) ⇝∗ σ :: τ′

The proposition (s, τ) ⇝∗ τ′ states that running s from the last state of an already
accumulated trace τ (if it has one) results in the total trace τ′. The following
lemma is proved by coinduction.

      s√              (s, σ) → (s′, σ′)    (s′, σ′) ⇝ind σ″
───────────────    ─────────────────────────────────────────
(s, σ) ⇝ind σ                  (s, σ) ⇝ind σ″

Fig. 3. Inductive small-step relational semantics


Lemma 7. For any s0, s1, σ, τ and h : (s0; s1, σ) ⇝ τ, (s1, midp s0 s1 σ τ h) ⇝∗ τ.
Only now can we finally prove that the small-step relational semantics is correct
relative to the big-step relational semantics.

Proposition 2. For any s, σ, τ, τ′, the following two conditions hold:
– if (s, σ) ⇝ τ then (s, σ) ⇒ τ,
– if (s, τ) ⇝∗ τ′ then (s, τ) ⇒∗ τ′.

Proof. Both conditions are proved at once by mutual coinduction. We only show
the first condition in the case of the sequence statement, to demonstrate how
the relation ⇝∗ helps us avoid having to decide finiteness. Suppose we have h :
(s0; s1, σ) ⇝ τ. By Lemmata 6 and 7, we have (s0, σ) ⇝ midp s0 s1 σ τ h and
(s1, midp s0 s1 σ τ h) ⇝∗ τ. By invoking the coinduction hypothesis on them, we
obtain (s0, σ) ⇒ midp s0 s1 σ τ h and (s1, midp s0 s1 σ τ h) ⇒∗ τ, from which
we deduce (s0; s1, σ) ⇒ τ.
Differences between Coq’s Prop and Set force normalization to be Set-valued
rather than Prop-valued, since our definition of the trace-valued midp function
relies on case distinction on the proof of the given normalization proposition.
Case distinction on a proof of a Prop-proposition is not available for constructing
an element of a Set-set. This in turn requires the evaluation relation to also be
Set-valued, to be comparable to normalization. A further complication is that,
for technical reasons, the proofs of Lemmata 6 and 7 must rely on John Major
equality [15] and the principle that two JM-equal elements of the same type are
equal. Given that Coq’s support for programming with (co)inductive families (in
ML-style, as opposed to proving in the tactic language) is also weak (so midp
was easily manufactured in the tactic language, but we failed to construct it in
ML-style), one might wish to prove the equivalence of the big-step and small-
step semantics in some altogether different way. In the subsequent sections we
study functional semantics. These offer us a less direct route that is less painful
in the aspects we have just described.

Adequacy relative to inductive small-step relational semantics. The


textbook inductive small-step relational semantics defines normalization as the
inductive terminal many-step reduction. The definition is given in Fig. 3. This
normalization relation associates a statement-state pair with a state (the termi-
nal state) rather than a trace, although a trace-based version (for an inductive
concept of traces, i.e., finite sequences of states) would be obtained by a straight-
forward modification. For completeness of our development, we prove that the
inductive and coinductive semantics agree on terminating runs.

We introduce a last-state predicate on traces inductively by the rules

                   τ ↓ σ′
─────────    ────────────────
⟨σ⟩ ↓ σ       σ :: τ ↓ σ′
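In Coq this is the evident inductive definition (our names, over the trace type
of Section 2):

Inductive lastst : trace -> state -> Prop :=
  | lastst_nil  : forall sigma, lastst (Tnil sigma) sigma
  | lastst_cons : forall sigma sigma' tau,
      lastst tau sigma' -> lastst (Tcons sigma tau) sigma'.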

Proposition 3 states that the inductive semantics is correct relative to the coin-
ductive semantics. Proposition 4 states that the coinductive semantics is correct
relative to the inductive semantics for terminating runs. Both propositions are
proved by induction.

Proposition 3. For any s and σ, if (s, σ) ⇝ind σ′ then there is τ such that
(s, σ) ⇝ τ and τ ↓ σ′.

Proposition 4. For any s, τ, σ, σ′, if (s, σ) ⇝ τ and τ ↓ σ′ then (s, σ) ⇝ind σ′.
The connection between our coinductive big-step semantics and the inductive
big-step semantics can now be concluded from the well-known equivalence be-
tween the inductive big-step and small-step semantics. Y. Bertot has formalized
the proof of this equivalence in Coq [3].
We conclude this section by citing an observation by V. Capretta [4]. The
infiniteness predicate τ↑ on traces is defined coinductively by the rule

    τ↑
═══════════
 (σ :: τ)↑

We can prove in Coq the proposition ∀τ, (¬∃σ, τ ↓ σ) → τ↑. However the propo-
sition ∀τ, (∃σ, τ ↓ σ) ∨ τ↑ can only be proved from ∀τ, (∃σ, τ ↓ σ) ∨ ¬(∃σ, τ ↓ σ).
Constructively, this instance of the classical law of excluded middle states de-
cidability of finiteness. For this reason, we reject what could be called sum-type
semantics. For instance, a relational semantics could relate a statement-state pair
to either a state for a terminating run or a special token ∞ for a non-terminating
run, i.e., an element from the sum type state +1, where 1 is the one-element type.
Or, it could be given as the disjunction of an inductive trace-based semantics,
describing terminating runs, and a coinductive trace-based semantics, describing
non-terminating runs, an approach studied in [14].
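For reference, the infiniteness predicate used in this observation reads in Coq
(our names): there is no case for the nil trace, so only traces that keep consing
states are infinite.

CoInductive infinite : trace -> Prop :=
  | infinite_cons : forall sigma tau,
      infinite tau -> infinite (Tcons sigma tau).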

5 Big-Step Functional Semantics


We now proceed to functional versions of our semantics. The standard state-
based approach to While does not allow for a (constructive) functional semantics,
as this would require deciding the halting problem. Working with traces has the
benefit that we do not have to decide: any statement and initial state uniquely
determine some trace and we do not have to know whether this trace is finite or
infinite. The semantics is given in Fig. 4. The evaluation function eval : stmt →
state → trace is defined by recursion on the statement. In the cases for sequence
and while it calls auxiliary functions sequence and loop.
We first look at loop. It is defined together with a further auxiliary function
loopseq by mutual corecursion. loop takes three arguments: k for evaluating the

Fixpoint eval (s : stmt) (σ : state) {struct s} : trace :=
  match s with
  | skip ⇒ ⟨σ⟩
  | x := e ⇒ σ :: ⟨σ[x → ⟦e⟧σ]⟩
  | s0; s1 ⇒ sequence (eval s1) (eval s0 σ)
  | if e then st else sf ⇒ σ :: if σ |= e then eval st σ else eval sf σ
  | while e do st ⇒ σ :: loop (eval st) e σ
  end

CoFixpoint sequence (k : state → trace) (τ : trace) : trace :=
  match τ with
  | ⟨σ⟩ ⇒ k σ
  | σ :: τ′ ⇒ σ :: sequence k τ′
  end

CoFixpoint loop (k : state → trace) (p : state → bool) (σ : state) : trace :=
  if p σ then
    match k σ with
    | ⟨σ′⟩ ⇒ σ′ :: loop k p σ′
    | σ′ :: τ ⇒ σ′ :: loopseq k p τ
    end
  else ⟨σ⟩
with loopseq (k : state → trace) (p : state → bool) (τ : trace) : trace :=
  match τ with
  | ⟨σ⟩ ⇒ σ :: loop k p σ
  | σ :: τ′ ⇒ σ :: loopseq k p τ′
  end

Fig. 4. Big-step functional semantics

loop body from a state; p for testing the boolean guard on a state; and a state
σ, which is the initial state. loopseq takes a trace τ , the initial trace, instead of
a state, as the third argument. The two functions work as follows. loop takes
care of the repetition of the loop body, once the guard of a while-loop has been
evaluated. It analyzes the result and, if the guard is false, then the run of the
loop terminates. If it is true, then the loop body is evaluated by calling k. loop
then constructs the trace of the loop body by examining the result of k. If the
loop body does not augment the trace, which can only happen if the loop body
is a sequence of skips, a new round of repeating the loop body is started by a
corecursive call to loop. The corecursive call is guarded by first augmenting the
trace, which corresponds to the new evaluation of the boolean guard. If the loop
body augments the trace, the new round is reached by reconstruction of the trace
of the current repetition with loopseq. On the exhaustion of this trace, loopseq
corecursively calls loop, again appropriately guarded. Our choice of augmenting
traces at boolean guards facilitates implementing loop in Coq: we exploit it to
satisfy Coq’s syntactic guardedness condition.
sequence, defined by simple corecursion, is similar to loopseq, but does not
involve repetition. It takes two arguments: k for running a statement (the second

statement of a sequence) from a state and τ the already accumulated trace


(resulting from running the first statement of the sequence). After reconstructing
τ , sequence calls k on the last state of τ .
Proposition 5 proves the big-step functional semantics correct relative to the
big-step relational semantics. The proof proceeds by induction on the statement
and performs coinductive reasoning in the cases for sequence and while. More-
over, the way induction and coinduction hypotheses are invoked mimics the way
eval makes recursive calls and sequence and loop make corecursive calls.
Proposition 5. For any s, σ, (s, σ) ⇒ eval s σ.
Proof. By induction on s. We show the interesting cases of sequence and while.
– s = s0; s1: We are given as the induction hypotheses that, for any σ, (s0, σ) ⇒
  eval s0 σ and (s1, σ) ⇒ eval s1 σ. We must prove (s0; s1, σ) ⇒ eval (s0; s1) σ.
  We do so by proving the following condition by coinduction: for any τ,
  (s1, τ) ⇒∗ sequence (eval s1) τ. The proof of the condition proceeds by
  case distinction on τ and invokes the induction hypothesis on s1 for the
  case where τ is a singleton. Then we close the case by combining (s0, σ) ⇒
  eval s0 σ, which is obtained from the induction hypothesis on s0, and
  (s1, eval s0 σ) ⇒∗ sequence (eval s1) (eval s0 σ), which is obtained from
  the condition proved.
– s = while e do st: We are given as the induction hypothesis that, for any σ,
  (st, σ) ⇒ eval st σ. We must prove (while e do st, σ) ⇒ eval (while e do st) σ.
  We do so by proving the following two conditions simultaneously by mutual
  coinduction:
  • for any σ, (while e do st, σ) ⇒ σ :: loop (eval st) e σ,
  • for any τ, (while e do st, τ) ⇒∗ loopseq (eval st) e τ.
  The proof of the first condition invokes the induction hypothesis on st. The
  case follows immediately from the first condition.
As an obvious corollary of Proposition 5, the relational semantics is total:
Corollary 1. For any s, σ, there exists τ such that (s, σ) ⇒ τ .
This corollary is valuable on its own, and in fact it is one motivation for defin-
ing the functional big-step semantics. Since the conclusion is not a coinductive
predicate, we cannot prove the corollary directly by coinduction.
Correctness of the big-step relational semantics relative to the big-step
functional semantics is easy. In the light of Lemma 2 (determinism of the
big-step relational semantics), it is an immediate consequence of Proposition 5.
Proposition 6. For any s, σ, τ , if (s, σ) ⇒ τ , then τ ≈ eval s σ.
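Here ≈ is the bisimilarity on traces defined earlier in the paper. For orientation, a lock-step (strong) bisimilarity of this kind can be sketched in Coq as follows; the trace type is repeated from the sketch above to keep the fragment self-contained, and the constructor names are ours:

Section BisimSketch.
  Variable state : Type.

  CoInductive trace : Type :=
  | tsing : state -> trace
  | tcons : state -> trace -> trace.

  (* Two traces are bisimilar when they carry the same states in lock-step. *)
  CoInductive bisim : trace -> trace -> Prop :=
  | bisim_sing : forall sigma, bisim (tsing sigma) (tsing sigma)
  | bisim_cons : forall sigma tau tau',
      bisim tau tau' -> bisim (tcons sigma tau) (tcons sigma tau').
End BisimSketch.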
The fact that the corecursive functions sequence and loop must produce traces in
a guarded way in the functional semantics lends further support to our definition
of the big-step relational semantics. Mere wellformedness of a coinductive defi-
nition of the evaluation relation does not suffice; the rules must be tight enough
to properly define the trace of a run.
Fixpoint red (s : stmt) (σ : state) {struct s} : option (stmt ∗ state) :=
match s with
| skip ⇒ None
| x := e ⇒ Some (skip, σ[x → eσ])
| s0 ; s1 ⇒
  match (red s0 σ) with
  | Some (s0′, σ′) ⇒ Some (s0′; s1 , σ′)
  | None ⇒ red s1 σ
  end
| if e then st else sf ⇒ if eσ then Some (st , σ) else Some (sf , σ)
| while e do st ⇒ if eσ then Some (st ; while e do st , σ) else Some (skip, σ)
end
CoFixpoint norm (s : stmt) (σ : state) : trace :=
match red s σ with
| None ⇒ ⟨σ⟩
| Some (s′, σ′) ⇒ σ :: norm s′ σ′
end
Fig. 5. Small-step functional semantics
6 Small-Step Functional Semantics

Our small-step functional semantics, defined in Fig. 5, is quite similar to the
small-step relational semantics, except that it uses a function to perform one-step
reductions. The option-returning one-step reduction function red is a functional
equivalent to the jointly total and deterministic terminality predicate and one-
step reduction relation of Fig. 2. It returns None if the given statement is
terminal; otherwise, it one-step reduces the given statement from the given state
and returns the resulting statement-state pair. The normalization function norm
calls red repeatedly; it is defined by corecursion (guardedness is achieved by
consing the current state to the corecursive call on the next state).
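Since norm produces a possibly infinite trace, it cannot be inspected directly; a standard way to test such corecursive definitions is to observe finite prefixes. The helper below is our own sketch (over the trace type assumed above), not part of the paper's development; for the diverging program while true do skip, for example, prefix n (norm s σ) returns n copies of σ, one per small step.

Section ObserveSketch.
  Variable state : Type.

  CoInductive trace : Type :=
  | tsing : state -> trace
  | tcons : state -> trace -> trace.

  (* Observe at most the first n states of a possibly infinite trace. *)
  Fixpoint prefix (n : nat) (t : trace) : list state :=
    match n with
    | O => nil
    | S m =>
        match t with
        | tsing sigma    => sigma :: nil
        | tcons sigma t' => sigma :: prefix m t'
        end
    end.
End ObserveSketch.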
The small-step functional semantics is correct relative to the small-step
relational semantics:
Proposition 7. For all s, σ, (s, σ) ↠ norm s σ.
That the small-step relational semantics is correct relative to the small-step
functional semantics is an immediate consequence of Proposition 7 and Lemma 4
(determinism of small-step relational semantics).
Proposition 8. For all s, σ, τ, if (s, σ) ↠ τ, then τ ≈ norm s σ.
To verify the equivalence of the small-step functional semantics to the big-step
functional semantics, we first prove two auxiliary lemmas. Lemma 8 relates norm
and sequence. Lemma 9 relates norm and loop together with loopseq. The former
is proved by coinduction, the latter by mutual coinduction.
Lemma 8. For any s0 , s1 , σ, norm (s0 ; s1 ) σ ≈ sequence (norm s1 ) (norm s0 σ).

Lemma 9. For any e, st and σ, the following two conditions hold:

– norm (while e do st ) σ ≈ σ :: loop (norm st ) e σ,
– for any s, norm (s; while e do st ) σ ≈ loopseq (norm st ) e (norm s σ).

Equipped with these lemmata, we are in a position to show that the big-step
and small-step functional semantics agree up to bisimilarity.

Proposition 9. For any s and σ, eval s σ ≈ norm s σ.

Proof. By induction on s. We outline the proof for the main cases.

– s = s0 ; s1 : We prove by coinduction the following condition: for any τ and τ′
such that τ ≈ τ′, sequence (eval s1 ) τ ≈ sequence (norm s1 ) τ′. The proof
of the condition uses the induction hypothesis on s1 . Then the condition,
Lemma 8 and the induction hypothesis on s0 together close the case.

– s = while e do st : We prove the following two conditions simultaneously by
mutual coinduction using the induction hypothesis:
• for any σ, loop (eval st ) e σ ≈ loop (norm st ) e σ,
• for any τ, τ′ such that τ ≈ τ′,
loopseq (eval st ) e τ ≈ loopseq (norm st ) e τ′.
Then the first condition and Lemma 9 together close the case.

We can now prove the small-step relational semantics correct relative to
the big-step relational semantics, without relying on dependent pattern
matching or JM equality, by going through the functional semantics:
Proposition 10. For any s, σ and τ, if (s, σ) ↠ τ, then (s, σ) ⇒ τ.
Proof. By Prop. 7 and Lemma 4, τ ≈ norm s σ. By Prop. 9 and the transitivity
of the bisimilarity relation, τ ≈ eval s σ. By Prop. 5 and Lemma 1, (s, σ) ⇒ τ .

7 Related Work

X. Leroy and H. Grall [14] study two approaches to big-step relational semantics
for lambda-calculus, accounting for both terminating and non-terminating eval-
uations, fully formalized in Coq. In both approaches, evaluation relates lambda-
terms to normal forms or to reduction sequences.
The first approach, inspired by Cousot and Cousot [5], uses two evaluation
relations, an inductive one for terminating evaluations and a coinductive one
for non-terminating evaluations. The proof of equivalence to the small-step se-
mantics requires the use of an instance of the excluded middle, constructively
amounting to deciding halting. In essence, this means adopting a sum-type so-
lution. This has deep implications even for the big-step semantics alone: the
determinism of the evaluation relation, for example, can only be shown by going
through the small-step semantics.
Leroy used this approach in his work on a certified C compiler [13]. To the
best of our knowledge, this was the first practical application of mechanized
coinductive semantics. C is one of the most used languages for developing pro-
grams that are not supposed to terminate, such as operating systems. Hence
it is important that a certified compiler for C preserves the semantics of non-
terminating programs. The work on the Compcert compiler is a strong witness
of the importance and practicality of mechanized coinductive semantics.
In our approach to While, we have a single evaluation relation for both ter-
minating and non-terminating runs. The big-step semantics is equivalent to the
small-step semantics constructively. Furthermore, the big-step semantics is con-
structively deterministic and the proof of this is without an indirection through
the small-step semantics.
Leroy and Grall [14] also study a different big-step semantics where both
terminating and non-terminating runs are described by a single coinductively
defined evaluation relation (“coevaluation”) relating lambda-terms to normal
forms or reduction sequences. This semantics does not agree with the small-step
semantics, since it assigns a result even to an infinite reduction sequence and
continues reducing a function even after the argument diverges.
Coinductive big-step relational semantics for While similar in some aspects to
Leroy and Grall’s work on lambda-calculus appear in the works of Glesner [9] and
Nestra [16,17]. Regardless of whether evaluation relates statement-state pairs to
possibly infinite traces, possibly non-wellfounded trees of states (“fractions”) or
transfinite traces, these approaches have it in common that the result of a non-
terminating run can be non-deterministic even for While-programs, which should
be deterministic. For one technical reason or another, it becomes possible in all
these semantics that after an infinite number of small steps a run reaches an under-
determined limit state and continues from there. In the case of Nestra [16,17], this
seems intended: he devised his non-standard “fractional” and transfinite seman-
tics to justify a program slicing transformation that is unsound under the standard
semantics. Elsewhere, the outcome appears accidental and undesired.
In our approach, we take the result of a program run to be given precisely by
what can be finitely observed: we record the state of the program at every finite
time instant. We never run ahead of the clock by jumping over some intermediate
states (in particular, we never run ahead of the clock infinitely much) and we
reject transfinite time. As a result of this design decision, the big-step semantics
agrees precisely with the small-step semantics and does so even constructively.
Coinductive functional semantics similar to ours have appeared in the works of
J. Rutten and V. Capretta. A difference is that instead of trace-based semantics
they looked at delayed state based semantics, i.e., semantics that, for a given
statement-state pair, return a possibly infinitely delayed state. Delayed states,
or Burroni conaturals over states, are like conatural numbers (possibly infinite
natural numbers), except that instead of the number zero their deconstruction
terminates (if it does) with a state.
J. Rutten [18] gave a delayed state based coinductive small-step functional
semantics for While in coalgebraic terms. The one-step reduction function is a
coalgebra on statement-state pairs. The final coalgebra is given by the analysis of
a delayed state into a readily available state or a unit delay of a further delayed
state (the predecessor function on Burroni conaturals over states). The small-
step semantics, which sends a statement-state pair to a delayed state, is given by
the unique map from a coalgebra to the final coalgebra. He also discussed weak
bisimilarity of delayed states. This identifies all finite delays. As he worked in
classical set theory, the quotient of the set of delayed states by weak bisimilarity
is isomorphic to the sum of the set of states and a one-element set. However the
coalgebraic approach is not confined to the category of sets and in constructive
settings the theory of weak bisimilarity is richer.
V. Capretta [4] carried out a similar project in a constructive, type-theoretic
setting, focusing on combinators essential for big-step functional semantics (our
sequence and loop). Central to him was the realization that the delay type con-
structor is a monad, more specifically a completely iterative monad (with ⟨−⟩
the unit, sequence the Kleisli extension operation and loop the iteration operation).
Similarly to Leroy and Grall, he formalized his development in Coq.
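The delay type and the monad structure just described can be rendered in Coq roughly as follows; this is our sketch with our names (now, later, ret, bind), and Capretta's actual development differs in presentation:

(* The delay type: deconstruction terminates (if it does) with a value. *)
CoInductive delay (A : Type) : Type :=
| now : A -> delay A
| later : delay A -> delay A.

Arguments now {A} _.
Arguments later {A} _.

(* Unit and Kleisli extension of the delay monad. *)
Definition ret {A : Type} (a : A) : delay A := now a.

CoFixpoint bind {A B : Type} (d : delay A) (f : A -> delay B) : delay B :=
  match d with
  | now a    => f a
  | later d' => later (bind d' f)
  end.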
Our work is very much inspired by the designs of Rutten and Capretta. Here,
however, we have replaced the delay monad by the trace monad, which is also
completely iterative. Moreover, we have also considered relational semantics.
A general categorical account of small-step trace semantics has been given by
I. Hasuo et al. [12].

8 Conclusion
We have devised four trace-based coinductive semantics for While in different
styles of operational semantics. We were pleased to find that simple semantics
covering both terminating and non-terminating program runs are possible even
in the big-step relational and functional styles. The metatheory of our coinduc-
tive semantics is remarkably analogous to that of the textbook inductive seman-
tics and on finite runs they agree. Remarkably, everything can be arranged so
that in a constructive setting we never have to decide whether a trace is finite
or infinite.

Acknowledgments. K. Nakata thanks X. Leroy for interesting discussions on the
coinductive semantics for the Compcert compiler during her post-doctoral stay
in the Gallium team at INRIA Rocquencourt. The Ssreflect library was instru-
mental in producing the formal development. She is also thankful for helpful
advice she received via the Coq and Ssreflect mailing-lists. T. Uustalu acknowl-
edges the many inspiring discussions on the delay monad with T. Altenkirch and
V. Capretta. Both authors were supported by the Estonian Science Foundation
grant no. 6940 and the EU FP6 IST integrated project no. 15905 MOBIUS.
References
1. Bertot, Y., Castéran, P.: Coq’Art: Interactive Theorem Proving and Program De-
velopment. Springer, Heidelberg (2004)
2. Bertot, Y.: Filters on coinductive streams, an application to Eratosthenes’ sieve. In:
Urzyczyn, P. (ed.) TLCA 2005. LNCS, vol. 3461, pp. 102–115. Springer, Heidelberg
(2005)
3. Bertot, Y.: A survey of programming language semantics styles. Coq development
(2007), http://www-sop.inria.fr/marelle/Yves.Bertot/proofs.html
4. Capretta, V.: General recursion via coinductive types. Logical Methods in Com-
puter Science 1(2), 1–18 (2005)
5. Cousot, P., Cousot, R.: Inductive definitions, semantics and abstract interpreta-
tion. In: Conf. Record of 19th ACM SIGPLAN-SIGACT Symp. on Principles of
Programming Languages, POPL 1992, Albuquerque, NM, pp. 83–94. ACM Press,
New York (1992)
6. Cousot, P., Cousot, R.: Bi-inductive structural semantics. Inform. and Com-
put. 207(2), 258–283 (2009)
7. Giménez, E.: Codifying guarded definitions with recursive schemes. In: Smith, J.,
Dybjer, P., Nordström, B. (eds.) TYPES 1994. LNCS, vol. 996, pp. 39–59. Springer,
Heidelberg (1995)
8. Giménez, E.: Structural recursive definitions in type theory. In: Larsen, K.G.,
Skyum, S., Winskel, G. (eds.) ICALP 1998. LNCS, vol. 1443, pp. 397–408. Springer,
Heidelberg (1998)
9. Glesner, S.: A proof calculus for natural semantics based on greatest fixed point
semantics. In: Knoop, J., Necula, G.C., Zimmermann, W. (eds.) Proc. of 3rd
Int. Wksh. on Compiler Optimization Meets Compiler Verification, COCV 2004,
Barcelona. Electron. Notes in Theor. Comput. Sci., vol. 132(1), pp. 73–93. Elsevier,
Amsterdam (2005)
10. Glesner, S., Leitner, J., Blech, J.O.: Coinductive verification of program optimiza-
tions using similarity relations. In: Knoop, J., Necula, G.C., Zimmermann, W.
(eds.) Proc. of 5th Int. Wksh. on Compiler Optimization Meets Compiler Verifi-
cation, COCV 2006, Vienna. Electron. Notes in Theor. Comput. Sci., vol. 176(3),
pp. 61–77. Elsevier, Amsterdam (2007)
11. Gonthier, G., Mahboubi, A.: A small scale reflection extension for the Coq system.
Technical Report RR-6455, INRIA (2008)
12. Hasuo, I., Jacobs, B., Sokolova, A.: Generic trace semantics via coinduction. Logical
Methods in Comput. Sci. 3(4), article 11 (2007)
13. Leroy, X.: The Compcert verified compiler. Commented Coq development (2008),
http://compcert.inria.fr/doc/
14. Leroy, X., Grall, H.: Coinductive big-step operational semantics. Inform. and Com-
put. 207(2), 285–305 (2009)
15. McBride, C.: Elimination with a motive. In: Callaghan, P., Luo, Z., McKinna, J.,
Pollack, R. (eds.) TYPES 2000. LNCS, vol. 2277, pp. 197–216. Springer, Heidelberg
(2002)
16. Nestra, H.: Fractional semantics. In: Johnson, M., Vene, V. (eds.) AMAST 2006.
LNCS, vol. 4019, pp. 278–292. Springer, Heidelberg (2006)
17. Nestra, H.: Transfinite semantics in the form of greatest fixpoint. J. of Logic and
Algebr. Program. (to appear)
18. Rutten, J.: A note on coinduction and weak bisimilarity for While programs. Theor.
Inform. and Appl. 33(4–5), 393–400 (1999)
A Better x86 Memory Model: x86-TSO

Scott Owens, Susmit Sarkar, and Peter Sewell

University of Cambridge
http://www.cl.cam.ac.uk/users/pes20/weakmemory

Abstract. Real multiprocessors do not provide the sequentially consis-
tent memory that is assumed by most work on semantics and verifica-
tion. Instead, they have relaxed memory models, typically described in
ambiguous prose, which lead to widespread confusion. These are prime
targets for mechanized formalization. In previous work we produced a rig-
orous x86-CC model, formalizing the Intel and AMD architecture spec-
ifications of the time, but those turned out to be unsound with respect
to actual hardware, as well as arguably too weak to program above.
We discuss these issues and present a new x86-TSO model that suffers
from neither problem, formalized in HOL4. We believe it is sound with
respect to real processors, reflects better the vendor’s intentions, and is
also better suited for programming. We give two equivalent definitions of
x86-TSO: an intuitive operational model based on local write buffers, and
an axiomatic total store ordering model, similar to that of the SPARCv8.
Both are adapted to handle x86-specific features. We have implemented
the axiomatic model in our memevents tool, which calculates the set of all
valid executions of test programs, and, for greater confidence, verify the
witnesses of such executions directly, with code extracted from a third,
more algorithmic, equivalent version of the definition.

1 Introduction
Most previous research on the semantics and verification of concurrent programs
assumes sequential consistency: that accesses by multiple threads to a shared
memory occur in a global-time linear order. Real multiprocessors, however, in-
corporate many performance optimisations. These are typically unobservable by
single-threaded programs, but some have observable consequences for the be-
haviour of concurrent code. For example, on standard Intel or AMD x86 proces-
sors, given two memory locations x and y (initially holding 0), if two processors
proc:0 and proc:1 respectively write 1 to x and y and then read from y and x,
as in the program below, it is possible for both to read 0 in the same execution.

iwp2.3.a/amd4 proc:0 proc:1


poi:0 MOV [x]←$1 MOV [y]←$1
poi:1 MOV EAX←[y] MOV EBX←[x]
Allow: 0:EAX=0 ∧ 1:EBX=0

One can view this as a visible consequence of write buffering: each processor
effectively has a FIFO buffer of pending memory writes (to avoid the need to
block while a write completes), so the reads from y and x can occur before the
writes have propagated from the buffers to main memory. Such optimisations
destroy the illusion of sequential consistency, making it impossible (at this level
of abstraction) to reason in terms of an intuitive notion of global time.
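To make the buffering explanation concrete, the following self-contained Coq sketch (ours, not part of the x86-TSO development) models a write buffer as a list of address/value pairs and a read as consulting the local buffer before memory; with both writes still buffered, both loads return 0.

Require Import List PeanoNat.
Import ListNotations.

Definition addr := nat.
Definition val := nat.

Definition x : addr := 0.
Definition y : addr := 1.

(* Initial shared memory: every location holds 0. *)
Definition mem0 : addr -> val := fun _ => 0.

(* Each processor has buffered its own write; nothing is flushed yet. *)
Definition buf0 : list (addr * val) := [(x, 1)]. (* proc:0's MOV [x]←$1 *)
Definition buf1 : list (addr * val) := [(y, 1)]. (* proc:1's MOV [y]←$1 *)

(* The newest buffered write to an address, if any. *)
Fixpoint lookup (b : list (addr * val)) (a : addr) : option val :=
  match b with
  | [] => None
  | (a', v) :: b' => if Nat.eqb a' a then Some v else lookup b' a
  end.

(* A read consults the local buffer first, then memory. *)
Definition read (b : list (addr * val)) (m : addr -> val) (a : addr) : val :=
  match lookup b a with
  | Some v => v
  | None => m a
  end.

(* proc:0 reads y and proc:1 reads x before either buffer is flushed,
   so both loads see the initial memory value 0. *)
Example eax_is_0 : read buf0 mem0 y = 0. Proof. reflexivity. Qed.
Example ebx_is_0 : read buf1 mem0 x = 0. Proof. reflexivity. Qed.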
To describe what programmers can rely on, processor vendors document ar-
chitectures. These are loose specifications, claimed to cover a range of past and
future actual processors, which should reveal enough for effective programming,
but without unduly constraining future processor designs. In practice, however,
they are informal prose documents, e.g. the Intel 64 and IA-32 Architectures
SDM [2] and AMD64 Architecture Programmer’s Manual [1]. Informal prose is
a poor medium for loose specification of subtle properties, and, as we shall see
in §2, such documents are often ambiguous, are sometimes incomplete (too weak
to program above), and are sometimes unsound (with respect to the actual pro-
cessors). Moreover, one cannot test programs above such a vague specification
(one can only run programs on particular actual processors), and one cannot use
them as criteria for testing processor implementations.
Architecture specifications are, therefore, prime targets for rigorous mech-
anised formalisation. In previous work [19] we introduced a rigorous x86-CC
model, formalised in HOL4 [11], based on the informal prose causal-consistency
descriptions of the then-current Intel and AMD documentation. Unfortunately
those, and hence also x86-CC, turned out to be unsound, forbidding some be-
haviour which actual processors exhibit.
In this paper we describe a new model, x86-TSO, also formalised in HOL4. To
the best of our knowledge, x86-TSO is sound, is strong enough to program above,
and is broadly in line with the vendors’ intentions. We present two equivalent def-
initions of the model: an abstract machine, in §3.1, and an axiomatic version, in
§3.2. We compensate for the main disadvantage of formalisation, that it can make
specifications less widely accessible, by extensively annotating the mathematical
definitions. To explore the consequences of the model, we have a hand-coded
implementation in our memevents tool, which can explore all possible executions
of litmus-test examples such as that above, and for greater confidence we have a
verified execution checker extracted from the HOL4 axiomatic definition, in §4.
We discuss related work in §5 and conclude in §6.

2 Many Memory Models
We begin by reviewing the informal-prose specifications of recent Intel and AMD
documentation. There have been several versions, some differing radically; we
contrast them with each other, and with what we know of the behaviour of
actual processors.

2.1 Pre-IWP (Before Aug. 2007)
Early revisions of the Intel SDM (e.g. rev-22, Nov. 2006) gave an informal-prose
model called ‘processor ordering’, unsupported by any examples. It is hard to
give a precise interpretation of this description.
2.2 IWP/AMD64-3.14/x86-CC
In August 2007, an Intel White Paper [12] (IWP) gave a somewhat more pre-
cise model, with 8 informal-prose principles supported by 10 examples (known
as litmus tests). This was incorporated, essentially unchanged, into later revi-
sions of the Intel SDM (including rev.26–28), and AMD gave similar, though not
identical, prose and tests [1]. These are essentially causal-consistency models [4].
They allow independent readers to see independent writes (by different proces-
sors to different addresses) in different orders, as below (IRIW, see also [6]),
but require that, in some sense, causality is respected: “P5. In a multiprocessor
system, memory ordering obeys causality (memory ordering respects transitive
visibility)”.

amd6 proc:0 proc:1 proc:2 proc:3
poi:0 MOV [x]←$1 MOV [y]←$1 MOV EAX←[x] MOV ECX←[y]
poi:1 MOV EBX←[y] MOV EDX←[x]
Final: 2:EAX=1 ∧ 2:EBX=0 ∧ 3:ECX=1 ∧ 3:EDX=0
cc : Allow; tso : Forbid

These informal specifications were the basis for our x86-CC model, for which
a key issue was giving a reasonable interpretation to this “causality”. Apart
from that, the informal specifications were reasonably unambiguous — but they
turned out to have two serious flaws.
First, they are arguably rather weak for programmers. In particular, they
admit the IRIW behaviour above but, under reasonable assumptions on the
strongest x86 memory barrier, MFENCE, adding MFENCEs would not suffice
to recover sequential consistency [19, §2.12]. Here the specifications seem to be
much looser than the behaviour of implemented processors: to the best of our
knowledge, and following some testing, IRIW is not observable in practice. It
appears that some JVM implementations depend on this fact, and would not be
correct if one assumed only the IWP/AMD64-3.14/x86-CC architecture [9].
Second, more seriously, they are unsound with respect to current processors.
The following n6 example, due to Paul Loewenstein [14], shows a behaviour that
is observable (e.g. on an Intel Core 2 duo), but that is disallowed by x86-CC,
and by any interpretation we can make of IWP and AMD64-3.14.

n6 proc:0 proc:1
poi:0 MOV [x]←$1 MOV [y]←$2
poi:1 MOV EAX←[x] MOV [x]←$2
poi:2 MOV EBX←[y]
Final: 0:EAX=1 ∧ 0:EBX=0 ∧ [x]=1
cc : Forbid; tso : Allow

To see why this may be allowed by multiprocessors with FIFO write buffers,
suppose that first the proc:1 write of [y]=2 is buffered, then proc:0 buffers its
write of [x]=1, reads [x]=1 from its own write buffer, and reads [y]=0 from main
memory, then proc:1 buffers its [x]=2 write and flushes its buffered [y]=2 and
[x]=2 writes to memory, then finally proc:0 flushes its [x]=1 write to memory.
2.3 Intel SDM rev-29 (Nov. 2008)
The most recent change in the x86 vendor specifications was in revision 29 of the
Intel SDM (revision 30 is essentially identical, and we are told that there will be
a future revision of the AMD specification on similar lines). This is in a similar
informal-prose style to previous versions, again supported by litmus tests, but is
significantly different to IWP/AMD64-3.14/x86-CC. First, the IRIW final state
above is forbidden [Example 7-7, rev-29], and the previous coherence condition:
“P6. In a multiprocessor system, stores to the same location have a total order”
has been replaced by: “P9. Any two stores are seen in a consistent order by
processors other than those performing the stores”.
Second, the memory barrier instructions are now included, with “P11. Reads
cannot pass LFENCE and MFENCE instructions” and “P12. Writes cannot pass
SFENCE and MFENCE instructions”.
Third, same-processor writes are now explicitly ordered (we regarded this
as implicit in the IWP “P2. Stores are not reordered with other stores”): “P10.
Writes by a single processor are observed in the same order by all processors”.
This specification appears to deal with the unsoundness, admitting the n6 be-
haviour above, but, unfortunately, it is still problematic. The first issue is, again,
how to interpret “causality” as used in P5. The second issue is one of weakness:
the new P9 says nothing about observations of two stores by those two processors
themselves (or by one of those processors and one other). Programming above
a model that lacks any such guarantee would be problematic. The following
n5 and n4 examples illustrate the potential difficulties. These final states were
not allowed in x86-CC, and we would be surprised if they were allowed by any
reasonable implementation (they are not allowed in a pure write-buffer imple-
mentation). We have not observed them on actual processors; however, rev-29
appears to allow them.

n5 proc:0 proc:1
poi:0 MOV [x]←$1 MOV [x]←$2
poi:1 MOV EAX←[x] MOV EBX←[x]
Forbid: 0:EAX=2 ∧ 1:EBX=1

n4 proc:0 proc:1
poi:0 MOV EAX←[x] MOV ECX←[x]
poi:1 MOV [x]←$1 MOV [x]←$2
poi:2 MOV EBX←[x] MOV EDX←[x]
Forbid: 0:EAX=2 ∧ 0:EBX=1 ∧ 1:ECX=1 ∧ 1:EDX=2

Summarising the key litmus-test differences, we have:
IWP/AMD64-3.14/x86-CC rev-29 actual processors
IRIW allowed forbidden not observed
n6 forbidden allowed observed
n4/n5 forbidden allowed not observed
There are also many non-differences: tests for which the behaviours coincide in all
three cases. The test details are omitted here, but can be found in the extended
version [16] or in [19]. They include the 9 other IWP tests, illustrating that the
various load and store reorderings other than those shown in iwp2.3.a/amd4 (§1)
are not possible; the AMD MFENCE tests amd5 and amd10; and several others.
3 The x86-TSO Model

Given these problems with the informal specifications, we cannot produce a use-
ful rigorous model by formalising the “principles” they contain (as we attempted
with x86-CC [19]). Instead, we have to build a reasonable model that is consis-
tent with the given litmus tests, with observed processor behaviour, and with
what we know of the needs of programmers and of the vendors' intentions.
The fact that write buffering is observable (iwp2.3.a/amd4 and n6) but IRIW
is not, together with the other tests that prohibit many other reorderings, strongly
suggests that, apart from write buffering, all processors share the same view of
memory (in contrast to x86-CC, where each processor had a separate view or-
der). This is broadly similar to the SPARC Total Store Ordering (TSO) memory
model [20,21], which is essentially an axiomatic description of the behaviour of
write-buffer multiprocessors. Moreover, while the term “TSO” is not used, infor-
mal discussions suggest this matches the intention behind the rev-29 informal
specification. Accordingly, we present here a rigorous x86-TSO model, with two
equivalent definitions.
The first definition, in §3.1, is an abstract machine with explicit write buffers.
The second definition, in §3.2, is an axiomatic model that defines valid executions
in terms of memory orders and reads-from maps. In both, we deal with x86
CISC instructions with multiple memory accesses, with x86 LOCK’d instructions
(CMPXCHG, LOCK;INC, etc.), with potentially non-terminating computations,
and with dependencies through registers. Together with our earlier instruction
semantics, x86-TSO thus defines a complete semantics of programs. The abstract
machine conveys the programmer-level operational intuition behind x86-TSO,
whereas the axiomatic model supports constraint-based reasoning about example
programs, e.g., by our memevents tool in §4.
The intended scope of x86-TSO, as for the x86-CC model, covers typical
user code and most kernel code: programs using coherent write-back memory,
without exceptions, misaligned or mixed-size accesses, ‘non-temporal’ operations
(e.g. MOVNTI), self-modifying code, or page-table changes.
Basic Types: Actions, Events, and Event Structures. As in our earlier
work, the action of (any particular execution of) a program is abstracted into a
set of events (with additional data) called an event structure. An event represents
a read or write of a particular value to a memory address, or to a register, or
the execution of a fence. Our earlier work includes a definition of the set of
event structures generated by an assembly language program. For any such event
structure, the memory model (there x86-CC, here x86-TSO) defines what a valid
execution is.
In more detail, each machine-code instruction may have multiple events asso-
ciated with it: events are indexed by an instruction ID iiid that identifies which
processor the event occurred on and the position in the instruction stream of the
instruction it comes from (the program order index, or poi). Events also have an
event ID eiid to identify them within an instruction (to permit multiple, other-
wise identical, events). An event structure indicates when one of an instruction’s
events has a dependency on another event of the same instruction with an intra
causality relation, a partial order over the events of each instruction. An event
structure also records which events occur together in a locked instruction with
atomicity data, a set of (disjoint, non-empty) sets of events which must occur
atomically together.
Expressing this in HOL, we index processors by a type proc = num, take types
address and value to both be 32-bit words, and take a location to be either
a memory address or a register of a particular processor:
location = Location reg of proc ′reg
| Location mem of address
The model is parameterised by a type ′reg of x86 registers, which one should
think of as an enumeration of the names of ordinary registers EAX, EBX, etc.,
the instruction pointer EIP, and the status flags. To identify an instance of an
instruction in an execution, we specify its processor and its program order index.
iiid = ⟨[ proc : proc; poi : num ]⟩
An action is either a read or write of a value at some location, or a barrier:
dirn = R | W
barrier = Lfence | Sfence | Mfence
action = Access of dirn (′reg location) value | Barrier of barrier

Finally, an event has an instruction instance id, an event id (of type eiid = num,
unique per iiid), and an action:
event = ⟨[ eiid : eiid; iiid : iiid; action : ′reg action ]⟩
An event structure E comprises a set of processors, a set of events, an intra-
instruction causality relation, and a partial equivalence relation (PER) capturing
sets of events which must occur atomically, all subject to some well-formedness
conditions which we omit here.

event structure = ⟨[ procs : proc set;
events : (′reg event) set;
intra causality : (′reg event) reln;
atomicity : (′reg event) set set ]⟩
Example. We show a very simple event structure below, for the program:

tso1 proc:0 proc:1
poi:0 MOV [x]←$1 MOV [x]←$2
poi:1 MOV EAX←[x]

There are four events — the inner (blue in the on-line version) boxes. The event
ids are pretty-printed alphabetically, as a,b,c,d, etc. We also show the assembly

instruction that gave rise to each event, e.g. MOV [x]←$1, though that is not
formally part of the event structure.
[Diagram tso1 rfmap 0 (of ess 0): events a: W [x]=1 (proc:0 poi:0, MOV [x]←$1), d: W [x]=2 (proc:1 poi:0, MOV [x]←$2), and b: R [x]=2 followed by c: W 0:EAX=2 (both proc:0 poi:1, MOV EAX←[x]), connected by po, rf and intra causality edges.]

Note that events contain concrete values: in this particular event structure, there are two writes of x, with values 1 and 2, a read of [x] with value 2, and a write of proc:0's EAX register with value 2. Later we show two valid executions for this program, one for this event structure and one for another (note also that some event structures may not have any valid executions). In the diagram, the instructions of each processor are clustered together, into the outermost (magenta) boxes, with program order (po) edges between them, and the events of each instruction are clustered together into the intermediate (green) boxes, with intra-causality edges as appropriate — here, in the MOV EAX←[x], the write of EAX is dependent on the read of x.
3.1 The x86-TSO Abstract Machine Memory Model
To understand our x86-TSO machine model, consider an idealised x86 multipro-
cessor system partitioned into two components: its memory and register state (of
all its processors combined), and the rest of the system (the other parts of all the
processor cores). Our abstract machine is a labelled transition system: a set of
states, ranged over by s, and a transition relation s −l→ s′. An abstract machine
state s models the state of the first component: the memory and register state of
a multiprocessor system. The machine interacts with the rest of the system by
synchronising on labels l (the interface of the abstract machine), which include
register and memory reads and writes. In Fig. 1, the states s correspond to the
parts of the machine shown inside of the dotted line, and the labels l correspond
to the communications that traverse the dotted line boundary.
One should think of the machine as operating in parallel with the proces-
sor cores (absent their register/memory subsystems), executing their instruction
streams in program order; the latter data is provided by an event structure. This
partitioning does not correspond directly to the microarchitecture of any realis-
tic x86 implementation, in which memory and registers would be implemented
by separate and intricate mechanisms, including various caches. However, it is
useful and sufficient for describing the programming model, which is the proper
business of an architecture description. It also supports a precise correspondence
with our axiomatic memory model. In more detail, the labels l are the values of
[Figure 1 depicts the abstract machine: per-processor computation units exchange memory and register reads and writes (W [a]=v, R [a]=v, W r=v, R r=v) and Lock/Unlock events with a shared memory subsystem comprising per-processor registers, per-processor FIFO write buffers (with a read bypass), a global lock, and RAM.]

Fig. 1. The abstract machine
the HOL type:
label = Tau | Evt of proc (′reg action) | Lock of proc | Unlock of proc
– Tau, for an internal action by the machine;
– Evt p a, where a is an action, as defined above (a memory or register read
or write, with its value, or a barrier), by processor p;
– Lock p, indicating the start of a LOCK’d instruction by processor p; or
– Unlock p, for the end of a LOCK’d instruction by p.

(Note that there is nothing specific to any particular memory model in this
interface.) The states of the x86-TSO machine are records, with fields R, giving
a value for each register on each processor; M , giving a value for each shared
memory location; B , modelling a write buffer for each processor, as a list of
address/value pairs; and L, which is a global lock, either Some p, if p holds the
lock, or None. The HOL type is below.
machine state = ⟨[ R : proc → ′reg → value option; (* per-processor registers *)
M : address → value option; (* main memory *)
B : proc → (address#value) list; (* per-processor write buffers *)
L : proc option (* which processor holds the lock *) ]⟩
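For readers more at home in Coq than HOL4, the state record can be transliterated roughly as follows; this is a sketch (with proc, address and value fixed to nat and the register type left abstract), not the formal model itself:

Require Import List.

Definition proc := nat.
Definition address := nat.
Definition value := nat.

Record machine_state (reg : Type) : Type := {
  R : proc -> reg -> option value;      (* per-processor registers *)
  M : address -> option value;          (* main memory *)
  B : proc -> list (address * value);   (* per-processor write buffers *)
  L : option proc                       (* which processor holds the lock *)
}.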
The behaviour of the x86-TSO machine, the transition relation s −l→ s′, is
defined by the rules in Fig. 2. The rules use two auxiliary definitions: processor
p is not blocked in machine state s if either it holds the lock or no processor
does; and there are no pending writes in a buffer b for address a if there are no
(a, v) pairs in b. Restating the rules informally (an executable sketch of the buffer-manipulating rules follows the list):
1. p can read v from memory at address a if p is not blocked, has no buffered
writes to a, and the memory does contain v at a;
Read from memory
not blocked s p ∧ (s.M a = Some v) ∧ no pending (s.B p) a
──────────────────────────────────────────────
s −Evt p (Access R (Location mem a) v)→ s

Read from write buffer
not blocked s p ∧ (∃b1 b2 . (s.B p = b1 ++ [(a, v)] ++ b2 ) ∧ no pending b1 a)
──────────────────────────────────────────────
s −Evt p (Access R (Location mem a) v)→ s

Read from register
(s.R p r = Some v)
──────────────────────────────────────────────
s −Evt p (Access R (Location reg p r) v)→ s

Write to write buffer
T
──────────────────────────────────────────────
s −Evt p (Access W (Location mem a) v)→ s ⊕ [B := s.B ⊕ (p → [(a, v)] ++ (s.B p))]

Write from write buffer to memory
not blocked s p ∧ (s.B p = b ++ [(a, v)])
──────────────────────────────────────────────
s −Tau→ s ⊕ [M := s.M ⊕ (a → Some v); B := s.B ⊕ (p → b)]

Write to register
T
──────────────────────────────────────────────
s −Evt p (Access W (Location reg p r) v)→ s ⊕ [R := s.R ⊕ (p → ((s.R p) ⊕ (r → Some v)))]

Barrier
(b = Mfence) =⇒ (s.B p = [ ])
──────────────────────────────────────────────
s −Evt p (Barrier b)→ s

Lock
(s.L = None) ∧ (s.B p = [ ])
──────────────────────────────────────────────
s −Lock p→ s ⊕ [L := Some p]

Unlock
(s.L = Some p) ∧ (s.B p = [ ])
──────────────────────────────────────────────
s −Unlock p→ s ⊕ [L := None]

Fig. 2. The x86-TSO Machine Behaviour
2. p can read v from its write buffer for address a if p is not blocked and has
v as the newest write to a in its buffer;
3. p can read the stored value v from its register r at any time;
4. p can write v to its write buffer for address a at any time;
5. if p is not blocked, it can silently dequeue the oldest write from its write
buffer to memory;
6. p can write value v to one of its registers r at any time;
7. if p’s write buffer is empty, it can execute an MFENCE (so an MFENCE
cannot proceed until all writes have been dequeued, modelling buffer flush-
ing); LFENCE and SFENCE can occur at any time, making them no-ops;
8. if the lock is not held, and p’s write buffer is empty, it can begin a LOCK’d
instruction; and
9. if p holds the lock, and its write buffer is empty, it can end a LOCK’d
instruction.
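As promised above, the buffer-manipulating rules (2, 4 and 5) can be given a small executable rendering. The self-contained Coq sketch below is ours (it fixes the representation to a list with the newest write at the head), not code extracted from the HOL model:

Require Import List PeanoNat.
Import ListNotations.

Definition address := nat.
Definition value := nat.
Definition buffer := list (address * value). (* newest write first *)

(* Rule 4: a write is enqueued at the newest end of the buffer. *)
Definition enqueue (b : buffer) (a : address) (v : value) : buffer :=
  (a, v) :: b.

(* Rule 2: a read returns the newest buffered write to a, if any. *)
Fixpoint newest_write (b : buffer) (a : address) : option value :=
  match b with
  | [] => None
  | (a', v) :: b' => if Nat.eqb a' a then Some v else newest_write b' a
  end.

(* Rule 5: dequeue the oldest write (the last element), the one to be
   flushed to memory; None if the buffer is empty. *)
Fixpoint dequeue (b : buffer) : option (buffer * (address * value)) :=
  match b with
  | [] => None
  | [w] => Some ([], w)
  | w :: b' =>
      match dequeue b' with
      | Some (b'', w') => Some (w :: b'', w')
      | None => None (* unreachable: b' is nonempty in this branch *)
      end
  end.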
Consider execution paths through the machine s0 −l1→ s1 −l2→ s2 · · · consisting of
finite or infinite sequences of states and labels. We define okMpath to hold for
paths through the machine that start in a valid initial state (with empty write
buffers, etc.) and satisfy the following progress condition: for each memory write
in the path, the corresponding Tau transition appears later on. This ensures
that no write can stay in the buffer forever. (We actually formalize okMpath for
the event-annotated machine described below.)
We emphasise that this is an abstract machine: we are concerned with its
extensional behaviour: the (completed, finite or infinite) traces of labelled tran-
sitions it can perform (which should include the behaviour of real implementa-
tions), not with its internal states and the transition rules. The machine should
provide a good model for programmers, but may bear little resemblance to the
internal structure of implementations. Indeed, a realistic design would certainly
not implement LOCK’d instructions with a global lock, and would have many
other optimisations — the force of the x86-TSO model is that none of those
have programmer-visible effects, except perhaps via performance observations.
There are several variants of the machine with different degrees of locking which
we conjecture are observationally equivalent. For example, one could prohibit all
activity by other processors when one holds the lock, or not require write buffers
to be flushed at the start of a LOCK’d instruction.
We relate the machine to event structures in two steps, which we summarise
here (the HOL details can be found on-line [16]). First, we define a more in-
tensional event-machine: we annotate each memory and register location with
an event option, recording the most recent write event (if any) to that location,
refine write buffers to record lists of events rather than of plain location/value
pairs, and annotate labels with the relevant events. Second, we relate paths of
annotated labels and event structures with a predicate okEpath that holds when
the path is a suitable linearization of the event structure: there is a 1:1 corre-
spondence between non-Tau/Lock/Unlock labels of path and the events of E ,
the order of labels in path is consistent with program order and intra-causality,
and atomic sets are properly bracketed by Lock/Unlock pairs. Thus, okMpath
describes paths that are valid according to the memory model, and okEpath de-
scribes those that are valid according to an event structure (that encapsulates
the other aspects of processor semantics).
Theorem 1. The annotation-erasure of the event-machine is exactly the ma-
chine presented above. [HOL proof]

3.2 The x86-TSO Axiomatic Memory Model
Our x86-TSO axiomatic memory model is based on the SPARCv8 memory model
specification [20,21], but adapted to x86 and in the same terms as our ear-
lier x86-CC model. (Readers unfamiliar with the SPARCv8 memory model can
safely ignore the SPARC-specific comments in this section.) Compared with the
SPARCv8 TSO specification, we omit instruction fetches (IF ), instruction loads
(IL), flushes (F), and stbars (S̄). The first three deal exclusively with instruction
memory, which we do not model, and the last is useful only under the SPARC
PSO memory model. To adapt it to x86 programs, we add register and fence
events, generalize to support instructions that give rise to many events (par-
tially ordered by an intra-instruction causality relation), and generalize atomic
load/store pairs to locked instructions.
An execution is permitted by our memory model if there exists an execution
witness X for its event structure E that is a valid execution. An execution witness
contains a memory order, an rfmap, and an initial state; the rest of this section
defines when these are valid.
execution witness =
⟨[ memory order : (′reg event) reln;
rfmap : (′reg event) reln;
initial state : (′reg location → value option) ]⟩
The memory order is a partial order that records the global ordering of mem-
ory events. It must be a total order on memory writes, and corresponds to the
≤ relation in SPARCv8, as constrained by the SPARCv8 Order condition (in
figures, we use the label mo non-po write write for the otherwise-unforced part
of this order).
partial order (<X .memory order )(mem accesses E )
linear order ( (<X .memory order )|(mem writes E ) )(mem writes E )
The initial state is a partial function from locations to values. Each read
event’s value must come either from the initial state or from a write event:
the rfmap (‘reads-from map’) records which, containing (ew , er ) pairs where
the read er reads from the write ew . The reads from map candidates predicate
below ensures that the rfmap only relates such pairs with the same address and
value. (Strictly speaking, the rfmap is unnecessary; the constraints involving it
can be stated directly in terms of memory order, as SPARCv8 does. However,
we find it intuitive and useful. The SPARCv8 model has no initial states.)
reads from map candidates E rfmap =
∀(ew , er ) ∈ rfmap.(er ∈ reads E ) ∧ (ew ∈ writes E ) ∧
(loc ew = loc er ) ∧ (value of ew = value of er )
We lift program order from instructions to a relation po iico E over events,
taking the union of program order of instructions and intra-instruction causal-
ity. This corresponds roughly to the ‘;’ in SPARCv8. However, intra causality
might not relate some pairs of events in an instruction, so our po iico E will not
generally be a total order for the events of a processor.
po strict E =
{(e1 , e2 ) | (e1 .iiid .proc = e2 .iiid .proc) ∧ e1 .iiid .poi < e2 .iiid .poi ∧
e1 ∈ E .events ∧ e2 ∈ E .events}
<(po iico E ) = po strict E ∪ E .intra causality
The check rfmap written below ensures that the rfmap relates a read to the
most recent preceding write. For a register read, this is the most recent write
in program order. For a memory read, this is the most recent write in mem-
ory order among those that precede the read in either memory order or pro-
gram order (intuitively, the first case is a read of a committed write and the
second is a read from the local write buffer). The check rfmap written and
reads from map candidates predicates implement the SPARCv8 Value axiom
above the rfmap witness data. The check rfmap initial predicate extends this to
handle initial state, ensuring that any read not in the rfmap takes its value from
the initial state, and that that read is not preceded by a write in memory order
or program order.
previous writes E er <order =
{ew′ | ew′ ∈ writes E ∧ ew′ <order er ∧ (loc ew′ = loc er )}
check rfmap written E X =
∀(ew , er ) ∈ (X .rfmap).
if ew ∈ mem accesses E then
ew ∈ maximal elements (previous writes E er (<X .memory order ) ∪
previous writes E er (<(po iico E ) ))
(<X .memory order )
else (* ew IN reg accesses E *)
ew ∈ maximal elements (previous writes E er (<(po iico E ) ))(<(po iico E ))
check rfmap initial E X =
∀er ∈ (reads E \ range X .rfmap).
(∃l .(loc er = Some l ) ∧ (value of er = X .initial state l )) ∧
(previous writes E er (<X .memory order ) ∪
previous writes E er (<(po iico E ) ) = {})
We now further constrain the memory order, to ensure that it respects the
relevant parts of program order, and that the memory accesses of a LOCK’d
instruction do occur atomically.
– Program order is included in memory order, for a memory read before a mem-
ory access (labelled mo po read access in figures) (SPARCv8’s LoadOp):
∀er ∈ (mem reads E ).∀e ∈ (mem accesses E ).
er <(po iico E ) e =⇒ er <X .memory order e
– Program order is included in memory order, for a memory write before a
memory write (mo po write write) (the SPARCv8 StoreStore):
∀ew 1 ew 2 ∈ (mem writes E ).
ew 1 <(po iico E ) ew 2 =⇒ ew 1 <X .memory order ew 2
– Program order is included in memory order, for a memory write before a
memory read, if there is an MFENCE between (mo po mfence). (There is
no need to include fence events themselves in the memory ordering.)
∀ew ∈ (mem writes E ).∀er ∈ (mem reads E ).∀ef ∈ (mfences E ).
(ew <(po iico E ) ef ∧ ef <(po iico E ) er ) =⇒ ew <X .memory order er
– Program order is included in memory order, for any two memory accesses
where at least one is from a LOCK’d instruction (mo po access/lock):
∀e1 e2 ∈ (mem accesses E ).∀es ∈ (E .atomicity ).
((e1 ∈ es ∨ e2 ∈ es) ∧ e1 <(po iico E ) e2 ) =⇒ e1 <X .memory order e2
– The memory accesses of a LOCK’d instruction occur atomically in memory
order (mo atomicity), i.e., there must be no intervening memory events.
Further, all program order relationships between the locked memory accesses
and other memory accesses are included in the memory order (this is a
generalization of the SPARCv8 Atomicity axiom):
∀es ∈ (E .atomicity ). ∀e ∈ (mem accesses E \ es).
(∀e′ ∈ (es ∩ mem accesses E ). e <X .memory order e′) ∨
(∀e′ ∈ (es ∩ mem accesses E ). e′ <X .memory order e)
To deal properly with infinite executions, we also require that the prefixes of
the memory order are all finite, ensuring that there are no limit points, and,
to ensure that each write eventually takes effect globally, there must not be an
infinite set of reads unrelated to any particular write, all on the same memory
location (this formalizes the SPARCv8 Termination axiom).
finite prefixes (<X .memory order )(mem accesses E )
∀ew ∈ (mem writes E ).
finite{er | er ∈ E .events ∧ (loc er = loc ew ) ∧
er <X .memory order ew ∧ ew <X .memory order er }

A final state of a valid execution takes the last write in memory order for
each memory location, together with a maximal write in program order for each
register (or the initial state, if there is no such write). This is uniquely defined
assuming that no instruction has multiple unrelated writes to the same register
— a reasonable property for x86 instructions.
The definition of valid execution E X comprising the above conditions is
equivalent to one in which <X .memory order is required to be a linear order, not
just a partial order (again, the full details are on-line):
Theorem 2
1. If linear valid execution E X then valid execution E X .
2. If valid execution E X then there exists an X̂ with a linearisation of X ’s
memory order such that linear valid execution E X̂ . [HOL proof]

Interpreting “not reordered with”. Perhaps surprisingly, the above defini-
tion does not require that program order is included in memory order for a mem-
ory write followed by a read from the same address. The definition does imply that
any such read cannot be speculated before the write (by check rfmap written, as
that takes both <(po iico E ) and <X .memory order into account). However, if one
included a memory order edge, perhaps following a naive interpretation of the rev-
29 “P4. Reads may be reordered with older writes to different locations but not with
older writes to the same location”, then the model would be strictly stronger: the
n7 example below would become forbidden, whereas it is allowed on x86-TSO. We
conjecture that this would correspond to the (rather strange) machine with the
Fig. 2 rules but without the read-from-write-buffer rule, in which any processor
would have to flush its write buffer up to (and including) a local write before it
can read from it.
n7 proc:0 proc:1 proc:2
poi:0 MOV [x]←$1 MOV [y]←$1 MOV ECX←[y]
poi:1 MOV EAX←[x] MOV EDX←[x]
poi:2 MOV EBX←[y]
Allow: 0:EAX=1 ∧ 0:EBX=0 ∧ 2:ECX=1 ∧ 2:EDX=0

Examples. We show two valid executions of the previous example program in
Fig. 3. In both executions, the proc:0 W x=1 event is before the proc:1 W x=2
event in memory order (the bold mo non-po write write edge). In the first ex-
ecution, on the left, the proc:0 read of x reads from the most recent write in
memory order (the combination of the bold mo non-po write write edge and
the mo rf edge), which is the proc:1 W x=2. In the second execution, on the
right, the proc:0 read of x reads from the most recent write in program order,
which is the proc:0 W x=1. This example also illustrates some register events:
the MOV EAX←[x] instruction gives rise to a memory read of x, followed by (in
the intra-instruction causality relation) a register write of EAX.

3.3 The Machine and Axiomatic x86-TSO Models Are Equivalent
To prove that the abstract machine admits only valid executions, we define a
function path to X from event-annotated paths that builds a linear execution
witness by using the events from Tau and memory read labels in order. Thus,
the memory ordering in the execution witness corresponds to the order in which
events were read from and written to memory in the abstract machine.
Theorem 3. For any well-formed event structure E and event-machine path
path, if (okEpath E path) and (okMpath path), then (path to X path) is a
valid execution for E . [HOL proof]
To prove that the abstract machine admits every valid execution, we first prove
(in HOL) a lemma showing that any valid execution can be turned into a
[Figure 3 shows the two witnesses. In both, the proc:0 event a: W [x]=1 (MOV [x]←$1) precedes the proc:1 event d: W [x]=2 (MOV [x]←$2) in memory order (the mo non-po write write edge). In the first witness (tso1 vos 0, of productive ess 0), the proc:0 read b: R [x]=2 reads from d via an rf edge; in the second (tso1 vos 0, of productive ess 2), the read b: R [x]=1 reads from a. In each, the register write c: W 0:EAX=2, resp. c: W 0:EAX=1 (proc:0 poi:1, MOV EAX←[x]), follows the read by intra causality.]

Fig. 3. Example valid execution witnesses (for two different event structures)
stream-like linear order over labels that satisfies several conditions (label order
in the HOL sources) describing labels in an okMpath. We then have:
Theorem 4. For any well-formed event structure E , and valid execution X
for E , there exists some event-machine path, such that okEpath E path and
okMpath path, in which the memory reads and write-buffer flushes both respect
<X .memory order . [hand proof, relying on the preceding lemma]

4 Verified Checker and Results
To explore the consequences of x86-TSO, we implemented the axiomatic model in
our memevents tool, which exhaustively explores candidate execution witnesses.
For greater confidence, we added to this a verified witness checker: we defined
variants of event structures and execution witnesses, using lists instead of sets,
wrote algorithmic versions of well formed event structure and valid execution,
proved these equivalent (in the finite case) to our other definitions, extracted
OCaml code from the HOL, and integrated that into memevents. (Obviously,
this only provides assurance for positive tests, those with allowed final states.)
The memevents results coincide with our observations on real processors and
the vendor specifications, for the 10 IWP tests, the (negated) IRIW test, the two
MFENCE tests amd5 and amd10, our n2–n6, and rwc-fenced. The remaining
tests (amd3, n1, n7, n8, and rwc-unfenced) are “allow” tests for which we have
not observed the specified final state in practice.

5 Related Work
There is an extensive literature on relaxed memory models, but most of it does
not address x86, and we are not aware of any previous model that addresses the
concerns of §2. We touch here on some of the most closely related work.
There are several surveys of weak memory models, including those by Adve
and Gharachorloo [3], and by Higham et al. [13]; the latter formalises a range of
models, including a TSO model, in both operational and axiomatic styles, and
proves equivalence results. Their axiomatic TSO model is rather closer to the
operational style than ours is, and both are idealised rather than x86-specific.
Burckhardt and Musuvathi [8, Appendix A] also give operational and axiomatic
definitions of a TSO model and prove equivalence, but only for finite executions.
Their models treat memory reads and writes and barrier events, but lack regis-
ter events and locked instructions with multiple events that happen atomically.
Hangel et al. [10] describe the Sun TSOtool, checking the observed behaviour
of pseudo-randomly generated programs against a TSO model. Roy et al. [17]
describe an efficient algorithm for checking whether an execution lies within an
approximation to a TSO model, used in Intel’s Random Instruction Test (RIT)
generator. Boudol and Petri [7] give an operational model with hierarchical write
buffers (thereby permitting IRIW behaviours), and prove sequential consistency
for data-race-free (DRF) programs. Loewenstein et al. [15] describe a “golden
memory model” for SPARC TSO, somewhat closer to a particular implementa-
tion microarchitecture than the abstract machine we give in §3.1, that they use
for testing implementations. They argue that the additional intensional detail
increases the effectiveness of simulation-based verification. Saraswat et al. [18]
also define memory models in terms of local reordering, and prove a DRF the-
orem, but focus on high-level languages. Several groups have used proof tools
to tame the intricacies of these models, including Yang et al. [22], using Pro-
log and SAT solvers to explore an axiomatic Itanium model, and Aspinall and
Ševčík [5], who formalised and identified problems with the Java Memory Model
using Isabelle/HOL.

6 Conclusion
We have described x86-TSO, a memory model for x86 processors that does not
suffer from the ambiguities, weaknesses, or unsoundnesses of earlier models. Its
abstract-machine definition should be intuitive for programmers, and its equiva-
lent axiomatic definition supports the memevents exhaustive search and permits
an easy comparison with related models; the similarity with SPARCv8 suggests
x86-TSO is strong enough to program above. Mechanisation in HOL4 revealed a
number of subtle points of detail, including some of the well-formed event struc-
ture conditions that we depend on (e.g. that instructions have no internal data
races). We hope that this will clarify the semantics of x86 architectures.

Acknowledgements. We thank Luc Maranget for his work on memevents,


and David Christie, Dave Dice, Doug Lea, Paul Loewenstein, Gil Neiger, and
Francesco Zappa Nardelli for helpful remarks. We acknowledge funding from
EPSRC grant EP/F036345.
References
1. AMD64 Architecture Programmer’s Manual (3 vols). Advanced Micro Devices,
rev. 3.14 (September 2007)
2. Intel 64 and IA-32 Architectures Software Developer’s Manual (5 vols). Intel
Corporation, rev. 29 (November 2008)
3. Adve, S., Gharachorloo, K.: Shared memory consistency models: A tutorial. IEEE
Computer 29(12), 66–76 (1996)
4. Ahamad, M., Neiger, G., Burns, J., Kohli, P., Hutto, P.: Causal memory: Definitions,
implementation, and programming. Distributed Computing 9(1), 37–49 (1995)
5. Aspinall, D., Ševčík, J.: Formalising Java’s data race free guarantee. In: Schneider, K.,
Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 22–37. Springer, Heidelberg (2007)
6. Boehm, H.-J., Adve, S.: Foundations of the C++ concurrency memory model. In:
Proc. PLDI (2008)
7. Boudol, G., Petri, G.: Relaxed memory models: an operational approach. In: Proc.
POPL, pp. 392–403 (2009)
8. Burckhardt, S., Musuvathi, M.: Effective program verification for relaxed memory
models. Technical Report MSR-TR-2008-12, Microsoft Research (2008); Gupta, A.,
Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 107–120. Springer, Heidelberg (2008)
9. Dice, D.: Java memory model concerns on Intel and AMD systems (January 2008),
http://blogs.sun.com/dave/entry/java_memory_model_concerns_on
10. Hangal, S., Vahia, D., Manovit, C., Lu, J.-Y.J., Narayanan, S.: TSOtool: A program
for verifying memory systems using the memory consistency model. In: Proc. ISCA,
pp. 114–123 (2004)
11. The HOL 4 system, http://hol.sourceforge.net/
12. Intel: Intel 64 architecture memory ordering white paper. SKU 318147-001 (2007)
13. Higham, L., Kawash, J., Verwaal, N.: Defining and comparing memory consistency
models. In: Proc. PDCS (1997); full version as TR #98/612/03, U. Calgary
14. Loewenstein, P.: Personal communication (November 2008)
15. Loewenstein, P.N., Chaudhry, S., Cypher, R., Manovit, C.: Multiprocessor memory
model verification. In: Proc. AFM (Automated Formal Methods), FLoC workshop
(August 2006), http://fm.csl.sri.com/AFM06/
16. Owens, S., Sarkar, S., Sewell, P.: A better x86 memory model: x86-TSO (extended
version). Technical Report UCAM-CL-TR-745, Univ. of Cambridge (2009); supporting
material at www.cl.cam.ac.uk/users/pes20/weakmemory/
17. Roy, A., Zeisset, S., Fleckenstein, C.J., Huang, J.C.: Fast and generalized polynomial
time memory consistency verification. In: Ball, T., Jones, R.B. (eds.) CAV 2006.
LNCS, vol. 4144, pp. 503–516. Springer, Heidelberg (2006)
18. Saraswat, V., Jagadeesan, R., Michael, M., von Praun, C.: A theory of memory
models. In: Proc. PPoPP (2007)
19. Sarkar, S., Sewell, P., Zappa Nardelli, F., Owens, S., Ridge, T., Braibant, T.,
Myreen, M., Alglave, J.: The semantics of x86-CC multiprocessor machine code.
In: Proc. POPL 2009 (January 2009)
20. Sindhu, P.S., Frailong, J.-M., Cekleov, M.: Formal specification of memory models.
In: Scalable Shared Memory Multiprocessors, pp. 25–42. Kluwer, Dordrecht (1991)
21. SPARC International, Inc.: The SPARC architecture manual, v. 8. Revision
SAV080SI9308 (1992), http://www.sparc.org/standards/V8.pdf
22. Yang, Y., Gopalakrishnan, G., Lindstrom, G., Slind, K.: Nemos: A framework for
axiomatic and executable specifications of memory consistency models. In: IPDPS
(2004)
Formal Verification of Exact Computations
Using Newton’s Method

Nicolas Julien and Ioana Paşca

INRIA Sophia Antipolis


{Nicolas.Julien,Ioana.Pasca}@sophia.inria.fr

Abstract. We are interested in the verification of Newton’s method. We


use a formalization of the convergence and stability of the method done
with the axiomatic real numbers of Coq’s Standard Library in order to
validate the computation with Newton’s method done with a library of
exact real arithmetic based on co-inductive streams. The contribution of
this work is twofold. Firstly, based on Newton’s method, we design and
prove correct an algorithm on streams for computing the root of a real
function in a lazy manner. Secondly, we prove that rounding at each step
in Newton’s method still yields a convergent process with an accurate
correlation between the precision of the input and that of the result. An
algorithm including rounding turns out to be much more efficient.

1 Introduction
The Standard Library of the Coq proof assistant [4,1] contains a formalization
of real numbers based on a set of axioms. This gives the real numbers all the
desired theoretical properties and makes theorem proving more agreeable and
close to “pencil and paper” proofs [16]. However, this formalization has no (or
little) computational meaning. During this paper we shall refer to the reals from
this implementation as “axiomatic reals”. We note that Coq is not a special case
and proof assistants in general provide libraries with results from real analysis
[5,7,8,10], but with formalizations for real numbers that are not well suited for
computations. However, in a proof process, it is often the case that we are in-
terested in computing with the real numbers (or at least approximating such
computations), so a considerable effort has been invested in having libraries of
exact computations for proof systems [13,15,18]. We shall refer to numbers from
such implementations as “exact reals”. These libraries provide verified computa-
tions for a set of operations and elementary functions on real numbers.
The results in this paper are concerned with Newton’s method. Under certain
conditions, this method ensures the convergence at a certain speed towards a
root of the given function, the unicity of this root in a certain domain and the
local stability. But, as the “paper” proof for these results depends on non-trivial
theorems from analysis like the mean value theorem and concepts like continuity,
derivation etc. the formal development conducted around them is based on the
axiomatic reals of Coq. We would like to transfer these “theoretical” properties
to the computations done with exact reals. Our work is thus conducted in two


directions. On the one hand we are interested in proving correct Newton’s method on


exact reals and having algorithms that are suited for our implementation of the
real numbers as co-inductive streams [13]. On the other hand we are concerned
with providing appropriate theoretical results to support the correctness of the
algorithms and optimizations we make.
The paper is organized as follows: in section 2 we present the theoretical
results around Newton’s method that have been verified with the axiomatic
reals in Coq. This section gives the formalization of well-known results in [6]
and presents a new proof that was motivated by our implementation of the
method on exact reals. To clarify the need for this proof, in section 3 we present
a library of exact real arithmetic implemented with Coq’s co-inductive streams
and we discuss how computations with Newton’s method can be verified in this
setting. We also design and prove correct an algorithm for computing the root
of a function that is based on Newton’s method and is adapted for streams.
However, this algorithm is much more efficient when rounding is used during
the process. The theorem we present in section 2.1 justifies this optimization,
though the optimized algorithm is not completely certified. The applications of
our algorithm are given in section 4.4 along with perspectives opened by the
suggested improvements. We finish by discussing related work in section 5 as
well as conclusions and possible extensions of our work in section 6.

2 Kantorovitch’s Theorem and Related Results


Kantorovitch’s theorem gives sufficient conditions for the convergence of New-
ton’s method towards the root of a given function and establishes the unicity
of this root in a certain domain. A version of this theorem as well as results
concerning the speed for the convergence of the process and its stability are dis-
cussed in [6]. Preliminary results around a formalization of these theorems inside
the Coq proof assistant are described in [19]. At present all the theorems listed
in this section are verified in the Coq proof assistant. The formal proof is based
on the axiomatic real numbers from Coq’s Standard Library. This choice is mo-
tivated by the concepts we needed to handle, as the library contains results from
real analysis concerning convergence, continuity, derivability etc. The theorems
listed below illustrate the type of concepts involved in the proof.

Theorem 1 (Existence). Consider an equation f(x) = 0, where f : [a, b] → R,
a, b ∈ R, f(x) ∈ C^(1)([a, b]). Let x(0) be a point contained in [a, b] with its closed
ε-neighborhood Uε(x(0)) = {|x − x(0)| ≤ ε} ⊂ [a, b]. If the following conditions
hold:
1. f′(x(0)) ≠ 0 and |1/f′(x(0))| ≤ A0;
2. |f(x(0))/f′(x(0))| ≤ B0 ≤ ε/2;
3. ∀x, y ∈ ]a, b[, |f′(x) − f′(y)| ≤ C|x − y|;
4. the constants A0, B0, C satisfy the inequality μ0 = 2A0B0C ≤ 1;
then, for an initial approximation x(0) , the Newton process

x(n+1) = x(n) − f(x(n))/f′(x(n)),   n = 0, 1, 2, . . .        (1)

converges and lim_(n→∞) x(n) = x∗ is a solution of the initial equation, so that
|x∗ − x(0)| ≤ 2B0 ≤ ε.
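For concreteness, the hypotheses can be checked on a small instance (an illustration of ours, not part of the formal development): take f(x) = x^2 − 2 on [a, b] = [1, 2], x(0) = 3/2 and ε = 1/2, so that Uε(x(0)) = [1, 2]. Then f′(x(0)) = 3 ≠ 0 and |1/f′(x(0))| = 1/3 =: A0; |f(x(0))/f′(x(0))| = 1/12 =: B0 ≤ ε/2; |f′(x) − f′(y)| = 2|x − y|, so C = 2; and μ0 = 2A0B0C = 1/9 ≤ 1. The theorem then guarantees convergence to x∗ = √2 with |x∗ − x(0)| ≤ 2B0 = 1/6, consistent with |√2 − 3/2| ≈ 0.086.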

Theorem 2 (Uniqueness). Under the conditions of Theorem 1 the root x∗ of


the function f is unique in the interval [x(0) − 2B0 , x(0) + 2B0 ].

Theorem 3 (Speed of convergence). Under the conditions of Theorem 1
the speed of the convergence of Newton’s method is given by
|x(n) − x∗| ≤ (1/2^(n−1)) μ0^(2^n − 1) B0.

Theorem 4 (Local stability). If the conditions of Theorem 1 are satisfied and
if, additionally, 0 < μ0 < 1 and [x(0) − (2/μ0)B0, x(0) + (2/μ0)B0] ⊂ [a, b], then for
any initial approximation x̄(0) that satisfies |x̄(0) − x(0)| ≤ ((1 − μ0)/(2μ0)) B0 the
associated Newton’s process converges to the root x∗.

The convergence of the process ensures that Newton’s method is indeed appro-
priate for determining the root of the function. The unicity of the solution in
a certain domain is used in practice for isolating the roots of the function. The
result on the speed of the convergence means we know a bound for the distance
between a given element of the sequence and the root of the function. This rep-
resents the precision at which an element of the sequence approximates the root.
In practice this theorem is used to determine the number of iterations needed
in order to achieve a certain precision for the solution. The result on the sta-
bility of the process will help with efficiency issues as it allows the use of an
approximation rather than an exact real.
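As a concrete illustration (our own numbers, not from [6]): with μ0 = 1/2 and B0 = 1/2 the bound of Theorem 3 simplifies to |x(n) − x∗| ≤ 2^−(2^n + n − 1), so a target precision of 10^−6 is guaranteed from n = 5 on, since 2^−36 ≈ 1.5·10^−11, while n = 4 only yields 2^−19 ≈ 1.9·10^−6.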
We do not present here the proofs of the theorems, we just give a few elements
of these proofs that are needed in understanding the next section. For details
on the proofs we refer the reader to [6]. The central element of the proof is
an induction process that establishes a set of properties for each element of
the Newton sequence. The proof introduces the auxiliary sequences {An }n∈N ,
{Bn }n∈N and {μn }n∈N :
An = 2An−1                                         (2)

Bn = An−1 Bn−1^2 C = (1/2) μn−1 Bn−1               (3)

μn = 2An Bn C = μn−1^2                             (4)


For each element of the Newton sequence, we are able to verify properties that
are similar to those for x(0) . Reasoning by induction we get the following:

◦ f′(x(n)) ≠ 0 and |1/f′(x(n))| ≤ An
◦ |f(x(n))/f′(x(n))| ≤ Bn ≤ ε/2^(n+1)
◦ μn ≤ 1

Notice that hypothesis 3. of Theorem 1 is a property of the function and it does


not depend on the elements of Newton’s sequence.
From the above relations we get the convergence, unicity and speed of conver-
gence for the sequence.
For Theorem 4 (local stability) we prove that the new initial approxima-
tion x̄(0) satisfies similar hypotheses as those for x(0). The new constants are
A′ = (4/(3 + μ0)) A0 and B′ = ((3 + μ0)/(4μ0)) B0. This makes μ′ = 2A′B′C = 1 and
we can verify that

◦ f′(x̄(0)) ≠ 0 and |1/f′(x̄(0))| ≤ A′
◦ |f(x̄(0))/f′(x̄(0))| ≤ B′
◦ μ′ ≤ 1

We are thus in the hypotheses of Theorem 1 and by applying this theorem we
conclude that the process converges to the same root x∗.
Notice, however, that for the new constants we get μ′ = 1. If we do a Newton
iteration, we would get the new μ = μ′^2 = 1 (cf. equation (4)) and we would
not be able to do an approximation again, because Theorem 4 requires μ < 1.
To correct this, we impose a finer approximation |x̄(0) − x(0)| ≤ ((1 − μ0)/(4μ0)) B0.
This new approximation yields the following formulas for the constants:

A′ = (8/(7 + μ0)) A0                                          (5)

B′ = ((μ0^2 + 46μ0 + 17)/(8(7 + μ0)μ0)) B0                    (6)

this makes that

μ′ = (μ0^2 + 46μ0 + 17)/(7 + μ0)^2 < 1                        (7)

(indeed, (7 + μ0)^2 − (μ0^2 + 46μ0 + 17) = 32(1 − μ0) > 0 for μ0 < 1).
We summarize these results in:

Corollary 1. If the conditions of Theorem 1 are satisfied and if, additionally,
0 < μ0 < 1 and [x(0) − (2/μ0)B0, x(0) + (2/μ0)B0] ⊂ [a, b], then for any initial approxi-
mation x̄(0) that satisfies |x̄(0) − x(0)| ≤ ((1 − μ0)/(4μ0)) B0 the associated Newton’s
process converges to the root x∗.

2.1 Newton’s Method with Rounding


We now have all the necessary tools to state and prove a theorem on the behavior
of Newton’s method if we consider rounding at each step. The rounding we do
is just good enough to ensure the convergence. This theorem is particularly
interesting for computations in arbitrary or multiple precision, as it relates the
number of iterations with the precision of the input and that of the result. This
means that for the first iterations we need a lower precision, as we are not yet close
to the root; the precision of the input is increased later on, to match the desired
precision of the result.

Theorem 5. We consider a function f : [a, b] → R and an initial approximation
x(0) satisfying the conditions in Theorem 1.
We also consider a function rnd : N × R → R that models the approximation we
will make at each step in the perturbed Newton sequence:

t(0) = x(0) and t(n+1) = rnd_(n+1)(t(n) − f(t(n))/f′(t(n)))

If

1. ∀n ∀x, x ∈ ]a, b[ ⇒ rnd_n(x) ∈ ]a, b[
2. 1/2 ≤ μ0 < 1
3. [x(0) − 3B0, x(0) + 3B0] ⊂ [a, b]
4. ∀n ∀x, |x − rnd_(n+1)(x)| ≤ (1/3^n) R0, where R0 = ((1 − μ0^2)/(8μ0)) B0

then

a. the sequence {t(n)}n∈N converges and lim_(n→∞) t(n) = x∗, where x∗ is the root of
   the function f given by Theorem 1
b. ∀n, |x∗ − t(n)| ≤ (1/2^(n−1)) B0

The first hypothesis makes sure that the new value will also be in the range of
the function. The second and third hypotheses come from the use of the stabil-
ity property of the Newton sequence (see Corollary 1). The fourth hypothesis
controls the approximation we are allowed to make at each iteration. The con-
clusion gives us the convergence of the process to the same limit as Newton’s
method without approximations. Also we give an estimate of the distance from
the computed value to the root at each step.
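To give a feel for hypothesis 4, consider a numerical instance of ours (not from the original text): with μ0 = 1/2 and B0 = 1/2 we get R0 = ((1 − 1/4)/4)·(1/2) = 3/32, so the first rounding may displace the iterate by at most 3/32, the second by at most 1/32, and so on; over the whole run the roundings contribute at most Σ_(n≥0) (1/3^n) R0 = (3/2) R0 = 9/64, a geometrically vanishing perturbation that the stability results can absorb.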

Proof. Our proof is based on those for theorems 1 - 4 and corollary 1. To give
the intuition behind the proof, we decompose Newton’s perturbed process t(n)
as follows:

1. set t(0) := x(0)
2. do a Newton iteration to get x(1) := t(0) − f(t(0))/f′(t(0))
3. do an approximation of the result to get t(1) := rnd(x(1))
4. set t(0) := t(1) and go to step 2.

Now let’s look at these steps individually:

◦ At step 1. we start with the initial x(0) that satisfies the conditions in The-
orem 1. This means that Newton’s method from this initial point converges
to the root x∗ (cf. Theorem 1).
◦ At step 2. we consider a Newton sequence starting with x(1) . This sequence is
the same as the sequence at step 1. except that we “forget” the first element
of the sequence and start with the second. It is trivial that this sequence
converges to the root x∗ . We note that (cf. proof of Theorem 1) we can
associate the constants A1 , B1 to the initial iteration of this sequence and
get the corresponding hypotheses from Theorem 1.

◦ At step 3. we consider Newton’s sequence starting from t(1) . This initial


point is just an approximation of the initial point of the previously considered
sequence. From Corollary 1 we get the convergence of the new sequence to the
same root x∗ . Moreover, the proof of Corollary 1 gives us the constants A′, B′
associated to the initial point that also satisfy the hypotheses of Theorem 1.
This means we can start the process over again.
If we take x(0) and then all the initial iterations of the sequences formed at
step 3. we get back our perturbed Newton’s sequence. But decomposing the
problem as we did gives the intuition of why this sequence should converge.
However, just having a set of sequences that all converge to the same root does
not suffice to prove that the sequence formed with all initial iterations of these
sequences will also converge to the same root. The reason is simple, the approx-
imation at step 3. could bring us back to the initial point x(0) which would still
yield a convergent Newton’s sequence, but which would not make the new ele-
ment of the perturbed sequence any closer to the root than the previous one. To
get the convergence of the perturbed sequence we need to control the approxi-
mation we make. We will see in what follows that hypothesis 4. suffices to ensure
the convergence of the new process.
To make the intuitive explanation more formal we consider the sequence of
sequences of real numbers {Yp }p∈N defined as follows:
Y_0^n = x(n) is the original Newton’s sequence;
Y_1 is given by
Y_1^0 = rnd_1(x(1));
Y_1^(n+1) = Y_1^n − f(Y_1^n)/f′(Y_1^n), the Newton’s sequence associated to the initial
iteration Y_1^0;
we continue in the same manner and for an arbitrary p we define Y_(p+1) as follows:
Y_(p+1)^0 = rnd_(p+1)(Y_p^1);
Y_(p+1)^(n+1) = Y_(p+1)^n − f(Y_(p+1)^n)/f′(Y_(p+1)^n).
We notice that taking the first element in each of these sequences forms our
perturbed Newton’s process:

Y_0^0 = x(0) = t(0) and
Y_(n+1)^0 = rnd_(n+1)(Y_n^0 − f(Y_n^0)/f′(Y_n^0)) = rnd_(n+1)(t(n) − f(t(n))/f′(t(n))) = t(n+1)
Following our plan, we now show that for each p the sequence {Y_p^n}n∈N
converges to x∗ and ensures a certain bound on the error.
◦ We start with the sequence {Y_0^n}n∈N. Since it coincides with the initial sequence,
the properties from Theorem 1 are trivially satisfied. For the initial point Y_0^0
we have the associated constants A0, B0. Applying Theorem 1 we get that
lim_(n→∞) Y_0^n = x∗ and |x∗ − Y_0^0| ≤ 2B0.
◦ Before considering {Y_1^n}n∈N, we note that the sequence Ȳ_0^n = Y_0^(n+1) (i.e. the
previously considered sequence where we start from the second element) also
satisfies the conditions, with initial point Ȳ_0^0 = Y_0^1 and constants Ā0 = 2A0
and B̄0 = A0 B0^2 C. The laws for these constants are deduced from relations

(2), (3). We get that lim_(n→∞) Ȳ_0^n = x∗ and |x∗ − Ȳ_0^0| = |x∗ − Y_0^1| ≤ 2B̄0 =
2(A0 B0^2 C).
◦ Now we consider {Y_1^n}n∈N. The initial point of this sequence is
Y_1^0 = rnd_1(Y_0^0 − f(Y_0^0)/f′(Y_0^0)) = rnd_1(Ȳ_0^0). We are in the situation of
Corollary 1, where we have a converging sequence ({Ȳ_0^n}n∈N) and we introduce an ap-
proximation in the initial iteration. To be able to apply this corollary we need
to verify 0 < μ̄0 < 1, [Ȳ_0^0 − (2/μ̄0)B̄0, Ȳ_0^0 + (2/μ̄0)B̄0] ⊂ [a, b] and
|rnd_1(Ȳ_0^0) − Ȳ_0^0| ≤ ((1 − μ̄0)/(4μ̄0)) B̄0. We will show later on that under our
hypotheses these three conditions are indeed verified. From Corollary 1 we get the
new constants according to relations (5), (6). This makes that we find ourselves
again in the conditions of Theorem 1 and we can deduce that lim_(n→∞) Y_1^n = x∗
and |x∗ − Y_1^0| ≤ 2B′ = 2 ((μ̄0^2 + 46μ̄0 + 17)/(8(7 + μ̄0)μ̄0)) B̄0.

We are in the appropriate conditions to start this process again and explain
in the same manner the properties for {Y_2^n}n∈N, {Y_3^n}n∈N, etc. The auxiliary
sequences are given by the following relations (here Ān = 2A′n and B̄n = A′n B′n^2 C
denote the constants of the sequence shifted by one iteration, and μ̄n = 2Ān B̄n C):

A′0 = A0 and A′n+1 = (8/(7 + μ̄n)) (2A′n)

B′0 = B0 and B′n+1 = ((μ̄n^2 + 46μ̄n + 17)/(8(7 + μ̄n)μ̄n)) (A′n B′n^2 C)

μ̄n = 2(2A′n)(A′n B′n^2 C)C = (2A′n B′n C)^2 = μ′n^2

we also consider

μ′n+1 = 2A′n+1 B′n+1 C = (μ̄n^2 + 46μ̄n + 17)/(7 + μ̄n)^2 = ((μ′n^2)^2 + 46μ′n^2 + 17)/(7 + μ′n^2)^2

Rn = ((1 − μ̄n)/(4μ̄n)) B̄n = ((1 − μ′n^2)/(4μ′n^2)) ((1/2) μ′n B′n) = ((1 − μ′n^2)/(8μ′n)) B′n
Using the above reasoning steps, we get by induction that |Y_n^0 − x∗| ≤ 2B′n,
and we also manage to show that ∀n, B′n+1 ≤ (1/2) B′n, hence 2B′n ≤ (1/2^(n−1)) B0.
The latter relation is deduced from the above formulas by basic manipulations. It
trivially implies the convergence of the perturbed sequence to the root x∗.
We need some auxiliary results to ensure that Corollary 1 is applied in the
appropriate conditions each time we make a rounding. These results are as follows:

◦ 0 < 1/2 ≤ μ0 = μ′0 ≤ μ′n ≤ μ′n+1 ≤ . . . < 1
◦ Rn+1 ≤ (1/3) Rn, hence Rn ≤ (1/3^n) R0 = (1/3^n) ((1 − μ̄0)/(4μ̄0)) B̄0 = (1/3^n) ((1 − μ0^2)/(8μ0)) B0
◦ |Y_(n+1)^0 − Y_n^0| ≤ (1/2^n) B0 + (1/3^n) R0
◦ [Ȳ_n^0 − (2/μ̄n) B̄n, Ȳ_n^0 + (2/μ̄n) B̄n] ⊆ [Y_0^0 − 3B0, Y_0^0 + 3B0] ⊂ [a, b]

We do not discuss all the details as they are elementary reasoning steps concern-
ing inequalities, second degree equations or geometric series. All these results
have been formalized in Coq to ensure that no steps are overlooked.

Remarks. Independent of verification of exact computations, this proof has an


interest from a proof engineering point of view. We were able to come up with
the proof because we had formalized theorems 1 - 4 inside a proof assistant. Such
a formalization forces the user to understand the structure of the proof on one
hand and to handle details with care on the other. Thus, an assisted proof is
usually more structured and more detailed than a paper proof (especially in do-
mains where automatic techniques are difficult to implement, like real analysis).
For example, while on paper the auxiliary sequences {An }n∈N , {Bn }n∈N appear
during the proof, on the computer they are defined apart from the proof, allowing
the user to better understand their importance and use similar sequences in the
new proof. A proof assistant is also helpful with syntactic aspects like properly
constructing the induction hypothesis and doing the bookkeeping to make sure
all needed details are taken into consideration.

3 A Coq Library for Exact Real Arithmetic


The exact real library we are considering represents a real in the interval [−1, 1]
as a lazy infinite sequence of signed digits of an arbitrary integer base. The signed
digits of a base β are the integers in [−β + 1, β − 1]. We denote s1 ::s the infinite
sequence beginning by the digit s1 and followed by the infinite sequence s. The
real number r represented by such an infinite sequence s in base β is:

r = ⟦s⟧β = ⟦s1::s2::s3:: . . .⟧β = Σ_(i=1)^∞ si/β^i .

A real number represented by a stream and for which we know the first digit
can be written as: r = ⟦s1::s⟧β = (s1 + ⟦s⟧β)/β.
Having signed digits makes our representation redundant. For example we
can represent 1/3 as ⟦3::3::3::3 . . .⟧10 but also as ⟦4::−7::4::−7 . . .⟧10 .
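The second representation can be checked by a geometric series (a routine verification of ours): ⟦4::−7::4::−7 . . .⟧10 = Σ_(k≥0) (4/10^(2k+1) − 7/10^(2k+2)) = Σ_(k≥0) 33/10^(2k+2) = 33 · (1/100)/(1 − 1/100) = 33/99 = 1/3.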
For each digit k the set of real numbers that admit a representation beginning
by this digit is [(k − 1)/β, (k + 1)/β]. The sets associated to consecutive digits overlap
on an interval of constant length 1/β. The main benefit of this redundancy is that we are
able to design algorithms for which we can decide a possible first digit of the
output. Without redundancy this is in general undecidable. Take the example
of addition: 0::3 . . .10 + 0::6 . . .10 may need infinite precision to decide
whether the first digit is 0 or 1. In the case of signed digits we give 1 as a first
digit knowing we can always go back to a smaller number by using a negative
digit. We also note that in our example it was sufficient to know two digits of
the input to decide the first digit of the output and this is true for addition in
general.
Designing an algorithm therefore requires approximating the result to a pre-
cision that is sufficient to determine a possible first digit. Also, since our real
numbers are infinite streams, the algorithms need to be designed in such a way
that we are always able to provide an extra digit of the result. This is done by
co-recursive calls on our co-inductive streams.

In Coq, co-induction [9] provides a way to describe potentially infinite datatypes


such as our infinite sequences of digits. It offers both efficient lazy evaluation and nice
proof schemes. The type of infinite sequences of objects of some type A is defined
as follows

CoInductive stream (A : Set) : Set := | Cons : A → stream A → stream A.

Cons should not be understood as a way to construct an infinite stream from


another since we cannot build an initial infinite stream, but as a way to de-
compose an infinite stream into a finite part and an infinite part that could be
described again with a new Cons and so on.
Our real numbers will be streams of signed digits, so we also need to create
a model for the digits. They are abstracted with respect to the base and the
implementation of integers, so both the base and the type of integers can be
chosen by the user. Here we will denote their type by digit.
We can define new streams using co-recursive functions, for instance the
stream of 0s, which obviously represents the real number 0.

CoFixpoint zero : stream digit := Cons 0 zero.
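As a complement, here is a small sketch of our own, not part of the library: it instantiates the abstract type digit by Z, fixes the base as a parameter, and assumes, as in the excerpts above, that the arguments of Cons are implicit. The names beta, third and approx are hypothetical; approx reads the first n digits of a stream as a rational approximation num/den with den = beta^n.

Require Import ZArith.
Open Scope Z_scope.

Parameter beta : Z.  (* the base; assumed > 1; hypothetical name *)

(* The stream 3::3::3::..., which represents 1/3 when beta = 10. *)
CoFixpoint third : stream Z := Cons 3 third.

(* Rational approximation from the first n digits: returns (num, den)
   with den = beta^n, so that num/den = Σ_(1≤i≤n) si/beta^i. *)
Fixpoint approx (s : stream Z) (n : nat) {struct n} : Z * Z :=
  match n with
  | O => (0, 1)
  | S n’ =>
      match s with
      | Cons d s’ =>
          let (num, den) := approx s’ n’ in
          (d * den + num, beta * den)
      end
  end.

With beta = 10, approx third 3 computes (333, 1000), i.e. the three-digit approximation of 1/3.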

To prove the correctness of the algorithms on streams of digits we first define


a relation between these streams and the axiomatic reals of Coq. This relation
is based on the relation between the real value of a sequence, its first digit and
the real value of the following sequence as noted previously. We formalize this
relation as a co-inductive predicate :

CoInductive represents (β : Z) : stream digit → R → Prop :=
| rep : ∀ s r k, −β < k < β → −1 ≤ r ≤ 1 →
    represents β s r → represents β (Cons k s) ((k + r)/β).

This relation also makes sure that streams only represent reals in [−1, 1] and
that the digits are in the set of the allowed signed digits.
The correctness of our algorithms is verified when we manage to express a
represents relation between our implementation and the axiomatic reals of Coq’s
Standard Library. For instance the proof that the multiplication is correct is
formulated in this way:

Theorem mult correct :


∀ x y vx vy, represents x vx → represents y vy → represents (x ⊗ y) (vx ∗ vy).

It means that every time we have an exact real (i.e. a stream of digits) x that
represents an axiomatic real vx and a y that represents a vy, then our multi-
plication of streams x and y (here denoted ⊗) will represent the multiplication
of the axiomatic reals vx and vy.
For further details on algorithms and proofs for this library we refer the reader
to [13].

4 Newton’s Method on Exact Reals

4.1 Correctness of Newton’s Method

We want to prove correctness of computation with Newton’s method on


exact reals in the same manner we proved correctness of multiplication in
section 3. We code Newton’s algorithm for both exact reals and axiomatic reals.
For simplification we use a function g on exact reals to represent the ratio f(x)/f′(x)
of axiomatic reals.
Fixpoint EXn g ex0 n {struct n}: stream digit := match n with
| 0 ⇒ ex0 | S n ⇒ let exn := (EXn g ex0 n) in exn ⊖ g exn end.

Fixpoint Xn f f’ x0 n {struct n}: R := match n with
| 0 ⇒ x0 | S n ⇒ let xn := (Xn f f’ x0 n) in xn − f xn / f’ xn end.

The relation between elements of the same rank in the two sequences:
∀ n, represents (EXn g ex0 n) (Xn f f’ x0 n)
is almost trivial, if we have a represents relation for the initial iteration and
for the function.
Theorem EXn correct : ∀ g ex0 f f’ x0 n, represents ex0 x0 →
(∀ x vx, represents x vx → represents (g x) (f vx / f’ vx)) →
(∀ n, −1 ≤ Xn f f’ x0 n ≤ 1) → represents (EXn g ex0 n) (Xn f f’ x0 n).

The proof follows from the correctness of the subtraction on streams with respect
to the subtraction on axiomatic reals.
This theorem allows us to transfer properties proved for Newton’s method
on axiomatic reals to the method implemented on exact reals. If we satisfy the
conditions of Theorem 1 for the function f and the initial iteration X0, then
we can compute the root of the function at an arbitrary accuracy, given by
Theorem 3 (speed of convergence). From the same theorem we get the rank to
which we need to compute for a given accuracy to be obtained. However, if we
wanted to increase this accuracy, we would need to redo all the computation for
the new rank. We want to avoid this and take advantage of the lazy evaluation
characteristic for streams: we can design an algorithm that uses Newton’s method
to compute an arbitrary number of digits for the root of a given function, under
certain conditions for this function.

4.2 An Algorithm for Exact Computation of Roots

We consider a function f : [−1, 1] → R with x∗ the root of f and a suitable initial


approximation x(0) for Newton’s process. We have to find a possible first digit of
the result x∗ in base β. For this we use make_digit, which requires a precision
of (β − 2)/(2β^2) on the result to make the appropriate choice of the first digit. Indeed,
make_digit replaces the first two digits of a representation by an equivalent
two-digit prefix, in order to ensure that every number known to within (β − 2)/(2β^2)
admits a representation beginning with the same new first digit.

To determine the number of Newton iterations that ensures this precision we
use Theorem 3 (speed of convergence), which gives us n s.t. |x(n) − x∗| ≤ (β − 2)/(2β^2).
We choose as a first digit for x∗ the first digit d1 of a representation of x(n). This
gives us x∗ = (d1 + x∗1)/β, where x∗1 is the number formed from the remaining digits
of x∗. Since f(x∗) = 0, we get f((d1 + x∗1)/β) = 0. This means we can define a new
function f1(x) := f((d1 + x)/β), and x∗1 is the root of f1. Determining the second digit
of x∗ is equivalent to determining the first digit of x∗1. We repeat the previous
steps for function f1 and we take as the initial approximation the remaining
digits of x(n), given by x̄(n) = βx(n) − d1. Now we have a co-recursive process
to produce the digits of the root of our function one by one. If we simplify our
algorithm by using g = f/f′, when we transform g into g1 we get

g1(x) := f1(x)/f1′(x) = f((d1 + x)/β) / ((1/β) f′((d1 + x)/β)) = β × g((d1 + x)/β)

For the exact real implementation in Coq we express the algorithm on streams
of digits, so recall that for the stream d1::x we have ⟦d1::x⟧β = (d1 + ⟦x⟧β)/β.

CoFixpoint exact newton (g: stream digit → stream digit) ex0 n :=
match make digit (EXn g ex0 n) with
| d1 :: x’ ⇒ d1 :: exact newton (fun x ⇒ β& (g (d1::x))) x’ n
end.

We note that β& denotes a specific function provided by the library which
computes efficiently the multiplication of a stream by the base. It is possible to
determine a value for n that is sufficient to allow the production of a digit at
each co-recursive call. This simplifies the algorithm but reduces the quadratic
convergence to linear convergence.
The formal verification of this algorithm means we have to prove that the
output of this algorithm represents the root of the function f . For this we
use Theorems 1 - 3 (see section 2) on axiomatic reals and a version of the
theorem EXn_correct (section 4.1) that links Newton’s method on exact reals
to Newton’s method on streams. We need to show that if the initial function
f satisfies the hypotheses of Theorem 1 then the function f1 built at the co-
recursive call will also satisfy these hypotheses, thus yielding a correct algorithm.
The hypotheses of Theorem 1 impose that

1. f ∈ C^(1)(]−1, 1[)
2. ∀x, y ∈ ]−1, 1[, |f′(x) − f′(y)| ≤ C|x − y|
3. f′(x(0)) ≠ 0 and |1/f′(x(0))| ≤ A0;
4. |f(x(0))/f′(x(0))| ≤ B0 ≤ ε/2;
5. μ0 = 2A0B0C ≤ 1.

We analyze f1(x) := f((d1 + x)/β), for which we have f1′(x) = (1/β) f′((d1 + x)/β), and
the new initial iteration x̄(n) = βx(n) − d1.

1. the class of the function is obviously the same, so f1 ∈ C^(1)(]−1, 1[)
2. |f1′(x) − f1′(y)| = |(1/β) f′((d1 + x)/β) − (1/β) f′((d1 + y)/β)|
   ≤ (1/β) C |(d1 + x)/β − (d1 + y)/β| = (1/β^2) C |x − y|
3. f1′(x̄(n)) = (1/β) f′(x(n)) ≠ 0 and |1/f1′(x̄(n))| = |β · (1/f′(x(n)))| ≤ βAn;
4. |f1(x̄(n))/f1′(x̄(n))| = |β · f(x(n))/f′(x(n))| ≤ βBn;
5. μ̄n = 2(βAn)(βBn)(1/β^2)C = 2An Bn C ≤ 1.

Relations 3. - 5. are given by the proof of Theorem 1. We are now able to prove
by co-induction that represents (exact_newton g ex0 n) x∗ .

4.3 Improvements of the Algorithm

Though short, elegant and proven correct, the algorithm presented in this section
is not usable in practice as it is very slow. There are two main reasons for this:

1. The certified computations from the library require a precision of the operands
higher than that of the result. We saw that in the case of addition one extra
digit is required, but for other operations and function this precision can be
higher. When we have an expression where we perform several operations,
the precision demanded for each individual operand is a lot higher than the
precision of the output. In the case of Newton’s method, each iteration only
brings a certain amount of information, so using a higher precision will not
improve the result.
2. This approach relies on the higher-order capabilities of the functional pro-
gramming language: the first argument of the exact_newton function is
itself a function that becomes more and more complex as exact_newton
calls itself recursively. The management of this function is somehow trans-
parent to the programmer, but it has a cost: a new closure is built at every
recursive call to exact_newton and when the function g is called, all the
closures built since the initial call have to be unraveled to obtain the opera-
tions that really need to be performed. This cost can be avoided by building
directly a first order data structure.

We discuss two possible improvements of this algorithm, dealing with these two
issues. For the first point the solution is simple: just use the significant digits
in the stream. Determining which these significant digits are and certifying the
result is still possible thanks to Theorem 5. We implement a truncate function
that given a stream s returns the stream containing the first n digits of s and
sets the rest to zero. This function represents the rnd function on axiomatic
reals (see Theorem 5).
Fixpoint truncate s n {struct n} :=
match n with | 0 ⇒ zero | S n’ ⇒
match s with | d :: s’ ⇒ d :: truncate s’ n’ end
end.

The perturbed Newton’s method becomes:



Fixpoint Etn g ex0 (n : nat) {struct n} : stream digit := match n with


| 0 ⇒ ex0 | S n’ ⇒ let tn := (Etn g ex0 n’) in (truncate (tn ⊖ (g tn)) (φ n’))
end.

The function φ controls the approximation we can make at each iteration and
follows the constraints imposed by Theorem 5. The exact_newton algorithm
will work in the same way with this sequence as with the original method.
CoFixpoint exact newton rnd (g: stream digit → stream digit) ex0 n:=
match make digit (Etn g ex0 n) with
| d1 :: x’ ⇒ d1 :: exact newton rnd (fun x ⇒ β& (g (d1::x))) x’ n
end.

Though the proof for this new algorithm is not finalized yet, we feel there is
no real difficulty in obtaining it as both the algorithm and the optimization we
make are proven correct.
To tackle the second point in our list of possible improvements we make ex-
plicit the construction of the new function g in the co-recursive call.
CoFixpoint exact newton aux
(g : stream digit → stream digit) (Xn : stream digit) k n :=
let Xn’ := make k digits Xn (EXn g Xn n) k in
(nth k Xn’) :: exact newton aux g Xn’ (S k) n.

Definition exact newton2 (g : stream digit → stream digit )


(X0 : stream digit) n := exact newton aux g X0 0 n.

The function make_k_digits takes three arguments: two streams x and y


and an integer k and produces k + 1 digits of y by copying the first k digits
in x and computing the digit k + 1 by using make_digit. For the function
to perform correctly we must ensure that the first k digits in y can indeed be
the same as those in x. In our case this results from the theorem on the speed
of convergence of Newton’s method, which makes sure that a certain element is
close enough to the root. The way the algorithm works is that it does iterations
always for the same function g. It produces digits one at a time. Once it has reached
enough precision to certify an extra digit, the (k + 1)th, it gets this digit by using
the function nth and it continues to compute where it left off.
This algorithm performs better than the previous one, but the optimizations
performed in this case seem more difficult to prove correct. At the time of writing
this paper we have established the required properties of make_k_digits, and the
proof of correctness of the algorithm is in progress.

4.4 Applications to the Square Root

Newton’s method is commonly used for the implementation of nth root functions
or division. We discuss the example of the square root to illustrate the behaviour
of our algorithms. The square root of a positive real number a is the root of the
function fsqrt(x) = x^2 − a. The corresponding function gsqrt = fsqrt(x)/fsqrt′(x)
is x/2 − a/(2x).
Due to restrictions on implementing the inverse function on exact reals, the

library provides functions of the family x → (1/β^n) x where n > 0. So we chose
instead the function fsqrt(x) = β^2 x^2 − a, which corresponds to gsqrt(x) =
x/2 − a/(2β^2 x). The root of this function is √a/β, so a final multiplication by the
base will give the expected result. We apply the algorithm to this function gsqrt and
the user provides a suitable initial approximation. We prove in Coq that the
resulting function actually computes a representation of the square root function
on axiomatic reals divided by the base.
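As a rough numeric illustration (our own, with hypothetical values): for β = 10 and a = 1/2, one iteration computes x − gsqrt(x) = x/2 + a/(2β^2 x); starting from x(0) = 0.07 this gives 0.035 + 0.5/14 ≈ 0.070714, already close to the expected root √(1/2)/10 ≈ 0.0707107.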

Definition Ssqrt (a : stream digit) ex0 n := exact newton (g sqrt a) ex0 n.


Theorem sqrt correct : ∀ (a : stream digit) (va : R),
represents a va → represents (Ssqrt a) ((sqrt va)/β).

The original algorithm is slow. For example the computation of the first digit
of √(1/2) in base 2^124 using the original algorithm blocks the system, while for the
same algorithm improved with approximations we get the equivalent precision of
37 decimal digits in 12 seconds. The second algorithm exact_newton2 brings
an improvement at each new digit we want to obtain, making the algorithm
run on average twice as fast. We should also take into consideration that using
f(x) = a/x^2 − 1 can improve our execution times considerably, as there is only
one division involved. Nevertheless, our intention here was not to implement
an efficient square root, but to test the capabilities of the previously presented
algorithms.

5 Related Work
This work presents different angles in the formal verification of a numerical al-
gorithm. A lot of work is being done concerning formally verified exact real
arithmetic libraries. Besides the library presented here, the development [15] for
PVS and [18] also for Coq are two of the most recent such implementations.
These two libraries have computations that are verified with respect to the real
analysis formalizations in PVS and C-CoRN [5], respectively. A significant part
of the work presented here could be reproduced in any of these libraries. In the
case of [18] the exact reals operations and functions are verified via an isomor-
phism between the exact reals and the C-CoRN real structure; there is also an
isomorphism between C-CoRN reals and Standard Library reals (see [14]), so in
theory it should be possible to verify computations by using the presented proofs
and the two isomorphisms.
Concerned with exact real arithmetic and also with co-inductive aspects we
mention the work of Niqui [17]. This work aims to obtain all field operations on
real numbers via the Edalat-Potts algorithm for lazy exact arithmetic.
Results on the convergence of Newton’s method with rounding have been
proved for some special cases like the inverse and the square root [3]. Of
course, in these cases the speed of convergence is better than in the general case.
The proof of correctness of square root algorithms has been the subject of
several formal developments. We mention [2] for the verification of the GMP

square root algorithm, [11] for an Intel architecture square root algorithm and
[20] for the verification of the square root algorithm in an IBM processor.
A general algorithm using Newton’s method was developed by Hur and Dav-
enport [12] on a different representation of exact reals but not in a formally
verified setting.

6 Conclusions and Perspectives


As a case study of theorem proving in numerical analysis, this work tries to
underline three aspects of such a development: how to design and formalize
the necessary proofs from “paper” mathematics, how to prove correct numerical
methods implemented on exact reals and provide certified computations, and how to
design and verify specific algorithms for an implementation of exact reals. The
Coq development can be found at http://www-sop.inria.fr/marelle/Exact_Newton.
To the best of the authors’ knowledge, the result and proof of Theorem 5
are new, though the authors are not experts in numerical analysis. Using only a
predetermined precision for our computation means that our formalization can
be seen as an (imperfect) model of computation in multiple or arbitrary precision,
thus validating Newton’s method in such a context. The proof of Theorem 5 was
motivated by the need to improve the algorithm discussed in section 4.2. The
contribution of proof assistants in obtaining this proof is twofold: the proof was
motivated by a formal development and the proof was constructed inside the
proof assistant, following the pattern of existing proofs.
The algorithms presented in section 4 are also new. The verified algorithm,
though not of practical use, can serve as a model in obtaining proofs for
our optimized algorithms. The next step is to put together the two proofs we
presented here and get a formally verified algorithm for computing roots in a
(more) efficient manner. With the results presented here we see no difficulties in
obtaining this proof.
The axiomatic formalization of Newton’s method contains the multivariable
version of the theorems 1 - 4. This means we can solve systems of (non-)linear
equations using this method. Exact real arithmetic libraries do not yet treat
such cases, so it would be an interesting experiment to see to what extent the
results presented here can be obtained in the multivariate setting. We note that
the proof of Theorem 5 has the same structure for the multivariate case. We
presented here the real case to make it easier for the reader to follow and to
show the correspondence with results from section 4.2.

Acknowledgments
We thank Yves Bertot for his help and constructive suggestions.

References
1. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
Coq’Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2004)
2. Bertot, Y., Magaud, N., Zimmermann, P.: A Proof of GMP Square Root. J. Autom.
Reasoning 29(3-4), 225–252 (2002)
3. Brent, R.P., Zimmermann, P.: Modern Computer Arithmetic (2006) (in preparation),
http://www.loria.fr/zimmerma/mca/pub226.html
4. Coq development team: The Coq Proof Assistant Reference Manual, version 8.1 (2006)
5. Cruz-Filipe, L., Geuvers, H., Wiedijk, F.: C-CoRN: The Constructive Coq Repository
at Nijmegen. In: Asperti, A., Bancerek, G., Trybulec, A. (eds.) MKM 2004. LNCS,
vol. 3119, pp. 88–103. Springer, Heidelberg (2004)
6. Démidovitch, B., Maron, I., et al.: Éléments de calcul numérique. Mir, Moscou (1979)
7. Fleuriot, J.D.: On the mechanization of real analysis in Isabelle/HOL. In: Harrison, J.,
Aagaard, M. (eds.) TPHOLs 2000. LNCS, vol. 1869, pp. 146–162. Springer,
Heidelberg (2000)
8. Gamboa, R., Kaufmann, M.: Nonstandard Analysis in ACL2. Journal of Automated
Reasoning 27(4), 323–428 (2001)
9. Giménez, E.: Codifying guarded definitions with recursive schemes. In: Dybjer, P.,
Nordström, B., Smith, J. (eds.) TYPES 1994. LNCS, vol. 996, pp. 39–59. Springer,
Heidelberg (1995)
10. Harrison, J.: Theorem Proving with the Real Numbers. Springer, Heidelberg (1998)
11. Harrison, J.: Formal verification of square root algorithms. Formal Methods in
System Design 22(2), 143–153 (2003)
12. Hur, N., Davenport, J.H.: A generic root operation for exact real arithmetic. In:
Blanck, J., Brattka, V., Hertling, P. (eds.) CCA 2000. LNCS, vol. 2064, pp. 82–87.
Springer, Heidelberg (2001)
13. Julien, N.: Certified exact real arithmetic using co-induction in arbitrary integer
base. In: Garrigue, J., Hermenegildo, M.V. (eds.) FLOPS 2008. LNCS, vol. 4989,
pp. 48–63. Springer, Heidelberg (2008)
14. Kaliszyk, C., O’Connor, R.: Computing with classical real numbers. CoRR,
abs/0809.1644 (2008)
15. Lester, D.R.: Real Number Calculations and Theorem Proving. In: Mohamed, O.A.,
Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 215–229. Springer,
Heidelberg (2008)
16. Mayero, M.: Formalisation et automatisation de preuves en analyses réelle et
numérique. Ph.D thesis, Université de Paris VI (2001)
17. Niqui, M.: Coinductive formal reasoning in exact real arithmetic. Logical Methods
in Computer Science 4(3-6), 1–40 (2008)
18. O’Connor, R.: Certified Exact Transcendental Real Number Computation in Coq.
In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170,
pp. 246–261. Springer, Heidelberg (2008)
19. Paşca, I.: A Formal Verification for Kantorovitch’s Theorem. In: Journées
Francophones des Langages Applicatifs, pp. 15–29 (2008)
20. Sawada, J., Gamboa, R.: Mechanical verification of a square root algorithm using
Taylor’s theorem. In: Aagaard, M., O’Leary, J.W. (eds.) FMCAD 2002. LNCS,
vol. 2517, pp. 274–291. Springer, Heidelberg (2002)
Construction of Büchi Automata for LTL Model
Checking Verified in Isabelle/HOL

Alexander Schimpf1 , Stephan Merz2 , and Jan-Georg Smaus1


1
University of Freiburg, Germany
{schimpfa,smaus}@informatik.uni-freiburg.de
2
INRIA Nancy, France
[email protected]

Abstract. We present the implementation in Isabelle/HOL of a trans-


lation of LTL formulae into Büchi automata. In automaton-based model
checking, systems are modelled as transition systems, and correctness
properties stated as formulae of temporal logic are translated into cor-
responding automata. An LTL formula is represented by a (generalised)
Büchi automaton that accepts precisely those behaviours allowed by the
formula. The model checking problem is then reduced to checking language
inclusion between the two automata. The automaton construction is thus
an essential component of an LTL model checking algorithm. We imple-
mented a standard translation algorithm due to Gerth et al. The correct-
ness and termination of our implementation are proven in Isabelle/HOL,
and executable code is generated using the Isabelle/HOL code generator.

1 Introduction
The term model checking [2] subsumes several algorithmic techniques for the ver-
ification of reactive and concurrent systems, in particular with respect to proper-
ties expressed as formulae of temporal logics. More specifically, the context of our
work are LTL model checking algorithms based on Büchi automata [19]. In this
approach, the system to be verified is modelled as a finite transition system and
the property is expressed as a formula ϕ of linear temporal logic (LTL). The for-
mula ϕ constrains executions, and the transition system is deemed correct (with
respect to the property) is all its executions satisfy ϕ. After translating the for-
mula into a Büchi automaton [1], the model checking problem can be rephrased in
terms of language inclusion between the transition system (interpreted as a Büchi
automaton) and the automaton representing ϕ or, technically more convenient, as
an emptiness problem for the product of the transition system and the automaton
representing ¬ϕ.
In this paper, we present a verified implementation in Isabelle/HOL of the
classical translation algorithm due to Gerth et al. [7] of LTL formulae into Büchi
automata.1 The automaton translation is at the heart of automaton-based model
1
The Isabelle sources on which our paper is based are available at http://www.
informatik.uni-freiburg.de/~ki/papers/diplomarbeiten/LTL2LGBA.zip.
Extensive documentation on Isabelle can be found at http://isabelle.in.tum.de.
Throughout this paper Isabelle refers to Isabelle/HOL.


checking algorithms, and an error in the design or the implementation of the


translation algorithm compromises the soundness of the verdict returned by the
model checker. Indeed, the original implementation of the translation proposed
by Gastin and Oddoux [6] contained a flaw that went unnoticed for several
years, despite wide-spread use within the Spin model checker. The purpose of
our work is to demonstrate that it is feasible to obtain an executable program
implementing such a translation from a formalisation in a modern interactive
proof assistant. Assuming that the kernel of the proof assistant and the code
generator are correct, we thus obtain a highly trustworthy implementation.
We chose the algorithm of Gerth et al. because it is well-known and represen-
tative of the problems that such algorithms pose. More recent algorithms such
as [6] are known to behave better for larger LTL formulae, but they require
additional automata-theoretic concepts, and we leave their formalisation as a
worthwhile and challenging topic for future work.
The algorithm of Gerth et al. is based on the construction of a graph of
nodes labelled with subformulae of the original formula, similar to a tableau
construction [4]. Acceptance conditions on infinite runs complement the tableau
and enforce “eventuality” (liveness) properties. The main theorem states that the
generated automaton should accept precisely those words (system executions)
that are models of the temporal formula. The correctness of the translation is
by no means obvious; in fact, we already found proving the termination of the
method to be quite challenging. In our formalisation, we limit ourselves to data
structures and operations that are supported by the Isabelle code generator.
In this way, extraction of executable code becomes straightforward, but we are
limited to relatively low-level constructions.
The paper is organised as follows: In the next section, we provide some prelim-
inaries on LTL and Büchi automata. In Sect. 3, we recall the algorithm proposed
by Gerth et al. [7]. Section 4 presents our implementation of the algorithm. In
Sect. 5, we discuss the proof of termination and correctness of this implemen-
tation. Section 6 concludes. The results presented in this paper were obtained
within the Diploma Thesis of the first author [14].

2 Preliminaries
2.1 Linear Temporal Logic
Linear-time temporal logic LTL [13] is a popular formalism for expressing cor-
rectness properties about (runs of) reactive systems. It extends propositional
logic by modal operators that refer to future points of time.

Definition 1. Let Prop be a finite, non-empty set of propositions. The set Φ of


LTL formulae is inductively defined as follows:
– Prop ⊆ Φ;
– if ϕ ∈ Φ and ψ ∈ Φ, then ¬ϕ ∈ Φ, ϕ ∨ ψ ∈ Φ, Xϕ ∈ Φ (“next ϕ”), and
ϕ U ψ ∈ Φ (“ϕ until ψ”).

Further logical connectives can be defined as abbreviations. In particular,


we will use the propositional constants ⊤ (true) and ⊥ (false), as well as the
operators ∧ and V (“release”), which are the duals of ∨ and U.
The semantics of an LTL formula is defined with respect to a (temporal) inter-
pretation ξ = a0 a1 . . ., which is an ω-word2 over 2Prop , consisting of propositional
interpretations ai ∈ 2Prop . The set a0 contains exactly those propositions that
are true in the initial state of the temporal interpretation ξ, a1 gives the propo-
sitions true in the second state, and so on. When ξ = a0 a1 . . ., we write ξi for ai
and ξ|i for the suffix ai ai+1 . . ., which is itself a temporal interpretation.

Definition 2. The relation ξ ⊨ ϕ (“ξ is a model of ϕ” or “ϕ holds of ξ”) is
inductively defined as follows:

ξ ⊨ p       iff p ∈ ξ0 (p ∈ Prop)
ξ ⊨ ¬ϕ      iff ξ ⊭ ϕ
ξ ⊨ ϕ ∨ ψ   iff ξ ⊨ ϕ or ξ ⊨ ψ
ξ ⊨ Xϕ      iff ξ|1 ⊨ ϕ
ξ ⊨ ϕ U ψ   iff there exists i ∈ N such that ξ|i ⊨ ψ and ξ|j ⊨ ϕ for all 0 ≤ j < i.
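For example (an illustration of ours): let Prop = {p, q} and ξ = {p}{p}{q}{q}{q} . . . with ξi = {q} for all i ≥ 2. Then ξ ⊨ p U q, since ξ|2 ⊨ q while ξ|0 ⊨ p and ξ|1 ⊨ p; moreover ξ ⊨ X p, but ξ ⊭ X X p because p ∉ ξ2.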

2.2 Generalised Büchi Automata

The automata-theoretic approach to LTL model checking [19] relies on translat-


ing LTL formulae ϕ to Büchi automata Aϕ such that a word ξ is accepted by
Aϕ if and only if ξ ⊨ ϕ. The following variant of Büchi automata underlies the
algorithm by Gerth et al. [7].

Definition 3. A generalised Büchi automaton (GBA) A is a tuple (Q, I, δ, F )


where:

– Q is a finite set of states;


– I ⊆ Q is the set of initial states;
– δ ⊆ Q × Q is the transition relation;
– F ⊆ 2^Q is the set of acceptance sets (the acceptance family).

An ω-word σ over Q is called a path of A if σ0 ∈ I and (σi, σi+1) ∈ δ for all i ∈ N.
The limit of an ω-word σ is given as limit(σ) := {q | ∃∞ n. σn = q}3. The GBA A
accepts a path σ of A if limit(σ) ∩ M ≠ ∅ holds for all M ∈ F.
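As a small example (ours, for illustration): take Q = {q0, q1}, I = {q0}, δ = Q × Q and F = {{q1}}. A path σ is then accepted iff q1 ∈ limit(σ), i.e. iff σ visits q1 infinitely often; the path (q0 q1)^ω is accepted, whereas q0^ω is not, since limit(q0^ω) ∩ {q1} = ∅.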

Observe that Def. 3 does not mention an alphabet. Instead, it is conventional


to label automaton states by sets of propositional interpretations and use these
labels to define the acceptance of a temporal interpretation by a GBA. Formally,
this is achieved by the following definition, where D is chosen as 2^Prop.

Definition 4. A labelled generalised Büchi automaton (LGBA) is given by a


triple (A, D, L) where:
2
An ω-word over alphabet Σ is a sequence s0 s1 . . . where si ∈ Σ for all i ∈ N.
3
The symbol ∃∞ means “there are infinitely many”.

– A = (Q, I, δ, F ) is a GBA;
– D is a finite set of labels;
– L : Q → 2^D is the label function.

A path σ of A is consistent with an ω-word ξ over D if ξi ∈ L(σi ) for all i ∈ N.


An LGBA accepts an ω-word ξ over D iff it (more precisely, its underlying GBA)
accepts some path of A that is consistent with ξ.

In model checking, systems are modelled as Kripke structures, that is, finite
transition systems whose states are labelled with propositional interpretations.
A Kripke structure K is an LGBA whose underlying GBA has a trivial (empty)
acceptance family, and whose label function assigns a single propositional inter-
pretation to every state. Assuming that the LGBA A represents the complement
of the LTL formula ϕ (A accepts precisely those executions of which ϕ does not
hold), K is a model of ϕ if no execution is accepted by both K and A, i.e. if the
intersection of the languages accepted by the two automata is empty.

3 Generating an LGBA for an LTL Formula

We recall the algorithm proposed by Gerth et al. [7] for computing an LGBA Aϕ
(with set of labels 2^Prop) for an LTL formula ϕ such that Aϕ accepts a temporal
interpretation ξ iff ξ ⊨ ϕ.
The construction of Aϕ proceeds in three stages. First, one builds the graph
of the underlying GBA, using a procedure similar to a tableau construction [4].
Second, the function for labelling states of the LGBA is defined. Finally, the
acceptance family is determined based on the set of “until” subformulae of ϕ.
We now describe each stage in more detail.
The first step builds a graph of nodes (which will become the automaton
states) that contain subformulae of ϕ. Intuitively, a node “promises” that the
formulae it contains hold of any temporal interpretation that has an accepting
run starting at that node. The construction is essentially based on “recursion
laws” of LTL such as

μ U ψ ↔ ψ ∨ (μ ∧ X(μ U ψ)) (1)

that are used to split a promised formula into promises for the current state
and for the successor state. The initial states of the automaton will be precisely
those nodes that promise ϕ.
Without loss of generality, we assume that ϕ is given in negation normal form
(NNF), i.e. the negation symbol is only applied to propositions. Transformation
to NNF is straightforward once we include the dual operators ∧ and V among
the set of logical connectives, using laws such as ¬(ϕ ∨ ψ) ≡ ¬ϕ ∧ ¬ψ.
Gerth et al. [7] represent each node of the graph by a record with the following
fields:

– Name: a unique identifier of the node.



– Incoming: the set of names of all nodes that have an edge pointing to the
current node. Using this field, the entire graph is represented as the set of
its nodes.
– New: a set of LTL formulae promised by this node that have not yet been
processed. This set is used during the construction and is empty for all
nodes of the final graph.
– Old: a set of LTL formulae promised by this node that have already been
processed.
– Next: a set of LTL formulae that all successor nodes must promise.
– Father: during the construction, nodes will be split. This field contains the
name of the node from which the current one has been split. It is used by
Gerth et al. solely for reasoning about the algorithm, and we will not mention
it any further.
The algorithm successively moves formulae from New to Old, decomposing them,
and inserting subformulae into New and Next as appropriate. When the New
field is empty, a successor node is generated whose New field equals the Next
field of the current node. The algorithm maintains a list of all nodes generated so
far to avoid generating duplicate nodes; this is essential for ensuring termination
of the algorithm. More formally, the algorithm is realised by the function expand
whose pseudo-code is reproduced in Fig. 1. For reasons of space and clarity,
we omit some parts of the code in this presentation, in particular, some of the
cases for the currently considered formula η, while preserving the original line
numbering.
The automaton graph is constructed by the following function call:

expand([ Name ⇐ new name(), Incoming ⇐ {init},                       (2)
         New ⇐ {ϕ}, Old ⇐ ∅, Next ⇐ ∅ ], ∅)

where ϕ is the input LTL formula and init is a reserved identifier: all nodes whose
Incoming field contains init will be initial states of the automaton.
In the second step of the construction, we define the function labelling the
nodes with sets of propositional interpretations, each represented as the set of
propositions that evaluate to true. The label of a node q is defined as the set of
interpretations that are compatible with Old(q). Formally, let

Pos(q) = Old(q) ∩ Prop and Neg(q) = {η ∈ Prop | ¬η ∈ Old(q)}.

A propositional interpretation X is compatible with q iff it satisfies all atomic


propositions in Pos(q) but none in Neg(q). This motivates the definition

L(q) = {X ⊆ Prop | X ⊇ Pos(q), X ∩ Neg(q) = ∅}. (3)
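When Prop is finite, (3) can be computed by brute-force enumeration. The following Haskell sketch (my illustration, not the authors' code; pos and neg play the roles of Pos(q) and Neg(q)) does exactly this:

import Data.List (subsequences)

-- All interpretations over props that contain every positive literal
-- of a node and none of its negative ones, following (3).
label :: Eq p => [p] -> ([p], [p]) -> [[p]]
label props (pos, neg) =
  [ xs | xs <- subsequences props
       , all (`elem` xs) pos
       , all (`notElem` xs) neg ]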

It remains to define the acceptance family of the LGBA. Reconsider the “re-
cursion law” (1) for the U operator, which is implemented by lines 20–27 of
the code of Fig. 1. Every node “promising” a formula μ U ψ has one successor
promising ψ and a second successor promising μ and X(μ U ψ). Thus, the graph
of the LGBA may contain paths such that all nodes along the path promise μ

3: function expand(Node, Nodes Set)


4: if New (Node)=∅ then
5: if ∃ND∈Nodes Set with Old (ND)=Old (Node) and Next(ND)=Next(Node) then
6: Incoming(ND):=Incoming(ND)∪Incoming(Node);
7: return(Nodes Set)
8: else return(expand([Name⇐new name(), Incoming⇐{Name(Node)},
10: New ⇐Next(Node), Old ⇐ ∅, Next⇐ ∅], {Node}∪Nodes Set))
11: else
12: let η ∈New (Node);
13: New (Node):=New (Node)\{η};
14: case η of
15: ¬P ⇒
18: Old (Node):=Old (Node)∪{η};
19: return(expand(Node, Nodes Set));
20: η = μ U ψ ⇒
21: Node1:=[Name⇐new name(), Incoming⇐Incoming(Node),
22: New ⇐New (Node)∪({μ}\Old (Node)),
23: Old ⇐Old (Node)∪{η}, Next⇐Next(Node)∪{η}];
24: Node2:=[Name⇐new name(), Incoming⇐Incoming(Node),
25: New ⇐New (Node)∪({ψ}\Old (Node)),
26: Old ⇐Old (Node)∪{η}, Next⇐Next(Node)];
27: return(expand(Node2, expand(Node1, Nodes Set)));
32: end expand

Fig. 1. The algorithm by Gerth et al. [7] (incomplete)

but no node promises ψ. Such paths are not models of μ U ψ, which requires ψ
to be true eventually, and the acceptance family is defined in order to exclude
them. Formally, we define for each formula μ U ψ the set of nodes

FμUψ = {q ∈ Q | μ U ψ ∉ Old(q) or ψ ∈ Old(q)},                  (4)

and define the acceptance family F as

F = {FμUψ | μ U ψ is a subformula of ϕ}. (5)
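As a concrete illustration of (4) and (5) (a sketch of mine, not the authors' code), the family can be computed as follows when a node is identified with its Old set and the "until" subformulae are given as pairs:

-- f is an abstract formula type; a node is represented by its Old set.
-- untils contains one pair (u, psi) for every subformula u = μ U ψ of ϕ.
acceptFamily :: Eq f => [(f, f)] -> [[f]] -> [[[f]]]
acceptFamily untils nodes =
  [ [ q | q <- nodes, u `notElem` q || psi `elem` q ]
  | (u, psi) <- untils ]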

4 Implementation in Isabelle
4.1 LTL Formulae
We represent LTL formulae in Isabelle as an inductive data type. For the pur-
poses of this presentation, we restrict to NNF formulae, although our full devel-
opment also includes unrestricted LTL formulae and NNF transformation. For
simplicity, we represent atomic propositions as strings; alternatively, the type of
propositions could be made a parameter of the data type definition.

datatype
frml = LTLTrue ("true")
| LTLFalse ("false")
| LTLProp string ("prop’(_’)")
| LTLNProp string ("nprop’(_’)")
| LTLAnd frml frml ("_ and _")
| LTLOr frml frml ("_ or _")
| LTLNext frml ("X _")
| LTLUntil frml frml ("_ U _")
| LTLUDual frml frml ("_ V _")

The above definition includes the concrete syntax for each clause of the data
type. For example, (X prop(’’p’’)) and prop(’’q’’) would be the Isabelle
representation of (Xp) ∧ q.
We next introduce types for representing ω-words and temporal interpreta-
tions, and define the semantics of LTL formulae by a straightforward primitive
recursive function definition.

types
’a word = nat ⇒ ’a
interprt = "(string set) word"

fun semantics :: "[interprt, frml] ⇒ bool" ("_ |= _" [80,80] 80)


where
"ξ |= true = True"
| "ξ |= false = False"
| "ξ |= prop(q) = (q∈ξ(0))"
| "ξ |= nprop(q) = (q∈ξ(0))"
/
| "ξ |= ϕ and ψ = (ξ |= ϕ ∧ ξ |= ψ)"
| "ξ |= ϕ or ψ = (ξ |= ϕ ∨ ξ |= ψ)"
| "ξ |= X ϕ = (suffix 1 ξ |= ϕ)"
| "ξ |= ϕ U ψ = (∃ i. suffix i ξ |= ψ ∧ (∀ j<i. suffix j ξ |= ϕ))"
| "ξ |= ϕ V ψ = (∀ i. suffix i ξ |= ψ ∨ (∃ j<i. suffix j ξ |= ϕ))"

4.2 Büchi Automata


We now encode GBAs and LGBAs in Isabelle, following Defs. 3 and 4. In this
encoding we approximate the set Q of states by a type parameter ’q, which will
later be instantiated by the type representing the nodes of the graph. Although
not enforced by the definition, finiteness of the actual set of nodes will be ensured
by the termination of the algorithm, which produces states one by one.
GBAs and LGBAs are naturally modelled as records in Isabelle. Since we aim
at producing executable code, all sets that appear in the original definition are
represented as lists.

record ’q gba =
initial :: "’q list"
trans :: "(’q × ’q) list"
accept :: "’q list list"

record ’q lgba =
gbauto :: "’q gba"
label :: "’q  string list list"

The node labelling function is represented by a partial function (denoted by
the ⇀ symbol) because it only needs to be defined on the actual states of the
LGBA, whereas the type ’q may contain extra elements.
It remains to define the runs and the acceptance family of (L)GBAs. The
following definitions are a straightforward transcription of Def. 3: the utility
function set from the Isabelle library computes the set of list elements, the
limit function is defined as indicated in Def. 3.

definition gba_path :: "[’q gba, ’q word] ⇒ bool" where


"gba_path A σ
≡ σ 0 ∈ set (initial A) ∧
(∀ n. ((σ n), σ (Suc n)) ∈ set (trans A))"
definition gba_accept :: "[’q gba, ’q word] ⇒ bool" where
"gba_accept A σ
≡ gba_path A σ ∧
(∀ i<length (accept A). limit σ ∩ set (accept A!i) ≠ {})"

The acceptance condition for an LGBA is defined in a similar fashion. Finally,


the predicate lgba accept characterises the language of an LGBA: a temporal
interpretation ξ is accepted by the LGBA A if there exists some path σ that is
accepted by the GBA underlying A and that is consistent with ξ (cf. Def. 4).

definition lgba_accept :: "[’q lgba, interprt] ⇒ bool"


where
"lgba_accept A ξ
≡ ∃ σ. (∀ i. ξ i ∈ set (map set (the (label A (σ i)))))
∧ gba_accept (gbauto A) σ"

The use of the function “the” in the above code is a technicality related to the
fact that the node labelling function is, in principle, partial. What matters is
that “the (label A (σ i))” is of type string list list.

4.3 Translation from LTL to LGBA


We now formalise in Isabelle the algorithm due to Gerth et al. that we have
presented informally in Sect. 3. As discussed there, three elementary steps have
to be addressed:
– construct the graph of the underlying GBA using the expand function;
– define the acceptance family of the GBA;
– compute the labelling of the states of the LGBA with sets of propositional
interpretations.

function (sequential) expand :: "[cnode, node list] ⇒ node list"
where
"expand ([], n) ns
= (if (∃ nd∈set ns. set (old nd) = set (old n) ∧
set (next nd) = set (next n))
then upd_nds (λn nd. set (old nd) = set (old n) ∧
set (next nd) = set (next n)) ns n
else expand (next n,
(|name = Suc(name n),
incoming = [name n],
old = [],
next = []|)) (n#ns))"

| "expand ((nprop(q))#fs, n) ns
= expand (fs, n(| old := (nprop(q))#(old n) |)) ns"

| "expand ((μ U ψ) #fs, n) ns
= (let nds = expand (μ#fs,
n(| old := (μ U ψ)#(old n),
next := (μ U ψ)#(next n) |)) ns
in expand (ψ#fs,
n(| name := . . .,
old := (μ U ψ)#(old n) |)) nds)"

Fig. 2. The Isabelle implementation of expand, simplified

The algorithm expand constructs a graph, represented as a set of nodes. In


Isabelle, we again use lists instead of finite sets in order to simplify code gen-
eration. We represent node names as integers, and model a node as a record
containing the fields introduced in Sect. 3. We omit the Father field, which is
unnecessary for the construction of the graph. We also replace the field New,
which is used only during the construction, by an extra argument to the expand
function. More precisely, the first argument of the function is of type cnode,
defined as a pair of a formula list and a node.

record node =
name :: nat
incoming :: "nat list"
old :: "frml list"
next :: "frml list"
types cnode = "frml list * node"
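Rendered in Haskell (my paraphrase of the Isabelle types, not code from the paper), the node representation corresponds to:

-- f is the formula type; a cnode pairs the pending "New" formulae
-- with the node under construction, mirroring the Isabelle types.
data Node f = Node { name     :: Int
                   , incoming :: [Int]
                   , old      :: [f]
                   , next     :: [f] }

type CNode f = ([f], Node f)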

Figure 2 contains the fragment of the definition of function expand in Isabelle that
corresponds to the pseudo-code shown in Fig. 1. The function upd_nds merges
the incoming fields of the current node with those of the already constructed
nodes whose old and next fields agree with those of the current node.
For the sake of presentation, the code shown in Fig. 2 is somewhat simpli-
fied with respect to our Isabelle theories: the actual definition produces a pair
consisting of a list of nodes and the highest used node name, which is used in
the (omitted) definition of the name of the node created in the second call to
expand in the clause for “until” formulae. Moreover, the actual definition checks
for duplicates whenever a formula is added to the old or next components of a
node.
The graph for an LTL formula is computed by the function create graph,
which in analogy to (2) is defined as

definition create_graph :: "frml ⇒ node list"


where
"create_graph ϕ
≡ expand ([ϕ], (| name = 1, incoming = [0],
old = [], next = [] |)) []"

We now address the second problem, i.e. the computation of the acceptance
family for an LTL formula and a graph represented as a list of nodes. The
following function accept family is a quite direct transcription of the definition
of the acceptance family in (5):

definition accept_family :: "[frml, node list] ⇒ node list list"
where
"accept_family ϕ ns
≡ map (λη. case η of
_ U ψ ⇒ [q←ns. η∈set(old q) −→ ψ∈set(old q)])
(all_until_frmls ϕ)"

where all until frmls computes the list of “until” subformulae of the argu-
ment formula, without duplicates. It is now straightforward to define a function
create gba that constructs a GBA (of type node gba) from a node list repre-
senting the graph.
It remains to compute the function labelling the nodes with sets of propo-
sitional interpretations, in order to obtain an LGBA. The following definitions
implement the labelling defined by (3) in a straightforward way.

definition
gen_label :: "[string list list, node] ⇒ string list list"
where
"gen_label lbls n
≡ [xs←lbls. set (pos_props (old n)) ⊆ set xs
∧ list_inter xs (neg_props (old n)) = []]"
definition
create_lgba :: "frml ⇒ node lgba"
where
"create_lgba ϕ
≡ (let ns = create_graph ϕ in
(| gbauto = create_gba ϕ ns,
label = [ns [↦] map (gen_label (list_Pow (get_props ϕ))) ns] |))"

The auxiliary functions pos props and neg props compute the lists of positive
and negative literals contained in a list of formulae; get props computes the list
of atomic propositions contained in a temporal formula.

4.4 Code Generation


We have set up our theories in such a way that they use only data types and
operations supported by the code generator, except for certain tests that convert
lists to sets. In order to make these tests executable, we derive some auxiliary
lemmas such as

lemma [code inline]:


"set xs ⊆ set ys ←→ list_all (λx. x mem ys) xs"
lemma [code inline]:
"set xs = set ys ←→ set xs ⊆ set ys ∧ set ys ⊆ set xs"

After these preliminaries, executable code can be extracted by simply issuing


the command

export_code create_lgba in OCaml file "ltl2lgba.ml"

from the Isabelle theory file. This command produces an OCaml module con-
taining the function create lgba and all definitions and functions on which that
function depends.
In order to use this code we have manually written a parser and driver pro-
gram that parses an LTL formula, calls the function create lgba, and outputs
the result. We have used this program to generate automata corresponding to
formulae ϕn that are representative of the verification of liveness properties un-
der fairness constraints4

ϕn ≡ ¬ ((GFp1 ∧ . . . ∧ GFpn ) =⇒ G(q =⇒ Fr))

for atomic propositions pi , q, and r.


We have compared our code with implementations of the algorithm of Gerth
et al. that are available in the tools Spin (http://spinroot.com) and Wring
(http://vlsi.colorado.edu). The running times (in seconds, on a dual-core
notebook computer with a 2.4GHz CPU and 2GB of RAM) for translating ϕn
are shown in Table 1.

Table 1. Runtimes

          Our code    Spin      Wring
  n = 5   30          > 1200    90
  n = 6   540         > 1200    900

4 Fψ (“finally ψ”) is an abbreviation for true U ψ; Gψ (“globally ψ”) denotes ¬F¬ψ.

However, this comparison is
not quite fair, because the other tools go on to translate the LGBA to ordinary
Büchi automata. We plan to formalise this additional (polynomial) translation
in the future, but take the present results as an indication that the execution
times of the implementation generated from Isabelle are not prohibitive.
We have used the LTL-to-Büchi translator testbench [17] for gaining addi-
tional confidence in our program, including the hand-written driver. As expected,
our code passes all the tests.

5 Verifying the Automaton Construction

Our main motivation for implementing the algorithm in Isabelle is of course


the possibility to verify the correctness of our definitions. Assuming we trust
Isabelle’s proof kernel and its code generator, we obtain a verified program for
translating LTL formulae into LGBA. We outline the correctness proof in this
section. In fact, we must address two subproblems: we prove that the function
expand terminates on all arguments, and we show that a temporal interpretation
is accepted by the resulting LGBA iff it is a model of the input formula.

5.1 Termination

HOL is a logic of total functions, and it is essential for consistency to prove that
every function that we define terminates. Indeed, Isabelle inserts a termination
predicate in all theorems that involve a function whose termination has not been
proven. Termination of the expand function (cf. Fig. 2) is not obvious at first
sight but, remarkably, is not discussed at all in the original paper [7].
Consider Fig. 2. A call to expand is of the form expand (fs,n) ns. Now in
all cases of the definition, except the first one, some formula is removed from
fs, suggesting a well-founded ordering based on the size of the list fs. (This
observation is also true of the cases of the definition omitted in Fig. 2.)
However, that simple definition breaks down for the first case where argument
fs equals []. Indeed, the recursive call constructs a new node based on the
contents of the next field of the node n. In this case, the termination argument
must be based on the argument ns of the function call. The apparent difficulty
here is that this list does not become shorter on recursive calls, but (potentially)
longer, so it is not completely obvious how to define a well-founded order. The
solution here is to find a suitable upper bound for the argument ns. This can
be done using the fact that all the nodes that are ever constructed contain
subformulae of the input formula ϕ in their fields old and next, the same holds
for the argument fs of formulae to process, and no two different nodes containing
the same formulae in their old and next fields are ever constructed. It follows
that there are only finitely many possible nodes since there exist only finitely
many distinct sets of subformulae of ϕ. Very roughly speaking, the well-founded
order by which argument ns decreases is given by (LIM ϕ - ns) where LIM is
a function that calculates the appropriate upper bound given an LTL formula
ϕ. The actual definition of the upper bound, which appears in the definition
of the ordering below, depends on the arguments of function expand, not the
formula ϕ.
The two orderings are combined lexicographically, that is to say, either the
argument ns decreases w.r.t. the ordering discussed above, or the ns argument
stays the same and there is a decrease on the fs argument.
The termination proof is complicated further by the fact that we have a nested
recursive call in the last case. This is obvious in line 27 in Fig. 1, but the let
expression in Fig. 2 amounts to the same. We therefore start off by showing
a partial termination property, which states that if expand terminates, then
nds ⊇ ns, where nds is the result computed by the inner call (see Fig. 2). This
partial result is then used to show that the arguments of the outer recursive call
are smaller according to the well-founded ordering explained above.
The termination order is formally defined in Isabelle as follows:
abbreviation
"expand_term_ord ≡
inv_image (finite_psubset <*lex*> less_than)
(λ(n, ns). (nds_limit n ns - (old_next_pair ‘ set ns),
size_frml_list (fst n)))"

We explain this definition. The termination order compares pairs of the form
(n, ns) where n is a cnode and ns is a node list. This corresponds exactly to
the argument types of expand. The function λ(n, ns). . . . in the above definition
turns (n, ns) into another pair, say (st, sz), where st is obtained by taking the
old and next fields of all nodes in ns and subtracting those from the set of all
possible old and next fields—i.e., st states “how far ns is from the limit”. The
second component sz is simply the length of the list appearing as the first
component of the pair n. To compare two pairs (n, ns) and (n′, ns′), the function
is used to compute the corresponding (st, sz) and (st′, sz′), and those pairs are
compared using a lexicographical combination of ⊂ and <.
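Spelled out, this means that (st, sz) ≺ (st′, sz′) iff st ⊂ st′, or st = st′ and sz < sz′, where ⊂ is strict inclusion on finite sets (finite_psubset) and < is the usual strict order on the naturals (less_than).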
The formal termination proof takes about 500 lines of Isar proof script.

5.2 Correctness

We now address the proper correctness proof of the algorithm, whose idea is
presented in the original paper [7]. We have to prove that the LGBA computed
by function create_lgba ϕ accepts precisely those temporal structures that are
a model of ϕ. Formally, this is expressed as the Isabelle theorem

theorem lgba_correct:
assumes "∀ i. ξ i ∈ Pow (set (get_props ϕ))"
shows "lgba_accept (create_lgba ϕ) ξ ←→ ξ |= ϕ".

The hypothesis of the theorem states that ξ is a temporal interpretation over


2^Prop where Prop is the set of atomic propositions that occur in ϕ (cf. Sect. 2.2).
As explained in Sect. 3, the idea of the construction is to construct nodes that
“promise” certain formulae and to make sure that these promises are enforced
along any path starting at that node. However, the graph construction by itself
can ensure this only partly. For example, we can prove the following lemma
about “until” formulae promised by a node:
lemma L4_2a:
assumes "gba_path (gbauto (create_lgba ϕ)) σ"
and "f U g ∈ set (old (σ 0))"
shows "(∀ i. {f, f U g} ⊆ set (old (σ i))
∧ g ∈/ set (old (σ i)))
∨ (∃ j. (∀ i<j. {f, f U g} ⊆ set (old (σ i)))
∧ g ∈ set (old (σ j)))".
In other words, we know for any path that starts at a node promising formula
f U g that f and f U g are promised as long as g is not promised. However, we
cannot be sure that g will indeed be promised by some node along the path. We
defined the acceptance family precisely in a way to make sure that such paths
are non-accepting, and indeed we can prove the following stronger lemma about
the accepting paths starting at a node promising some formula f U g:
lemma L4_2b:
assumes "gba_path (gbauto (create_lgba ϕ)) σ"
and "f U g ∈ set (old (σ 0))"
and "gba_accept (gbauto (create_lgba ϕ)) σ"
shows "∃ j. (∀ i<j. {f, f U g} ⊆ set (old (σ i)))
∧ g ∈ set (old (σ j))"
The proof of theorem lgba_correct above relies on similar lemmas for each
temporal operator, and then proves by induction on the structure of LTL for-
mulae that all formulae promised along an accepting path indeed hold of the
corresponding suffix of the temporal interpretation. For the proof of the “if”
direction of theorem lgba_correct we inductively construct an accepting path
for any temporal interpretation satisfying a formula. The length of the overall
correctness proof is about 4500 lines of Isar proof script. The effort of working
out the Isabelle proofs was around four person months.

6 Conclusion
In this paper we have presented a formally verified definition of labelled gener-
alised Büchi automata in the interactive proof assistant Isabelle. Our formali-
sation is based on the classical algorithm by Gerth et al. [7], and Isabelle can
generate executable code from our definitions. In this way, we obtain a highly
trustworthy program for a critical component of a model checking engine for
LTL.
Few formalisations of similar translations have been studied in the literature.
Schneider [15] presents a HOL conversion for LTL that produces a symbolic
encoding of an LGBA, which can be used in connection with a symbolic (in par-
ticular BDD-based) model checker. In contrast, our implementation produces
a full LGBA that can be used with explicit-state LTL model checkers. More-
over, it generates a stand-alone program that can be used independently of any
particular proof assistant. The second author [11] previously presented a formali-
sation of weak alternating automata (WAA [12]), including a translation of LTL
formulae into WAA. Due to their much richer combinatorial structure, WAA
afford a rather straightforward LTL translation of linear complexity, whereas
the translation into (generalised) Büchi automata is exponential. Indeed, the
main contribution of [11] was the formalisation of a game-theoretic argument
due to [10,18] that underlies a complementation procedure for WAA.
Since the translation of LTL formulae to Büchi automata is of exponential
complexity, one cannot expect to translate large formulae. Fortunately, the for-
mulae that express typical correctness properties of concurrent systems are quite
small. Although efficiency was not of much concern to us during the development
of our theories, our experiments so far indicate that the extracted program does
not behave significantly worse than existing implementations of the algorithm
of Gerth et al. Of course, several improvements to the code are possible. For
example, we could represent the sets of propositional interpretations labelling
the automaton states symbolically instead of through an explicit enumeration,
for example using a Boolean function that checks whether an interpretation is
consistent with the label. Optimisations at a lower level could be obtained by
replacing the list representation of finite sets with a more efficient data structure.
More significant optimisations could be achieved by basing the construction
on a different algorithm altogether. Although the construction of Gerth et al.
is well known and widely implemented, several alternative constructions have
been studied in the literature [3,16,6,5,8], and the algorithm presented in [6] is
widely considered to behave best in practice. This algorithm makes use of more
advanced automata-theoretic notions, including WAA and various simulation
relations on WAA and Büchi automata. These concepts have wider applications
than just the automata constructions used in model checkers, including the com-
plementation of ω-automata [9] and the synthesis of concurrent systems.
Encouraged by the success we have had so far, we would indeed like to for-
malise the construction of [6] in future work. Our current formalisation will
continue to serve as an important building block that contains essential, funda-
mental concepts.

References

1. Büchi, R.: On a decision method in restricted second-order arithmetic. In: Intl.


Cong. Logic, Methodology, and Philosophy of Science 1960, pp. 1–12. Stanford
University Press (1962)
2. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge
(2002)
3. Daniele, M., Giunchiglia, F., Vardi, M.: Improved automata generation for linear
temporal logic. In: Halbwachs, N., Peled, D.A. (eds.) CAV 1999. LNCS, vol. 1633,
pp. 249–260. Springer, Heidelberg (1999)
Construction of Büchi Automata for LTL Model 439

4. Fitting, M.C.: Proof Methods for Modal and Intuitionistic Logic. Synthese Library:
Studies in Epistemology, Logic, Methodology and Philosophy of Science. D. Reidel,
Dordrecht (1983)
5. Fritz, C.: Constructing Büchi automata from linear temporal logic using simulation
relations for alternating Büchi automata. In: Ibarra, O.H., Dang, Z. (eds.) CIAA
2003. LNCS, vol. 2759, pp. 35–48. Springer, Heidelberg (2003)
6. Gastin, P., Oddoux, D.: Fast LTL to Büchi automata translation. In: Berry, G.,
Comon, H., Finkel, A. (eds.) CAV 2001. LNCS, vol. 2102, pp. 53–65. Springer,
Heidelberg (2001)
7. Gerth, R., Peled, D., Vardi, M.Y., Wolper, P.: Simple on-the-fly automatic ver-
ification of linear temporal logic. In: Dembinski, P., Sredniawa, M. (eds.) 15th
Intl. Symp. Protocol Specification, Testing, and Verification (PSTV 1996). IFIP
Conference Proceedings, vol. 38, pp. 3–18. Chapman & Hall, Boca Raton (1996)
8. Gurumurthy, S., Kupferman, O., Somenzi, F., Vardi, M.Y.: On complementing
nondeterministic Büchi automata. In: Geist, D., Tronci, E. (eds.) CHARME 2003.
LNCS, vol. 2860, pp. 96–110. Springer, Heidelberg (2003)
9. Kupferman, O., Vardi, M.: Complementation constructions for nondeterministic
automata on infinite words. In: Halbwachs, N., Zuck, L. (eds.) TACAS 2005. LNCS,
vol. 3440, pp. 206–221. Springer, Heidelberg (2005)
10. Kupferman, O., Vardi, M.Y.: Weak alternating automata are not that weak. ACM
Trans. Comput. Log. 2(3), 408–429 (2001)
11. Merz, S.: Weak alternating automata in Isabelle/HOL. In: Aagaard, M.D., Harri-
son, J. (eds.) TPHOLs 2000. LNCS, vol. 1869, pp. 424–441. Springer, Heidelberg
(2000)
12. Muller, D., Saoudi, A., Schupp, P.: Weak alternating automata give a simple expla-
nation of why most temporal and dynamic logics are decidable in exponential time.
In: 3rd IEEE Symp. Logic in Computer Science (LICS 1988), Edinburgh, Scotland,
pp. 422–427. IEEE Press, Los Alamitos (1988)
13. Pnueli, A.: The temporal semantics of concurrent programs. Theoretical Computer
Science 13, 45–60 (1981)
14. Schimpf, A.: Implementierung eines Verfahrens zur Erzeugung von Büchi-
Automaten aus LTL-Formeln in Isabelle. Diplomarbeit, Albert-Ludwigs-
Universität Freiburg (2008), http://www.informatik.uni-freiburg.de/~ki/papers/diplomarbeiten/schimpf-diplomarbeit-08.pdf
15. Schneider, K., Hoffmann, D.W.: A HOL conversion for translating linear time tem-
poral logic to ω-automata. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C.,
Théry, L. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 255–272. Springer, Heidelberg
(1999)
16. Somenzi, F., Bloem, R.: Efficient Büchi automata from LTL formulae. In: Halb-
wachs, N., Peled, D.A. (eds.) CAV 1999. LNCS, vol. 1633, pp. 257–263. Springer,
Heidelberg (1999)
17. Tauriainen, H., Heljanko, K.: Testing LTL formula translation into Büchi au-
tomata. International Journal on Software Tools for Technology Transfer 4(1),
57–70 (2002), http://www.tcs.hut.fi/Software/lbtt/
18. Thomas, W.: Complementation of Büchi automata revisited. In: Rozenberg, G.,
Karhumäki, J. (eds.) Jewels are forever, Contributions on Theoretical Computer
Science in Honor of Arto Salomaa, pp. 109–122. Springer, Heidelberg (2000)
19. Vardi, M.Y., Wolper, P.: Reasoning about infinite computations. Information and
Computation 115(1), 1–37 (1994)
A Hoare Logic for the State Monad
Proof Pearl

Wouter Swierstra

Chalmers University of Technology


[email protected]

Abstract. This pearl examines how to verify functional programs writ-


ten using the state monad. It uses Coq’s Program framework to provide
strong specifications for the standard operations that the state monad
supports, such as return and bind. By exploiting the monadic structure
of such programs during the verification process, it becomes easier to
prove that they satisfy their specification.

1 Introduction

Monads help structure functional programs. Yet proofs about monadic programs
often start by expanding the definition of return and bind. This seems rather
wasteful. If we exploit this structure when writing programs, why should we
discard it when writing proofs? This pearl examines how to verify functional
programs written using the state monad. It is my express aim to take advantage
of the monadic structure of these programs to guide the verification process.
This pearl is a literate Coq script [15]. Most proofs have been elided from the
typeset version, but a complete development is available from my homepage.
Throughout this paper, I will assume that you are familiar with Coq’s syntax
and have some previous exposure to functional programming using monads [16].

2 The State Monad

Let me begin by motivating the state monad. Consider the following inductive
data type for binary trees:

Inductive Tree (a : Set) : Set :=


| Leaf : a → Tree a
| Node : Tree a → Tree a → Tree a.

Now suppose we want to define a function that replaces every value stored in a
leaf of such a tree with a unique integer, i.e., no two leaves in the resulting tree
should share the same label.
The obvious solution, given by the relabel function below, keeps track of a
natural number as it traverses the tree.


Fixpoint relabel (a : Set) (t : Tree a) (s : nat) : Tree nat ∗ nat
:= match t with
| Leaf _ ⇒ (Leaf s, 1 + s)
| Node l r ⇒ let (l′, s′) := relabel a l s
in let (r′, s′′) := relabel a r s′
in (Node l′ r′, s′′)
end.

The relabel function uses its argument number as the new label for the leaves.
To make sure that no two leaves get assigned the same number, the number
returned at a leaf is incremented. In the Node case, the number is threaded
through the recursive calls appropriately.
While this solution is correct, there is some room for improvement. It is all too
easy to pass the wrong number to a recursive call, thereby forgetting to update
the state. To preclude such errors, the state monad may be used to carry the
number implicitly as the tree is traversed.
For some fixed type of state s : Set, the state monad is:

Definition State (a : Set) : Type := s → a ∗ s.

A computation in the state monad State a takes an initial state as its argument.
Using this initial state, it performs some computation yielding a pair consisting
of a value of type a and a final state.
The two monadic operations, return and bind, are defined as follows:

Definition return (a : Set) : a → State a := fun x s ⇒ (x , s).


Definition bind (a b : Set) : State a → (a → State b) → State b
:= fun c1 c2 s1 ⇒ let (x , s2 ) := c1 s1
in c2 x s2 .

The return function lifts any pure value into the state monad, leaving the state
untouched. Two computations may be composed using the bind function. It
passes both the state and the result arising from the first computation as argu-
ments to the second computation.
In line with the notation used in Haskell [11], I will use a pair of infix op-
erators to write monadic computations. Instead of bind, I will sometimes write
>>=, a right-associative infix operator. Secondly, I will write c1 >> c2 instead of
bind c1 (fun _ ⇒ c2 ). This operator binds two computations, discarding the
intermediate result.
Besides return and bind, there are two other operations that may be used to
construct computations in the state monad:

Definition get : State s := fun s ⇒ (s, s).


Definition put : s → State unit := fun s ⇒ (tt, s).

The get function returns the current state, whereas put overwrites the current
state with its argument.

We can now redefine the relabelling function to use the state monad as follows:

Fixpoint relabel (a : Set) (t : Tree a) : State nat (Tree nat)
:= match t with
| Leaf _ ⇒ get >>= fun n ⇒
put (S n) >>
return (Leaf n)
| Node l r ⇒ relabel l >>= fun l′ ⇒
relabel r >>= fun r′ ⇒
return (Node l′ r′)
end.

Note that the type variable s has been instantiated to nat – the state carried
around by the relabelling function is a natural number. By using the state monad,
we no longer need to pass around this number by hand. This definition is less
error prone: all the ‘plumbing’ is handled by the monadic combinators.
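For comparison (this is not part of the paper's Coq development), the same function in Haskell, using the standard state monad from the mtl library, reads:

import Control.Monad.State

data Tree a = Leaf a | Node (Tree a) (Tree a) deriving Show

-- Relabel the leaves with consecutive numbers, threading the counter
-- implicitly through the state monad.
relabel :: Tree a -> State Int (Tree Int)
relabel (Leaf _)   = do n <- get
                        put (n + 1)
                        return (Leaf n)
relabel (Node l r) = do l' <- relabel l
                        r' <- relabel r
                        return (Node l' r')

For example, evalState (relabel (Node (Leaf ()) (Leaf ()))) 0 yields Node (Leaf 0) (Leaf 1).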

3 The Challenge

How can we prove that this relabelling function is correct?


Before we can talk about correctness, we need to establish the specification
that we expect the relabel function to satisfy. One way of formulating the desired
specification is by defining the following auxiliary function that flattens a tree
to a list of labels:

Fixpoint flatten (a : Set) (t : Tree a) : list a
:= match t with
| Leaf x ⇒ x :: nil
| Node l r ⇒ flatten l ++ flatten r
end.

We will prove that for any tree t and number x , the list flatten (fst (relabel t x ))
does not have any duplicates. This property does not completely characterise
relabelling – we should also check that the argument tree has the same shape as
the resulting tree. This is relatively straightforward to verify as the relabelling
function clearly maps leaves to leaves and nodes to nodes. Proving that the
resulting tree satisfies the proposed specification, however, is not so easy.
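In Haskell, the intended property can at least be stated and tested. The sketch below is mine and reuses the Tree and relabel of the Haskell fragment shown earlier:

import Data.List (nub)
import Control.Monad.State (runState)

flatten :: Tree a -> [a]
flatten (Leaf x)   = [x]
flatten (Node l r) = flatten l ++ flatten r

-- The specification as a testable predicate: relabelling from any
-- start value x yields a duplicate-free list of labels.
propFresh :: Tree a -> Int -> Bool
propFresh t x =
  let labels = flatten (fst (runState (relabel t) x))
  in  nub labels == labels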

4 Decorating the State Monad

The relabel function in the previous section is simply typed. We can certainly use
proof assistants such as Coq to formalise equational proofs about such functions.
In this paper, however, I will take a slightly different approach.
In this paper I will use strong specifications, i.e., the type of the relabel function
will capture information about its behaviour. Simultaneously completing the

function definition and the proof that this definition satisfies its specification
yields programs that are correct by construction. This approach to verification
can be traced back to Martin-Löf [6].
To give a strong specification of the relabelling function, we decorate com-
putations in the state monad with additional propositional information. Recall
that the state monad is defined as follows:

Definition State (a : Set) : Type := s → a ∗ s.

We can refine this definition slightly: instead of accepting any initial state of type
s, the initial state should satisfy a given precondition. Furthermore, instead of
returning any pair, the resulting pair should satisfies a postcondition relating
the initial state, resulting value, and final state. Bearing these two points in
mind, we arrive at the following definition of a state monad enriched with Hoare
logic [2, 3].

Definition Pre : Type := s → Prop.


Definition Post (a : Set) : Type := s → a → s → Prop.
Program Definition HoareState (pre : Pre) (a : Set) (post : Post a) : Set
:= forall i : {t : s | pre t }, {(x , f ) : a ∗ s | post i x f }.

Coq uses the notation {x : a | p x } for strong specifications. Such a specification


is inhabited by a pair consisting of a value x of type a, together with a proof
that x satisfies the property p.
The code presented here uses Coq’s Program framework [12, 13]. Defining a
function that manipulates strong specifications using the Program framework
is a two stage process: once we have defined the computational fragment, we
are presented with a series of proof obligations that must be fulfilled before the
Program framework can generate a complete Coq term. When defining the com-
putational fragment, we do not have to manipulate proofs, but rather focus on
programming with the first components of the strong specifications. In fact, the
definition of the HoareState type above already uses one aspect of the Program
framework: a projection is silently inserted to extract a value of type s from the
variable i in the postcondition.
Although we have defined the HoareState type, we still need to define the
return and bind functions. The return function does not place any restriction on
the input state; it simply returns its second argument, leaving the state intact:

Definition top : Pre := fun s ⇒ True.


Program Definition return (a : Set)
: forall x , HoareState top a (fun i y f ⇒ i = f ∧ y = x )
:= fun x s ⇒ (x , s).

This definition of return is identical to the original definition of the state
monad: we have only made its behaviour evident from its type. The Program
framework automatically discharges the trivial proofs necessary to complete the
definition.

The corresponding revision of bind is a bit more subtle. Recall that the bind
of the state monad has the following type.
State a → (a → State b) → State b
You might expect the definition of the revised bind function to have a type of
the form:
HoareState P1 a Q1 → (a → HoareState P2 b Q2 ) → HoareState ... b ...
Before we consider the precondition and postcondition of the resulting compu-
tation, note that we can generalise this slightly. In the above type signature, the
second argument of the bind function is not dependent. We can parametrise P2
and Q2 by the result of the first computation:
HoareState P1 a Q1
→ (forall (x : a), HoareState (P2 x ) b (Q2 x ))
→ HoareState ... b ...
This generalisation allows the pre- and postconditions of the second computation
to refer to the results of the first computation.
Now we need to choose a suitable precondition and postcondition for the com-
posite computation returned by the bind function. To motivate the choice of pre-
and postcondition, recall that the bind of the state monad is defined as follows:
Definition bind (a b : Set) : State a → (a → State b) → State b
:= fun c1 c2 s1 ⇒ let (x , s2 ) := c1 s1
in c2 x s2 .
The bind function starts by running the first computation, and subsequently
feeds its result to the second computation. So clearly the precondition of the
composite computation should imply the precondition of the first computation c1
– otherwise we could not justify running c1 with the initial state s1 . Furthermore
the postcondition of the first computation should imply the precondition of the
second computation – if this wasn’t the case, we could not give grounds for the
call to c2 . These considerations lead to the following choice of precondition for
the composite computation:
fun s1 ⇒ P1 s1 ∧ forall x s2 , Q1 s1 x s2 → P2 x s2
What about the postcondition? Recall that a postcondition is a relation be-
tween the initial state, resulting value, and the final state. We would expect the
postcondition of both argument computations to hold after executing the com-
posite computation resulting from a call to bind. This composite computation,
however, cannot refer to the initial state passed to the second computation or
the results of the first computation: it can only refer to its own initial state
and results. To solve this we existentially quantify over the results of the first
computation, yielding the below postcondition for the bind operation.
fun s1 y s3 ⇒ exists x , exists s2 , Q1 s1 x s2 ∧ Q2 x s2 y s3

In words, the postcondition of the composite computation states that there is an


intermediate state s2 and a value x resulting from the first computation, such
that these satisfy the postcondition of the first computation Q1 . Furthermore, the
postcondition of the second computation Q2 relates these intermediate results
to the final state s3 and the final value y.
Once we have chosen the desired precondition and postcondition of bind, its
definition is straightforward:

Program Definition bind : forall a b P1 P2 Q1 Q2 ,


(HoareState P1 a Q1 ) →
(forall (x : a), HoareState (P2 x ) b (Q2 x )) →
HoareState (fun s1 ⇒ P1 s1 ∧ forall x s2 , Q1 s1 x s2 → P2 x s2 )
b
(fun s1 y s3 ⇒ exists x , exists s2 , Q1 s1 x s2 ∧ Q2 x s2 y s3 )
:= fun a b P1 P2 Q1 Q2 c1 c2 s1 ⇒
match c1 s1 with
(x , s2 ) ⇒ c2 x s2
end.

This definition does give rise to two proof obligations: the intermediate state
s2 must satisfy the precondition of the second computation c2 ; the application
c2 x s2 must satisfy the postcondition of bind. Both these obligations are fairly
straightforward to prove.
Before we have another look at the relabel function, we redefine the two aux-
iliary functions get and put to use the HoareState type:

Program Definition get : HoareState top s (fun i x f ⇒ i = f ∧ x = i)


:= fun s ⇒ (s, s).
Program Definition put (x : s) : HoareState top unit (fun _ _ f ⇒ f = x )
:= fun _ ⇒ (tt, x ).

Both functions have the trivial precondition top. The postcondition of the get
function guarantees that it will return the current state without modifying it.
The postcondition of the put function declares that the final state is equal to
put’s argument.

5 Relabelling Revisited
Finally, we return to the original question: how can we prove that the relabel
function satisfies its specification?
Using the HoareState type, we now arrive at the definition of the relabelling
function presented in Figure 1. The function definition of relabel is identical to
the version using the state monad in Section 2. The only novel aspect is the
choice of pre- and postcondition.
As we do not need any assumptions about the initial state, we choose the
trivial precondition top. The postcondition uses two auxiliary functions, size and

Fixpoint size (a : Set) (t : Tree a) : nat :=


match t with
| Leaf x ⇒ 1
| Node l r ⇒ size l + size r
end.
Fixpoint seq (x n : nat) : list nat :=
match n with
| 0 ⇒ nil
| S k ⇒ x :: seq (S x ) k
end.
Program Fixpoint relabel (a : Set) (t : Tree a) :
HoareState nat top
(Tree nat)
(fun i t f ⇒ f = i + size t ∧ flatten t = seq i (size t))
:= match t with
| Leaf x ⇒ get >>= fun n ⇒
put (n + 1) >>
return (Leaf n)
| Node l r ⇒ relabel l >>= fun l′ ⇒
relabel r >>= fun r′ ⇒
return (Node l′ r′)
end.

Fig. 1. The revised definition of the relabel function

seq, and consists of two parts. First of all, the final state should be exactly size t
larger than the initial state, where t refers to the resulting tree. Furthermore,
when the relabelling function is given an initial state i, flattening t should yield
the sequence i, i + 1, . . . , i + size t − 1.
This definition gives rise to two proof obligations, one for each branch of the
pattern match in the relabel function. In the Leaf case, the proof obligation is
trivial. It is discharged automatically by the Program framework. To solve the
remaining obligation, we need to apply several tactics to trigger β-reduction
and introduce the assumptions. After giving the variables in the context more
meaningful names, we arrive at the proof state in Figure 2.
To complete the proof, we must prove that the postcondition holds for the
tree Node l r under the assumption that it holds for recursive calls to l and
r . The first part of the conjunction follows immediately from the assumptions
finalRes, sizeR, and sizeL and the associativity of addition. The second part of
the conjunction is a bit more interesting. After applying the induction hypothe-
ses, flattenL and flattenR, the remaining goal becomes:

=================================
seq i (size l ) ++ seq lState (size r ) = seq i (size l + size r )

1 subgoal
i : nat
t : Tree nat
n : nat
l : Tree nat
lState : nat
sizeL : lState = i + size l
flattenL : flatten l = seq i (size l )
r : Tree nat
rState : nat
sizeR : rState = lState + size r
flattenR : flatten r = seq lState (size r )
finalState : rState = n
finalRes : t = Node l r
============================
n = i + size t ∧ flatten t = seq i (size t)

Fig. 2. Proving the obligation of the relabelling function

To complete the proof we need to use the assumption sizeL. If we had chosen the
obvious postcondition flatten t = seq i (size t ) we would not have been able to
complete this proof. Once we apply sizeL we can use one last lemma to complete
the proof:

Lemma SeqSplit : forall y x z , seq x (y + z ) = seq x y ++ seq (x + y) z .

This lemma is easy to prove by induction on y.
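For intuition, a Haskell rendering (names are mine) makes the lemma easy to test on concrete inputs:

-- Haskell analogue of seq: the n numbers starting at x (assumes n ≥ 0).
seqL :: Int -> Int -> [Int]
seqL x n | n <= 0    = []
         | otherwise = x : seqL (x + 1) (n - 1)

-- Executable instance of SeqSplit, for non-negative y and z.
seqSplitHolds :: Int -> Int -> Int -> Bool
seqSplitHolds x y z = seqL x (y + z) == seqL x y ++ seqL (x + y) z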


It is interesting to note that extracting a Haskell program from this revised
relabelling function yields the same extracted code as the original definition of
relabel in Section 2. As Coq’s extraction mechanism discards propositional infor-
mation, using the HoareState type does not introduce any computational overhead.

6 Wrapping It Up

Now suppose we need to show that relabel satisfies a weaker postcondition. For
instance, consider the NoDup predicate on lists from the Coq standard libraries.
A list satisfies the NoDup predicate if it does not contain duplicates. The predi-
cate’s definition is given below.

Inductive NoDup : list a → Prop :=


| NoDup nil : NoDup nil
| NoDup cons : forall x xs, x ∉ xs → NoDup xs → NoDup (x :: xs).

How can we prove that the tree resulting from a call to the relabelling function
satisfies NoDup (flatten t )?
We cannot define a relabelling function that has this postcondition – the in-
duction hypotheses are insufficient to complete the required proofs in the Node
case. We can, however, weaken the postcondition and strengthen the precondi-
tion explicitly. In line with Hoare Type Theory [10, 9, 8], we call this operation
do:
Program Definition do (s a : Set) (P1 P2 : Pre s) (Q1 Q2 : Post s a) :
(forall i, P2 i → P1 i) → (forall i x f , Q1 i x f → Q2 i x f ) →
HoareState s P1 a Q1 → HoareState s P2 a Q2
:= fun c ⇒ c.
This function has no computational content. It merely changes the precondition
and postcondition associated with a computation in the HoareState type. We
can now define the final version of the relabelling function as follows:
Program Fixpoint finalRelabel (a : Set) (t : Tree a) :
HoareState (top nat) (Tree nat) (fun i t f ⇒ NoDup (flatten t ))
:= do (relabel a t ).
The precondition is unchanged. As a result, the first argument to the do func-
tion is trivial. To complete this definition, however, we need to prove that the
postcondition can be weakened appropriately. This proof boils down to showing
that the list seq i (size t ) does not have any duplicates. Using one last lemma,
forall n x y, x < y → ¬In x (seq y n), we complete the proof.

7 Discussion
Related Work
This pearl draws inspiration from many different sources. Most notably, it is
inspired by recent work on Hoare Type Theory [10, 9, 8]. Ynot, the implemen-
tation of Hoare Type Theory in Coq, postulates the existence of return, bind,
and do to use Hoare logic to reason about functions that use mutable references.
This paper shows how these functions may be defined in Coq, rather than postu-
lated. Furthermore, the HoareState type generalises their presentation somewhat:
where Hoare Type Theory has specifically been designed to reason about muta-
ble references, this pearl shows that the HoareState type can be used to reason
about any computation in the state monad.
The relabelling problem is taken from Hutton and Fulger [4], who give an
equational proof. Their proof, however, revolves around defining an intermediate
function relabel′ that carries around an (infinite) list of fresh labels.

relabel′ : forall a b, Tree a → State (list b) (Tree b)

To prove that relabel meets the required specification, Hutton and Fulger prove
various lemmas relating relabel and relabel′. It is not clear how their proof tech-
niques can be adapted to other functions in the state monad.

Similar techniques have been used by Leroy [5] in the Compcert project. His
solution, however, revolves around defining an auxiliary data type:

Inductive Res (a : Set) (t : s) : Set :=


| Error : Res a t
| OK : a → forall (t′ : s), R t t′ → Res a t .

where R is some relation between states. Unfortunately, the bind of this monad
yields less efficient extracted code, as it requires an additional pattern match on
the Res resulting from the first computation. Using the HoareState type, it may be
possible to rule out errors by strengthening the precondition, thereby eliminating
the need for this additional pattern match. Furthermore, the HoareState type
presented here is slightly more general as its postcondition may also refer to the
result of the computation.
Similar monadic structures to the one presented here have appeared in the ver-
ification of the seL4 microkernel [1] and security protocol verification [14]. There
are a few differences between these approaches and the development presented
here. Firstly, the postconditions presented here are ternary relations between
the initial state, result, and final state. As a result, we do not need to introduce
auxiliary variables to relate intermediate results. Sprenger and Basin [14] con-
struct a Hoare logic on top of a weakest-precondition calculus. They present a
shallow embedding of a series of logical rules that describe how the return and
bind behave. On the other hand, Cock et al. [1] present their rules as predicate
transformers, using Isabelle/HOL’s verification condition generator
to infer the weakest precondition of a computation. The approach taken here fo-
cuses on programming with strong specifications in type theory, where the type
of a computation fixes the desired pre- and postcondition.

Further Work
I have not provided justification for the choice of pre- and postcondition of bind
and return. Other choices are certainly possible. For instance, we could choose
the following type for return:

forall x , HoareState top a (fun i y f ⇒ True)

Clearly this is a bad choice – applying the return function will no longer yield
any information about the computation. It would be interesting to investigate if
the choices presented here are somehow canonical, for instance, by showing that
the HoareState type forms a monad in some category of strong specifications.
McKinna’s thesis [7] on the categorical structure of strong specifications may
form the starting point for such research.
Using the HoareState type to write larger programs will lead to larger proof
obligations. For this approach to scale, it is important to provide a suitable set
of custom tactics to alleviate the burden of proof. Some tactics that are already
provided by the Program framework proved useful in the development presented
here, but further automation might still be necessary.

Acknowledgements. I would like to thank Matthieu Sozeau for developing Coq’s


Program framework and for helping me to use it. Both Matthieu and Jean-
Philippe Bernardy provided invaluable feedback on a draft version of this paper.
I would like to thank Thorsten Altenkirch, Peter Hancock, Graham Hutton, and
James McKinna for their useful suggestions. Finally, I would like to thank the
anonymous referees for their extremely helpful reviews.

References
[1] Cock, D., Klein, G., Sewell, T.: Secure microkernels, state monads and scalable
refinement. In: Ait Mohamed, O., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp.
167–182. Springer, Heidelberg (2008)
[2] Floyd, R.W.: Assigning meanings to programs. Mathematical Aspects of Com-
puter Science 19 (1967)
[3] Hoare, C.A.R.: An axiomatic basis for computer programming. Communications
of the ACM 12(10), 576–580 (1969)
[4] Hutton, G., Fulger, D.: Reasoning about effects: seeing the wood through the trees.
In: Proceedings of the Ninth Symposium on Trends in Functional Programming
(2008)
[5] Leroy, X.: Formal certification of a compiler back-end, or: programming a com-
piler with a proof assistant. In: POPL 2006: 33rd Symposium on Principles of
Programming Languages, pp. 42–54. ACM Press, New York (2006)
[6] Martin-Löf, P.: Constructive mathematics and computer programming. In: Pro-
ceedings of a discussion meeting of the Royal Society of London on Mathematical
logic and programming languages, pp. 167–184. Prentice-Hall, Inc., Englewood
Cliffs (1985)
[7] McKinna, J.: Deliverables: a categorical approach to program development in type
theory. Ph.D thesis, School of Informatics at the University of Edinburgh (1992)
[8] Nanevski, A., Morrisett, G.: Dependent type theory of stateful higher-order func-
tions. Technical Report TR-24-05, Harvard University (2005)
[9] Nanevski, A., Morrisett, G., Birkedal, L.: Polymorphism and separation in Hoare
Type Theory. In: ICFP 2006: Proceedings of the Eleventh ACM SIGPLAN Inter-
national Conference on Functional Programming (2006)
[10] Nanevski, A., Morrisett, G., Shinnar, A., Govereau, P., Birkedal, L.: Ynot: Rea-
soning with the awkward squad. In: ICFP 2008: Proceedings of the Twelfth ACM
SIGPLAN International Conference on Functional Programming (2008)
[11] Peyton Jones, S. (ed.): Haskell 98 Language and Libraries: The Revised Report.
Cambridge University Press, Cambridge (2003)
[12] Sozeau, M.: Subset coercions in Coq. In: Altenkirch, T., McBride, C. (eds.)
TYPES 2006. LNCS, vol. 4502, pp. 237–252. Springer, Heidelberg (2007)
[13] Sozeau, M.: Un environnement pour la programmation avec types dépendants.
Ph.D thesis, Université de Paris XI (2008)
[14] Sprenger, C., Basin, D.: A monad-based modeling and verification toolbox with
application to security protocols. In: Schneider, K., Brandt, J. (eds.) TPHOLs
2007. LNCS, vol. 4732, pp. 302–318. Springer, Heidelberg (2007)
[15] The Coq development team. The Coq proof assistant reference manual. LogiCal
Project, Version 8.2 (2008)
[16] Wadler, P.: The essence of functional programming. In: POPL 1992: Conference
Record of the Nineteenth Annual ACM SIGPLAN-SIGACT Symposium on Prin-
ciples of Programming Languages, pp. 1–14 (1992)
Certification of Termination Proofs Using CeTA

René Thiemann and Christian Sternagel

Institute of Computer Science, University of Innsbruck, Austria


{rene.thiemann,christian.sternagel}@uibk.ac.at

Abstract. There are many automatic tools to prove termination of term


rewrite systems, nowadays. Most of these tools use a combination of
many complex termination criteria. Hence generated proofs may be of
tremendous size, which makes it very tedious (if not impossible) for hu-
mans to check those proofs for correctness.
In this paper we use the theorem prover Isabelle/HOL to automat-
ically certify termination proofs. To this end, we first formalized the
required theory of term rewriting including three major termination
criteria: dependency pairs, dependency graphs, and reduction pairs. Sec-
ond, for each of these techniques we developed an executable check which
guarantees the correct application of that technique as it occurs in the
generated proofs. Moreover, if a proof is not accepted, a readable error
message is displayed. Finally, we used Isabelle’s code generation facilities
to generate a highly efficient and certified Haskell program, CeTA, which
can be used to certify termination proofs without even having Isabelle
installed.

1 Introduction
Termination provers for term rewrite systems (TRSs) became more and more
powerful in the last years. One reason is that a proof of termination no longer
is just some reduction order which contains the rewrite relation of the TRS.
Currently, most provers construct a proof in the dependency pair framework
which allows to combine basic termination techniques in a flexible way. Then
a termination proof is a tree where at each node a specific technique has been
applied. So instead of stating the precedence of some lexicographic path order
(LPO) or giving some polynomial interpretation, current termination provers
return proof trees which reach sizes of several megabytes. Hence, it would be too
much work to check by hand whether these trees really form a valid proof.
That we cannot blindly trust the output of termination provers is regularly
demonstrated: Every now and then some tool delivers a faulty proof for some
TRS. But most often this is only detected if there is some other prover giving
the opposite answer on the same TRS, i.e., that it is nonterminating. To solve
this problem, in the last years two systems have been developed which auto-
matically certify or reject a generated termination proof: CiME/Coccinelle [4,6]
and Rainbow/CoLoR [3] where Coccinelle and CoLoR are libraries on rewriting for

This research is supported by FWF (Austrian Science Fund) project P18763.

S. Berghofer et al. (Eds.): TPHOLs 2009, LNCS 5674, pp. 452–468, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Coq (http://coq.inria.fr), and CiME and Rainbow are used to convert proof
trees into Coq-proofs which heavily rely on the theorems within those libraries.

proof tree --(CiME/Rainbow)--> proof.v --(Coq + Coccinelle/CoLoR)--> accept/failure

In this paper we present a new combination, CeTA/IsaFoR, to automatically


certify termination proofs. Note that the system design has two major differ-
ences in comparison to the two existing ones. First, our library IsaFoR (Isabelle
Formalization of Rewriting, containing 173 definitions, 863 theorems, and 269
functions) is written for the theorem prover Isabelle/HOL1 [16] and not for Coq.
Second, and more important, instead of generating for each proof tree a new
Coq-proof using the auxiliary tools CiME/Rainbow, our library IsaFoR contains
several executable “check”-functions (within Isabelle) for each termination tech-
nique we formalized. We have formally proven that whenever such a check is
accepted, then the termination technique is applied correctly. Hence, we do not
need to create an individual Isabelle-proof for each proof tree, but just call the
“check”-function for checking the whole tree (which does nothing else but calling
the separate checks for each termination technique occurring in the tree). This
second difference has several advantages:

• In the other two systems, whenever a proof is not accepted, the user just
gets a Coq-error message that some step in the generated Coq-proof failed. In
contrast, our functions deliver error messages using notions of term rewriting.
• Since the analysis of the proof trees in IsaFoR is performed by executable
functions, we can just apply Isabelle’s code-generator [11] to create a certified
Haskell program [17], CeTA, leading to the following workflow.

IsaFoR --(Isabelle)--> Haskell program --(Haskell compiler)--> CeTA
proof tree --(CeTA)--> accept/error message

Hence, to use our certifier CeTA (Certified Termination Analysis) you do not
have to install any theorem prover, but just execute some binary. Moreover,
the runtime of certification is reduced significantly. Whereas the other two
approaches take more than one hour to certify all (≤ 580) proofs during
the last certified termination competition, CeTA needs less than two minutes
for all (786) proofs that it can handle. Note that CeTA can also be used for
modular certification. Each single application of a termination technique can
be certified—just call the corresponding Haskell-function.

Concerning the techniques that have been formalized, the other two systems
offer techniques that are not present in IsaFoR, e.g., LPO or matrix interpreta-
tions. Nevertheless, we also feature one new technique that has not been certified
1
In the remainder of this paper we just write Isabelle instead of Isabelle/HOL.

so far. Whereas currently only the initial dependency graph estimation of [1] has
been certified, we integrated the most powerful estimation which does not re-
quire tree automata techniques and is based on a combination of [9,12] where
the function tcap is required. Initial problems in the formalization of tcap led to
the development of etcap, an equivalent but more efficient version of tcap which
is also beneficial for termination provers. Replacing tcap by etcap within the
termination prover TTT2 [14] reduced the time to estimate the dependency graph
by a factor of 2. We will also explain, how to reduce the number of edges that
have to be inspected when checking graph decompositions.
Another benefit of our system is its robustness. Every proof which uses weaker
techniques than those formalized in IsaFoR is accepted. For example, termination
provers can use the graph estimation of [1], as it is subsumed by our estimation.
The paper is structured as follows. In Sect. 2 we recapitulate the required
notions and notations of term rewriting and the dependency pair framework
(DP framework). Here, we also introduce our formalization of term rewriting
within IsaFoR. In Sect. 3–6 we explain our certification of the four termination
techniques we currently support: dependency pairs (Sect. 3), dependency graph
(Sect. 4), reduction pairs (Sect. 5), and combination of proofs in the dependency
pair framework (Sect. 6). However, to increase readability we abstract from our
concrete Isabelle code and present the checks for the techniques on a higher
level. How we achieved readable error-messages while at the same time having
maintainable Isabelle proofs is the topic of Sect. 7. We conclude in Sect. 8 where
we show how CeTA is created from IsaFoR and where we give experimental data.
IsaFoR, CeTA, and all details about our experiments are available at CeTA’s
website http://cl-informatik.uibk.ac.at/software/ceta.

2 Formalizing Term Rewriting


We assume some basic knowledge of term rewriting [2]. Variables are denoted
by x, y, z, etc., function symbols by f , g, h, etc., terms by s, t, u, etc., and
substitutions by σ, μ, etc. Instead of f (t1 , . . . , tn ) we write f (t̄n ). The set of all
variables occurring in term t is denoted by Var(t). By T (F , V) we denote the set
of terms over function symbols from F and variables from V.
In the following we give an overview of our formalization of term rewriting
in IsaFoR. Our main concern is termination of rewriting. This property—also
known as strong normalization—can be stated without considering the struc-
ture of terms. Therefore it is part of our Isabelle theory AbstractRewriting.
An abstract rewrite system (ARS) is represented by the type (’a×’a)set in Is-
abelle. Strong normalization (SN) of a given ARS A is equivalent to the absence
of an infinite sequence of A-steps. On the lowest level we have to link our notion
of strong normalization to the notion of well-foundedness as defined in Isabelle.
This is an easy lemma since the only difference is the orientation of the relation,
i.e., SN(A) = wf(A−1 ). At this point we can be sure that our notion of strong
normalization is valid.

Now we come to the level of first-order terms (in theory Term):


datatype (’f,’v)"term" = Var ’v | Fun ’f "(’f,’v)term list"
Many concepts related to terms are formalized in Term, e.g., an induction scheme
for terms (as used in textbooks), substitutions, contexts, the (proper) subterm
relation etc.
By restricting the elements of some ARS to terms, we reach the level of TRSs
(in theory Trs), which in our formalization are just binary relations over terms.

Example 1. As an example, consider the following TRS, encoding rules for sub-
traction and division on natural numbers.

minus(x, 0) → x                          div(0, s(y)) → 0
minus(s(x), s(y)) → minus(x, y)          div(s(x), s(y)) → s(div(minus(x, y), s(y)))

Given a TRS R, (ℓ, r) ∈ R means that ℓ is the lhs and r the rhs of a rule in
R (usually written as ℓ → r ∈ R). The rewrite relation induced by a TRS R is
denoted by →R and has the following definition in IsaFoR:

Definition 2. Term s rewrites to t by R, iff there are a context C, a substitu-
tion σ, and a rule ℓ → r ∈ R such that s = C[ℓσ] and t = C[rσ].
Note that this section contains the only parts where you have to trust our for-
malization, i.e., you have to believe that SN(→R ) as defined in IsaFoR really
describes “R is terminating.”
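To make the objects of the formalization concrete, the following Haskell sketch (Haskell because CeTA itself is generated Haskell code) mirrors the term datatype and encodes the TRS of Ex. 1; the names Term, Rule, TRS and minusDiv are our own illustration and are not taken from IsaFoR:

-- a hypothetical Haskell rendering of first-order terms and TRSs
data Term f v = Var v | Fun f [Term f v]  deriving (Eq, Show)

type Rule f v = (Term f v, Term f v)   -- (lhs, rhs)
type TRS  f v = [Rule f v]

-- the TRS of Ex. 1 (subtraction and division on naturals)
minusDiv :: TRS String String
minusDiv =
  [ (Fun "minus" [x, zero],   x)
  , (Fun "minus" [s x, s y],  Fun "minus" [x, y])
  , (Fun "div"   [zero, s y], zero)
  , (Fun "div"   [s x, s y],  s (Fun "div" [Fun "minus" [x, y], s y])) ]
  where x = Var "x"; y = Var "y"; zero = Fun "0" []; s t = Fun "s" [t]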

3 Certifying Dependency Pairs


Before we introduce dependency pairs [1] formally and give some details about
our Isabelle formalization, we recapitulate the ideas that led to the final definition
(including a refinement proposed by Dershowitz [7]).
For a TRS R, strong normalization means that there is no infinite derivation
t1 →R t2 →R t3 →R · · · . Additionally we can concentrate on derivations, where
t1 is minimal in the sense that all its proper subterms are terminating. Such
terms are called minimally nonterminating. The set of all minimally nonterminat-
ing terms with respect to a TRS R is denoted by TR∞ . Observe that for every
term t ∈ TR∞ there is an initial part of an infinite derivation having a specific
shape: A (possibly empty) derivation taking place below the root, followed by an
application of some rule ℓ → r ∈ R at the root, i.e., t →>ε∗R ℓσ →εR rσ, for some
substitution σ. Furthermore, since rσ is nonterminating, there is some subterm
u of r, such that uσ ∈ TR∞ , i.e., rσ = C[uσ]. Then the same reasoning can be
used to get a root reduction of uσ, . . . , cf. [1].
To get rid of the additional contexts C a new TRS, DP(R), is built.

Definition 3. The set DP(R) of dependency pairs of R is defined as follows:
For every rule ℓ → r ∈ R, and every subterm u of r such that u is not a proper
subterm of ℓ and such that the root of u is defined,2 ℓ♯ → u♯ is contained in
DP(R). Here t♯ is the same as t except that the root of t is marked with the
special symbol ♯.

Example 4. The dependency pairs for the TRS from Ex. 1 consist of the rules

M(s(x), s(y)) → M(x, y)   (MM)        D(s(x), s(y)) → M(x, y)   (DM)
D(s(x), s(y)) → D(minus(x, y), s(y))   (DD)
where we write M instead of minus and D instead of div for brevity.

Note that after switching to ♯-terms, the derivation from above can be written
as t♯ →∗R ℓ♯σ →DP(R) u♯σ. Hence every nonterminating derivation starting at
a term t ∈ TR∞ can be transformed into an infinite derivation of the following
shape where all →DP(R) -steps are applied at the root.

t♯ →∗R s1 →DP(R) t1 →∗R s2 →DP(R) t2 →∗R · · ·    (1)

Therefore, to prove termination of R it suffices to prove that there is no such


derivation. To formalize DPs in Isabelle we modify the signature such that every
function symbol now appears in a plain version and in a ♯-version.
datatype ’f shp = Sharp ’f ("_♯") | Plain ’f ("_@")
Sharping a term is done via
fun plain :: "(’f,’v)term => (’f shp,’v)term"
where "plain(Var x) = Var x"
| "plain(Fun f ss) = Fun f@ (map plain ss)"

fun sharp :: "(’f,’v)term => (’f shp,’v)term"


where "sharp(Var x) = Var x"
| "sharp(Fun f ss) = Fun f% (map plain ss)"
Thus t♯ in Def. 3 is the same as sharp(t). Since the function symbols in DP(R)
are of type ’f shp and the function symbols of R are of type ’f, it is not possible
to use the same TRS R in combination with DP(R). Thus, in our formalization
we use the lifting ID—that just applies plain to all lhss and rhss in R.
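As an illustration of Def. 3, here is a Haskell sketch of the DP computation (reusing the Term datatype sketched in Sect. 2; Shp, computeDPs and the helper names are hypothetical, not the IsaFoR functions):

data Shp f = Sharp f | Plain f  deriving (Eq, Show)

plain, sharp :: Term f v -> Term (Shp f) v
plain (Var x)    = Var x
plain (Fun f ts) = Fun (Plain f) (map plain ts)
sharp (Var x)    = Var x
sharp (Fun f ts) = Fun (Sharp f) (map plain ts)

-- dependency pairs according to Def. 3
computeDPs :: (Eq f, Eq v) => TRS f v -> TRS (Shp f) v
computeDPs r =
  [ (sharp l, sharp u)
  | (l, rhs) <- r
  , u@(Fun f _) <- subterms rhs        -- u must have a defined root
  , f `elem` defined
  , not (u `properSubtermOf` l) ]
  where
    defined = [ f | (Fun f _, _) <- r ]
    subterms t@(Var _)    = [t]
    subterms t@(Fun _ ts) = t : concatMap subterms ts
    properSubtermOf u t = case t of
      Fun _ ts -> any (\ti -> u == ti || u `properSubtermOf` ti) ts
      Var _    -> False

Applied to the TRS of Ex. 1, this yields exactly the three pairs (MM), (DM), and (DD) of Ex. 4.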
Considering these technicalities and omitting the initial derivation t♯ →∗R s1
from the derivation (1), we obtain

s1 →DP(R) t1 →∗ID(R) s2 →DP(R) t2 →∗ID(R) · · ·

and hence a so-called infinite (DP(R), ID(R))-chain. Then the corresponding DP
problem (DP(R), ID(R)) is said to be not finite, cf. [8]. Notice that in IsaFoR a
DP problem is just a pair of two TRSs over arbitrary signatures—similar to [8].
2
A function symbol f is defined (w.r.t. R) if there is some rule f (. . . ) → r ∈ R.

In IsaFoR an infinite chain3 and finite DP problems are defined as follows.


fun ichain where "ichain(P, R) s t σ = (∀i.
(s i,t i) ∈ P ∧ (t i)·(σ i) →∗R (s(i+1))·(σ(i+1)))"

fun finite_dpp where


"finite_dpp(P, R) = (¬(∃s t σ. ichain (P, R) s t σ))"
where ‘t · σ’ denotes the application of substitution σ to term t.
We formally established the connection between strong normalization and
finiteness of the initial DP problem (DP(R), ID(R)). Although this is a well-
known theorem, formalizing it in Isabelle was a major effort.

Theorem 5. wf_trs(R) ∧ finite_dpp(DP(R), ID(R)) −→ SN(→R ).

The additional premise wf_trs(R) ensures two well-formedness properties for
R, namely that for every ℓ → r ∈ R, ℓ is not a variable and that Var(r) ⊆ Var(ℓ).
At this point we can obviously switch from the problem of proving SN(→R )
for some TRS R, to the problem of proving finite_dpp(DP(R), ID(R)), and
thus enter the realm of the DP framework [8]. Here, the current technique is to
apply so-called processors to a DP problem, in order to get a set of simpler DP
problems. This is done recursively, until the leaves of the tree built this way consist of DP
problems with empty P-components (and therefore are trivially finite). For this
to be correct, the applied processors need to be sound, i.e., every processor Proc
has to satisfy the implication

(∀p ∈ Proc(P, R). finite_dpp(p)) −→ finite_dpp(P, R)

for every input. The termination techniques that will be introduced in the fol-
lowing sections are all such (sound) processors.
So much for the underlying formalization. Now we will present how the check
in IsaFoR certifies a set of DPs P that was generated by some termination tool
for some TRS R. To this end, the function checkDPs is used.
checkDPs(P,R) = checkWfTRS(R) ∧ computeDPs(R) ⊆ P
Here checkWfTRS checks the two well-formedness properties mentioned above
(the difference between wf_trs and checkWfTRS is that only the latter is exe-
cutable) and computeDPs uses Def. 3, which is currently the strongest definition
of DPs. To have a robust system, the check does not require that exactly the set
of DPs w.r.t. Def. 3 is provided, but any superset is accepted. Hence we are
also able to accept proofs from termination tools that use a weaker definition of
DP(R). The soundness result of checkDPs is formulated as follows in IsaFoR.

Theorem 6. If checkDPs(P, R) is accepted then finiteness of (P, ID(R)) im-


plies SN(→R ).
3
We also formalized minimal chains, but here only present chains for simplicity.
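A functional reading of this check, continuing the hypothetical Haskell sketches from above (this is our own illustration, not the IsaFoR code):

-- well-formedness: no variable lhs, and Var(r) ⊆ Var(l)
checkWfTRS :: Eq v => TRS f v -> Bool
checkWfTRS = all ok
  where
    ok (Var _, _) = False
    ok (l, r)     = all (`elem` vars l) (vars r)
    vars (Var x)    = [x]
    vars (Fun _ ts) = concatMap vars ts

-- any superset of the computed DPs is accepted
checkDPs :: (Eq f, Eq v) => TRS (Shp f) v -> TRS f v -> Bool
checkDPs p r = checkWfTRS r && all (`elem` p) (computeDPs r)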

4 Certifying the Dependency Graph Processor


One important processor to prove finiteness of a DP problem is based on the
dependency graph [1,8]. The dependency graph of a DP problem (P, R) is a
directed graph G = (P, E) where (s → t, u → v) ∈ E iff s → t, u → v is a
(P, R)-chain. Hence, every infinite (P, R)-chain corresponds to an infinite path
in G and thus, must end in some strongly connected component (SCC) S of G,
provided that P contains only finitely many DPs. Dropping the initial DPs of
the chain results in an infinite (S, R)-chain.4 Hence, if for all SCCs S of G the DP
problem (S, R) is finite, then (P, R) is finite. In practice, this processor allows
one to prove termination of each block of mutually recursive functions separately.
To certify an application of the dependency graph processor there are two
main challenges. First of all, we have to certify that a valid SCC decomposition
of G is used, a purely graph-theoretical problem. Second, we have to generate the
edges of G. Since the dependency graph G is in general not computable, usually
estimated graphs G′ are used which contain all edges of the real dependency
graph G. Hence, for the second problem we have to implement and certify one
estimation of the dependency graph.
Notice that there are various estimations around and that the result of an
SCC decomposition depends on the estimation that is used. Hence, it is not
a good idea to implement the strongest estimation and then match the result
of our decomposition against some given decomposition: problems arise if the
termination prover used a weaker estimation and thus obtained larger SCCs.
Therefore, in the upcoming Sect. 4.1 about graph algorithms we just speak
of decompositions where the components do not have to be SCCs. Moreover, we
will also elaborate on how to minimize the number of tests (s → t, u → v) ∈ E.
The reason is that in Sect. 4.2 we implemented one of the strongest dependency
graph estimations where the test for an edge can become expensive. In Sect. 4.3
we finally show how to combine the results of Sections 4.1 and 4.2.

4.1 Certifying Graph Decompositions


Instead of doing an SCC decomposition of a graph within IsaFoR we base our
check on the decomposition that is provided by the termination prover. Essen-
tially, we demand that the set of components is given as a list ⟨C1 , . . . , Ck ⟩ in
topological order where the component with no incoming edges is listed first.
Then we aim at certifying that every infinite path must end in some Ci . Note
that the general idea of taking the topologically sorted list as input was already
publicly mentioned at the “Workshop on the certification of termination proofs”
in 2007. In the following we present how we filled the details of this general idea.
The main idea is to ensure that all edges (p, q) ∈ E correspond to a step
forward in the list ⟨C1 , . . . , Ck ⟩, i.e., (p, q) ∈ Ci × Cj where i ≤ j. However,
iterating over all edges of G will be costly, because it requires to perform the test
(p, q) ∈ E for all possible edges (p, q) ∈ P × P. To overcome this problem we do
not iterate over the edges but over P. To be more precise, we check that
4
We identify an SCC S with the set of nodes S within that SCC.

∀(p, q) ∈ P × P. (∃i ≤ j. (p, q) ∈ Ci × Cj ) ∨ (p, q) ∉ E    (2)

where the latter part of the disjunction is computed only on demand. Thus, only
those edges which would contradict a valid decomposition have to be computed.

Example 7. Consider the set of nodes P = {(DD), (DM), (MM)}. Suppose that
we have to check a decomposition of P into L = ⟨{(DD)}, {(DM)}, {(MM)}⟩ for
some graph G = (P, E). Then our check has to ensure that the dashed edges in
the following illustration do not belong to E.

[Illustration: the nodes (DD), (DM), and (MM) in a row; dashed arrows mark the edges that must be shown absent from E.]

It is easy to see that (2) is satisfied for every list of SCCs that is given
in topological order. What is even more important, whenever there is a valid
SCC decomposition of G, then (2) is also satisfied for every subgraph. Hence,
regardless of the dependency graph estimation a termination prover might have
used, we accept it, as long as our estimation delivers fewer edges.
However, the criterion is still too relaxed, since we might cheat in the input
by listing nodes twice. Consider P = {p1 , . . . , pm } where the corresponding
graph is arbitrary and L = ⟨{p1 }, . . . , {pm }, {p1 }, . . . , {pm }⟩. Then trivially (2)
is satisfied, because we can always take the source of edge (pi , pj ) from the first
part of L and the target from the second part of L. To prevent this kind of
problem, our criterion demands that the sets Ci in L are pairwise disjoint.
Before we formally state our theorem, there is one last step to consider, namely
the handling of singleton nodes which do not form an SCC on their own. Since
we cannot easily infer at what position these nodes have to be inserted in the
topologically sorted list—this would amount to doing an SCC decomposition on our
own—we demand that they are contained in the list of components.5
To distinguish a singleton node without an edge to itself from a “real SCC”, we
require that the latter ones are marked. Then condition (2) is extended in a way
that unmarked components may have no edge to themselves. The advantage of
not marking a component is that our IsaFoR-theorem about graph decomposition
states that every infinite path will end in some marked component, i.e., here the
unmarked components can be ignored.

Theorem 8. Let L = ⟨C1 , . . . , Ck ⟩ be a list of sets of nodes, some of them


marked, let G = (P, E) be a graph, let α be an infinite path of G. If

• ∀(p, q) ∈ P × P. (∃i < j. (p, q) ∈ Ci × Cj ) ∨ (∃i. (p, q) ∈ Ci × Ci ∧
  Ci is marked) ∨ (p, q) ∉ E and
• ∀i ≠ j. Ci ∩ Cj = ∅

then there is some suffix β of α and some marked Ci such that all nodes of β
belong to Ci .
5
Note that Tarjan’s SCC decomposition algorithm produces exactly this list.

Example 9. If we continue with Ex. 7, where only the components {(MM)}
and {(DD)} are marked, then our check additionally verifies that G contains no edge
from (DM) to itself. If it succeeds, every infinite path will in the end only contain
nodes from {(DD)} or only nodes from {(MM)}. In this way, only 4 edges of G
have to be calculated instead of analyzing all 9 possible edges in P × P.
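The following Haskell sketch shows how such a check can be organized (hypothetical names; isEdge stands for the edge test of the estimation from Sect. 4.2, and the laziness of (||) realizes the on-demand edge computation):

-- components are given in topological order, each with its mark
checkDecomposition :: Eq n => (n -> n -> Bool) -> [n] -> [(Bool, [n])] -> Bool
checkDecomposition isEdge ps comps =
    pairwiseDisjoint (map snd comps) && and [ ok p q | p <- ps, q <- ps ]
  where
    idx = zip [0 :: Int ..] comps
    ok p q =
         or [ i < j | (i, (_, ci)) <- idx, p `elem` ci
                    , (j, (_, cj)) <- idx, q `elem` cj ]
      || or [ marked | (_, (marked, ci)) <- idx, p `elem` ci, q `elem` ci ]
      || not (isEdge p q)          -- the edge is computed only on demand
    pairwiseDisjoint []       = True
    pairwiseDisjoint (c : cs) =
      all (\d -> not (any (`elem` d) c)) cs && pairwiseDisjoint cs

For Ex. 9 one would call checkDecomposition isEdge [dd, dm, mm] [(True, [dd]), (False, [dm]), (True, [mm])], assuming dd, dm and mm denote the three dependency pairs.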

4.2 Certifying Dependency Graph Estimations

What is currently missing to certify an application of the dependency graph


processor, is to check whether a single edge is in the dependency graph or
not. Hence, we have to estimate whether the sequence s → t, u → v is a chain,
i.e., whether there are substitutions σ and μ such that tσ →∗R uμ. An obvious
solution is to just look at the root symbols of t and u—if they are different there
is no way that the above condition is met (since all the steps in tσ →∗R uμ
take place below the root, by construction of the dependency pairs). Although
efficient and often good enough, there are more advanced estimations around.
The estimation EDG [1] first replaces via an operation cap all variables and
all subterms of t which have a defined root-symbol by distinct fresh variables.
Then if cap(t) and u do not unify, it is guaranteed that there is no edge.
The estimation EDG∗ [12] does the same check and additionally uses the re-
versed TRS R−1 = {r → ℓ | ℓ → r ∈ R}, i.e., it uses the fact that tσ →∗R uμ
implies uμ →∗R−1 tσ and checks whether cap(u) does not unify with t. Of course
in the application of cap(u) we have to take the reversed rules into account (pos-
sibly changing the set of defined symbols) and it is not applicable if R contains
a collapsing rule ℓ → x where x ∈ V.
The last estimation we consider is based on a better version of cap, called tcap
[9]. It only replaces subterms with defined symbols by a fresh variable, if there
is a rule that unifies with the corresponding subterm.

Definition 10. Let R be a TRS.

• tcap(f (t̄n )) = f (tcap(t1 ), . . . , tcap(tn )) iff f (tcap(t1 ), . . . , tcap(tn )) does not
unify with any variable-renamed left-hand side of a rule from R
• tcap(t) is a fresh variable, otherwise

To illustrate the difference between cap and tcap consider the TRS of Ex. 1
and t = div(0, 0). Then cap(t) = xfresh since div is a defined symbol. However,
tcap(t) = t since there is no division rule where the second argument is 0.
Apart from tree automata techniques, currently the most powerful estimation
is the one based on tcap looking both forward as in EDG and backward as in
EDG∗ . Hence, we aimed to implement and certify this estimation in IsaFoR.
Unfortunately, when doing so, we had a problem with the domain of variables.
The problem was that although we first implemented and certified the standard
unification algorithm of [15], we could not directly apply it to compute tcap. The
reason is that to generate fresh variables as well as to rename variables in rules
apart, we need a type of variables with an infinite domain. One solution would

have been to constrain the type of variables where there is a function which
delivers a fresh variable w.r.t. any given finite set of variables.
However, there is another and more efficient approach to deal with this prob-
lem than the standard approach to rename and then do unification. Our solution
is to switch to another kind of terms where instead of variables there is just one
special constructor “□” representing an arbitrary fresh variable. In essence, this
data structure represents contexts which do not contain variables, but where mul-
tiple holes are allowed. Therefore in the following we speak of ground-contexts
and use C, D, . . . to denote them.

Definition 11. Let ⟦C⟧ be the equivalence class of a ground-context C where the
holes are filled with arbitrary terms: ⟦C⟧ = {C[t1 , . . . , tn ] | t1 , . . . , tn ∈ T (F , V)}.

Obviously, every ground-context C can be turned into a term t which only con-
tains distinct fresh variables and vice-versa. Moreover, every unification problem
between t and ℓ can be formulated as a ground-context matching problem be-
tween C and ℓ, which is satisfiable iff there is some μ such that ℓμ ∈ ⟦C⟧.
Since the result of tcap is always a term which only contains distinct fresh
variables, we can do the computation of tcap using the data structure of ground-
contexts; it only requires an algorithm for ground-context matching. To this end
we first generalize ground-context matching problems to multiple pairs (Ci , ℓi ).

Definition 12. A ground-context matching problem is a set of pairs M =
{(C1 , ℓ1 ), . . . , (Cn , ℓn )}. It is solvable iff there is some μ such that ℓi μ ∈ ⟦Ci ⟧
for all 1 ≤ i ≤ n. We sometimes abbreviate {(C, ℓ)} by (C, ℓ).

To decide ground-context matching we devised a specialized algorithm which is


similar to standard unification algorithms, but which has some advantages: it
neither requires occur-checks, as the unification algorithm does, nor is it necessary
to preprocess the left-hand sides of rules by renaming (as would be necessary for
standard tcap). And instead of applying substitutions on variables, we just need
a basic operation on ground-contexts called merge such that merge(C, D) = ⊥
implies ⟦C⟧ ∩ ⟦D⟧ = ∅, and merge(C, D) = E implies ⟦C⟧ ∩ ⟦D⟧ = ⟦E⟧.

Definition 13. The following rules simplify a ground-context matching problem


into solved form (where all terms are distinct variables) or into ⊥.

(a) M ∪ {(□, t)} ⇒match M
(b) M ∪ {(f (D̄n ), f (ūn ))} ⇒match M ∪ {(D1 , u1 ), . . . , (Dn , un )}
(c) M ∪ {(f (D̄n ), g(ūk ))} ⇒match ⊥ if f ≠ g or n ≠ k
(d) M ∪ {(C, x), (D, x)} ⇒match M ∪ {(E, x)} if merge(C, D) = E
(e) M ∪ {(C, x), (D, x)} ⇒match ⊥ if merge(C, D) = ⊥

Rules (a–c) obviously preserve solvability of M (where ⊥ represents an unsolv-


able matching problem). For Rules (d,e) we argue as follows:

{(C, x), (D, x)} ∪ . . . is solvable iff


• there is some μ such that xμ ∈ ⟦C⟧ and xμ ∈ ⟦D⟧ and . . . iff
• there is some μ such that xμ ∈ ⟦C⟧ ∩ ⟦D⟧ and . . . iff
• there is some μ such that xμ ∈ ⟦merge(C, D)⟧ and . . . iff
• {(merge(C, D), x)} ∪ . . . is solvable

Since every ground-context matching problem in solved form is solvable, we


have devised a decision procedure. It can be implemented in two stages where
the first stage just normalizes by the Rules (a–c), and the second stage just
applies the Rules (d,e). It remains to implement merge.

merge(□, C) ⇒merge C
merge(C, □) ⇒merge C
merge(f (C̄n ), g(D̄k )) ⇒merge ⊥ if f ≠ g or n ≠ k
merge(f (C̄n ), f (D̄n )) ⇒merge f (merge(C1 , D1 ), . . . , merge(Cn , Dn ))
f (. . . , ⊥, . . .) ⇒merge ⊥
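In Haskell, ground-contexts and merge come out as follows (a sketch under the running illustration of the previous sections; GCtx, Hole and GFun are invented names, and Maybe plays the role of ⊥):

data GCtx f = Hole | GFun f [GCtx f]  deriving (Eq, Show)

merge :: Eq f => GCtx f -> GCtx f -> Maybe (GCtx f)
merge Hole d = Just d
merge c Hole = Just c
merge (GFun f cs) (GFun g ds)
  | f == g && length cs == length ds
      = GFun f <$> sequence (zipWith merge cs ds)  -- ⊥ propagates as Nothing
  | otherwise = Nothing                            -- symbol or arity clash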

Note that our implementations of the matching algorithm and the merge func-
tion in IsaFoR are slightly different due to different data structures. For example
matching problems are represented as lists of pairs, so it may occur that we have
duplicates in M. The details of our implementation can be seen in IsaFoR (theory
Edg) or in the source of CeTA.
Soundness and completeness of our algorithms are proven in IsaFoR.

Theorem 14. • If merge(C, D) ⇒∗merge ⊥ then ⟦C⟧ ∩ ⟦D⟧ = ∅.
• If merge(C, D) ⇒∗merge E then ⟦C⟧ ∩ ⟦D⟧ = ⟦E⟧.
• If (C, ℓ) ⇒∗match ⊥ then there is no μ such that ℓμ ∈ ⟦C⟧.
• If (C, ℓ) ⇒∗match M where M is in solved form, then there exists some μ such
that ℓμ ∈ ⟦C⟧.

Using ⇒match , we can now easily reformulate tcap in terms of ground-context


matching which results in the efficient implementation etcap.

Definition 15. • etcap(f (t̄n )) = f (etcap(t1 ), . . . , etcap(tn )) iff
(f (etcap(t1 ), . . . , etcap(tn )), ℓ) ⇒∗match ⊥ for all rules ℓ → r ∈ R.
• etcap(t) = □, otherwise
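Combining the two stages of Def. 13 with merge yields a decision procedure for ground-context matching, and with it etcap. The following sketch is ours, not the IsaFoR implementation (matchable, stageOne, stageTwo and noEdge are hypothetical names; foldM comes from Control.Monad):

import Control.Monad (foldM)

-- solvability of the matching problem (C, l), following Def. 13
matchable :: (Eq f, Eq v) => GCtx f -> Term f v -> Bool
matchable c0 l0 = case stageOne [(c0, l0)] [] of
    Nothing -> False
    Just vs -> stageTwo vs
  where
    -- rules (a-c): decompose until only (context, variable) pairs remain
    stageOne [] acc = Just acc
    stageOne ((Hole, _) : m) acc = stageOne m acc
    stageOne ((c, Var x) : m) acc = stageOne m ((c, x) : acc)
    stageOne ((GFun f cs, Fun g us) : m) acc
      | f == g && length cs == length us = stageOne (zip cs us ++ m) acc
      | otherwise = Nothing
    -- rules (d,e): merge all contexts constraining the same variable
    stageTwo [] = True
    stageTwo ((c, x) : vs) =
      case foldM merge c [ d | (d, y) <- vs, y == x ] of
        Nothing -> False
        Just _  -> stageTwo [ p | p@(_, y) <- vs, y /= x ]

etcap :: (Eq f, Eq v) => TRS f v -> Term f v -> GCtx f
etcap _ (Var _) = Hole
etcap r (Fun f ts)
  | any (matchable c . fst) r = Hole   -- some lhs might match: a fresh variable
  | otherwise                 = c
  where c = GFun f (map (etcap r) ts)

-- the edge test: no edge from s -> t to u -> v if etcap(t) does not match u
noEdge :: (Eq f, Eq v) => TRS f v -> Term f v -> Term f v -> Bool
noEdge r t u = not (matchable (etcap r t) u)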

The check used to estimate the dependency graph—whether tcap(t) does not
unify with u—can also be reformulated in terms of etcap: it is the same requirement
as demanding (etcap(t), u) ⇒∗match ⊥. Again, the soundness of this estimation
has been proven in IsaFoR where the second part of the theorem is a direct
consequence of the first part by using the soundness of the matching algorithm.

Theorem 16. (a) Whenever tσ →∗R s then s ∈ ⟦etcap(t)⟧.
(b) Whenever tσ →∗R uτ then neither (etcap(t), u) ⇒∗match ⊥ nor (etcap(u), t) ⇒∗match ⊥,
where etcap(u) is computed w.r.t. the reversed TRS R−1 .

4.3 Certifying Dependency Graph Decomposition


Eventually we can connect the results of the previous two subsections to obtain
one function to check a valid application of the dependency graph processor.

checkDepGraphProc(P, L, R) = checkDecomposition(checkEdg(R), P, L)

where checkEdg just applies the criterion of Thm. 16 (b).


In IsaFoR the soundness result of our check is proven.
Theorem 17. If checkDepGraphProc(P, L, R) is accepted and if for all P′ ∈ L
where P′ is marked, the DP problem (P′ , R) is finite, then (P, R) is finite.
To summarize, we have implemented and certified the currently best depen-
dency graph estimation which does not use tree automata techniques. Our check-
function accepts any decomposition which is based on a weaker estimation, but
requires that the components are given in topological order. Since our algorithm
computes edges only on demand, the number of tests for an edge is reduced con-
siderably. For example, the five largest graphs in our experiments contain 73,100
potential edges, but our algorithm only has to consider 31,266. This reduced the
number of matchings from 13 million down to 4 million.
Furthermore, our problem of not being able to generate fresh variables or to
rename variables in rules apart led to a more efficient algorithm for tcap based on
matching instead of unification: simply replacing tcap by etcap in TTT2 reduced
the time for estimating the dependency graph by a factor of two.

5 Certifying the Reduction Pair Processor


One important technique to prove finiteness of a DP problem (P, R) is the so-
called reduction pair processor. The general idea is to use a well-founded order
where all rules of P ∪ R are weakly decreasing. Then we can delete all strictly
decreasing rules from P and continue with the remaining dependency pairs.
We first state a simplified version of the reduction pair processor as it is
introduced in [8], where we ignore the usable rules refinement.
Theorem 18. If all the following properties are satisfied, then finiteness of (P \
≻, R) implies finiteness of (P, R).
(a) ≻ is a well-founded and stable order
(b) ≽ is a stable and monotone quasi-order
(c) ≻ ◦ ≽ ⊆ ≻ and ≽ ◦ ≻ ⊆ ≻
(d) P ⊆ ≻ ∪ ≽ and R ⊆ ≽
Of course, to instantiate the reduction pair processor with a new kind of reduc-
tion pair, e.g., LPO, polynomial orders, etc., we first have to prove the first three
properties for that kind of reduction pairs. Since we plan to integrate many re-
duction pairs, but only want to write the reduction pair processor once, we tried
to minimize these basic requirements such that the reduction pair processor still
remains sound in total. In the end, we replaced the first three properties by:

(a) ≻ is a well-founded and stable relation
(b) ≽ is a stable and monotone relation
(c) ≻ ◦ ≽ ⊆ ≻

In this way, for every new class of reduction pairs, we do not have to prove
transitivity of ≻ or ≽ anymore, as it would be required for Thm. 18. Currently, we
just support reduction pairs based on polynomial interpretations with negative
constants [13], but we plan to integrate other reduction pairs in the future.
For checking an application of a reduction pair processor we implemented
a generic function checkRedPairProc in Isabelle, which works as follows. It
takes as input two functions checkS and checkNS which have to approximate a
reduction pair, i.e., whenever checkS(s, t) is accepted, then s ≻ t must hold in
the corresponding reduction pair and similarly, checkNS has to guarantee s ≽ t.
Then checkRedPairProc(checkS, checkNS, P, P′ , R) works as follows:

• iterate once over P to divide P into P≻ and P≽ where the former set contains
all pairs of P where checkS is accepted
• ensure for all s → t ∈ R ∪ P≽ that checkNS(s, t) is accepted, otherwise reject
• accept if P≽ ⊆ P′ , otherwise reject (a functional sketch follows below)
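In Haskell terms, the check could look as follows (a hypothetical sketch; partition is from Data.List, and checkS/checkNS are the two approximation predicates on rules):

import Data.List (partition)

checkRedPairProc :: (Eq f, Eq v)
  => (Rule f v -> Bool)       -- approximates the strict order
  -> (Rule f v -> Bool)       -- approximates the non-strict order
  -> TRS f v -> TRS f v -> TRS f v -> Bool
checkRedPairProc checkS checkNS p p' r =
       all checkNS (r ++ pNS)  -- R and all remaining pairs weakly decrease
    && all (`elem` p') pNS     -- remaining pairs are covered by the given P'
  where
    (_, pNS) = partition checkS p   -- one pass over P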

The corresponding theorem in IsaFoR states that a successful application of


checkRedPairProc(. . . , P, P  , R) proves that (P, R) is finite whenever (P  , R) is
finite. Obviously, the first two conditions of checkRedPairProc ensure condition
(d) of Thm. 18. Note, that it is not required that all strictly decreasing pairs are
removed, i.e., our checks may be stronger than the ones that have been used in
the termination provers.

6 Certifying the Whole Proof Tree

From Sect. 3–5 we have basic checks for the three techniques of applying depen-
dency pairs (checkDPs), the dependency graph processor (checkDepGraphProc),
and the reduction pair processor (checkRedPairProc). For representing proof
trees within the DP framework we used the following data structures in IsaFoR.
datatype ’f RedPair = NegPolo "(’f × (cint × nat list))list"

datatype (’f,’v)DPProof = . . . 6
| PisEmpty
| RedPairProc "’f RedPair" "(’f,’v)trsL" "(’f,’v)DPProof"
| DepGraphProc "((’f,’v)DPProof option × (’f,’v)trsL)list"

datatype (’f,’v)TRSProof = . . . 6
| DPTrans "(’f shp,’v)trsL" "(’f shp,’v)DPProof"

6
CeTA supports even more techniques, cf. CeTA’s website for a complete list.

The first line fixes the format for reduction pairs, i.e., currently of (linear)
polynomial interpretations where for every symbol there is one corresponding en-
try. E.g., the list [(f, (−2, [0, 3]))] represents the interpretation where Pol(f )(x, y)
= max(−2 + 3y, 0) and Pol(g)(x1 , . . . , xn ) = 1 + Σ1≤i≤n xi for all g ≠ f .
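For illustration, the interpretation encoded by such a list can be evaluated as follows (a sketch; polo is our own name, and cint is rendered as Integer):

-- max(0, c + sum of coeff_i * x_i) for listed symbols, 1 + sum of x_i otherwise
polo :: Eq f => [(f, (Integer, [Integer]))] -> f -> [Integer] -> Integer
polo redp f args = case lookup f redp of
    Just (c, cs) -> max 0 (c + sum (zipWith (*) cs args))
    Nothing      -> 1 + sum args

For example, polo [("f", (-2, [0, 3]))] "f" [5, 1] evaluates to max 0 (−2 + 0·5 + 3·1) = 1.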
The datatype DPProof represents proof trees for DP problems. Then the check
for valid DPProofs gets as input a DP problem (P, R) and a proof tree and tries to
certify that (P, R) is finite. The most basic technique is the one called PisEmpty,
which demands that the set P is empty. Then (P, R) is trivially finite.
For an application of the reduction pair processor, three inputs are required.
First, the reduction pair redp, i.e., some polynomial interpretation. Second, the
dependency pairs P′ that remain after the application of the reduction pair
processor. Here, the datatype trsL is an abbreviation for lists of rules. And
third, a proof that the remaining DP problem (P′ , R) is finite. Then the checker
just has to call createRedPairProc(redp, P, P′ , R) and additionally calls itself
recursively on (P′ , R). Here, createRedPairProc invokes checkRedPairProc
where checkS and checkNS are generated from redp.
The most complex structure is the one for decomposition of the (estimated)
dependency graph. Here, the topological list for the decomposition has to be
provided. Moreover, for each subproblem P′ , there is an optional proof tree.
Subproblems where a proof is given are interpreted as “real SCCs” whereas the
ones without proof remain unmarked for the function checkDepGraphProc.
The overall function for checking proof trees for DP problems looks as follows.
checkDPProof(P,R,PisEmpty) = (P = [])
checkDPProof(P,R,(RedPairProc redp P′ prf)) =
  createRedPairProc(redp,P,P′,R) ∧ checkDPProof(P′,R,prf)
checkDPProof(P,R,DepGraphProc P′s) =
  checkDepGraphProc(P, map (λ(prfO,P′). (isSome prfO,P′)) P′s, R)
  ∧ (∀(Some prf,P′) ∈ P′s. checkDPProof(P′,R,prf))

Theorem 19. If checkDPProof(P,R,prf) is accepted then (P, R) is finite.


Using checkDPProof it is now easy to write the final method checkTRSProof
for proving termination of a TRS, where computeID is an implementation of ID.
checkTRSProof(R,DPTrans P prf) =
checkDPs(P,R) ∧ checkDPProof(P,computeID(R),prf)
For the external usage of CeTA we developed a well-documented XML-format,
cf. CeTA’s website. Moreover, we implemented two XML parsers in Isabelle, one
that transforms a given TRS into the internal format, and another that does
the same for a given proof. The function certifyProof, finally, puts everything
together. As input it takes two strings (a TRS and its proof). Then it applies
the mentioned parsers and afterwards calls checkTRSProof on the result.
Theorem 20. If certifyProof(R,prf) is accepted then SN(→R ).
To ensure that the parser produces the right TRS, after the parsing process it
is checked that when converting the internal data-structures of the TRS back to

XML, we get the same string as the input string for the TRS (modulo white-
space). This is a major benefit in comparison to the two other approaches where
it can happen, and already has happened, that the uncertified components Rainbow/CiME
produced a wrong proof goal from the input TRS, i.e., they created a termination
proof within Coq for a different TRS than the input TRS.

7 Error Messages
To generate readable error messages, our checks do not have a Boolean return
type, but a monadic one (isomorphic to ’e option). Here, None represents an
accepted check whereas Some e represents a rejected check with error message e.
The theory ErrorMonad contains several basic operations like >> for conjunction
of checks, <- for changing the error message, and isOK for testing acceptance.
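Rendered in Haskell, the error monad looks roughly as follows (since >> and <- are not legal Haskell names, this sketch uses andThen and withErr instead; isNothing is from Data.Maybe):

import Data.Maybe (isNothing)

type Check e = Maybe e      -- Nothing = accepted, Just e = rejected with message e

andThen :: Check e -> Check e -> Check e   -- conjunction of two checks
andThen Nothing  n = n
andThen (Just e) _ = Just e

withErr :: Check e -> (e -> e) -> Check e  -- change the error message
withErr m f = fmap f m

isOK :: Check e -> Bool
isOK = isNothing

In this reading, the simplifier lemmas below amount to isOK (m `andThen` n) = isOK m && isOK n and isOK (m `withErr` f) = isOK m.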
Using the error monad enables an easy integration of readable error messages.
For example, the real implementation of checkTRSProof looks as follows:
fun checkTRSProof where "checkTRSProof R (DPTrans P prf) = (
checkDPs R P
<- (λs. ’’error . . .’’ @ showTRS R @ ’’. . .’’ @ showTRS P @ s)
>> checkDPProof P (computeID R) prf
<- (λs. ’’error below switch to dependency pairs’’ @ s))"
However, since we do not want to adapt the proofs every time the error mes-
sages are changed, we set up the Isabelle simplifier such that it hides the details of
the error monad, but directly removes all the error handling and turns monadic
checks via isOK(...) into Boolean ones using the following lemmas.
lemma "isOK(m >> n) = isOK(m) ∧ isOK(n)"
lemma "isOK(m <- s) = isOK(m)"
Then for example isOK(checkTRSProof R (DPTrans P prf)) directly simpli-
fies to isOK(checkDPs R P) ∧ isOK(checkDPProof P (computeID R) prf).

8 Experiments and Conclusion

Isabelle’s code-generator is invoked to create CeTA from IsaFoR. To compile CeTA


one auxiliary hand-written Haskell file CeTA.hs is needed, which just reads two
files (one for the TRS, one for the proof) and then invokes certifyProof.
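A plausible shape of this wrapper is sketched below (hypothetical: the actual module and function names of the generated code may differ, and the result is rendered in the error-monad style of Sect. 7):

module Main where

import System.Environment (getArgs)
import Generated (certifyProof)   -- assumed name of the code-generated module

main :: IO ()
main = do
  [trsFile, proofFile] <- getArgs
  trs <- readFile trsFile
  prf <- readFile proofFile
  case certifyProof trs prf of    -- Nothing = accepted
    Nothing  -> putStrLn "CERTIFIED"
    Just err -> putStrLn err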
We tested CeTA (version 1.03) using TTT2 as termination prover (TC and TC+ ).
Here, TTT2 uses only the techniques of this paper in the combination TC, whereas
in TC+ all supported techniques are tried, including usable rules and nontermi-
nation. We compare to CiME/Coccinelle using AProVE [10] or CiME [5] as provers
(ACC,CCC), and to Rainbow/CoLoR using AProVE or Matchbox [18] (ARC,MRC)
where we take the results of the latest certified termination competition in Nov
2008,7 involving 1391 TRSs from the termination problem database.
7
http://termcomp.uibk.ac.at/

We performed our experiments using a PC with a 2.0 GHz processor running


Linux where both TTT2 and CeTA were aborted after 60 seconds. The following
table summarizes our experiments and the termination competition results.

                     TC        TC+        ACC      CCC      ARC      MRC
proved / disproved   401 / 0   572 / 214  532 / 0  531 / 0  580 / 0  458 / 0
certified            391 / 0   572 / 214  437 / 0  485 / 0  558 / 0  456 / 0
rejected              10         0          3        0        0        2
cert. timeouts         0         0         92       46       22        0
total cert. time      33s      113s      6212s    6139s    7004s    3602s

The 10 proofs that CeTA rejected are all for nonterminating TRSs which do
not satisfy the variable condition. Since TC supports only polynomial orders as
reduction pairs, it can handle less TRSs than the other combinations. But, there
are 44 TRSs which are only solved by TC (and TC+ ), the reason being the time-
limit of 60 seconds (19 TRSs), the dependency graph estimation (8 TRSs), and
the polynomial order allowing negative constants (17 TRSs).
The second line clearly shows that TC+ (with nontermination and usable rules
support) currently is the most powerful combination with 786 certified proofs.
Moreover, TC+ can handle 214 nonterminating and 102 terminating TRSs where
none of ACC, CCC, ARC, and MRC were successful. The efficiency of CeTA is also
clearly visible: the average certification time in TC and TC+ for a single proof is
faster by a factor of 50 than in the other combinations.8
For more details on the experiments we refer to CeTA’s website.
To conclude, we presented a modular and competitive termination certifier,
CeTA, which is directly created from our Isabelle library on term rewriting, IsaFoR.
Its main features are its availability as a stand-alone binary, its efficiency, the
dependency graph estimation, the support for nontermination and usable rules,
the readable error messages, and its robustness.
As each sub-check for a termination technique can be called separately, and as
our check to certify a whole termination proof just invokes these sub-checks, it
seems possible to integrate other techniques (even if they are proved in a different
theorem prover) as long as they are available as executable code. However, we
will need a common proof format and a compatible definition.
As future work we plan to certify several other termination techniques where
we already made progress in the formalization of semantic labeling and the
subterm-criterion. We would further like to contribute to a common proof format.

8
Note that in the experiments above, for each TRS, each combination might have
certified a different proof. In an experiment where the certifiers were run on the
same proofs for each TRS (using only techniques that are supported by all certifiers,
i.e., EDG and linear polynomials without negative constants), CeTA was even 190
times faster than the other approaches and could certify all 358 proofs, whereas
each of the other two approaches failed on more than 30 proofs due to timeouts.

References
1. Arts, T., Giesl, J.: Termination of term rewriting using dependency pairs. Theo-
retical Computer Science 236, 133–178 (2000)
2. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press,
Cambridge (1998)
3. Blanqui, F., Delobel, W., Coupet-Grimal, S., Hinderer, S., Koprowski, A.: CoLoR,
a Coq library on rewriting and termination. In: Proc. WST 2006, pp. 69–73 (2006)
4. Contejean, E., Courtieu, P., Forest, J., Pons, O., Urbain, X.: Certification of au-
tomated termination proofs. In: Konev, B., Wolter, F. (eds.) FroCos 2007. LNCS,
vol. 4720, pp. 148–162. Springer, Heidelberg (2007)
5. Contejean, E., Marché, C., Monate, B., Urbain, X.: CiME, http://cime.lri.fr
6. Courtieu, P., Forest, J., Urbain, X.: Certifying a termination criterion based on
graphs, without graphs. In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs
2008. LNCS, vol. 5170, pp. 183–198. Springer, Heidelberg (2008)
7. Dershowitz, N.: Termination dependencies. In: Proc. WST 2003, pp. 27–30 (2003)
8. Giesl, J., Thiemann, R., Schneider-Kamp, P.: The dependency pair framework:
Combining techniques for automated termination proofs. In: Baader, F., Voronkov,
A. (eds.) LPAR 2004. LNCS (LNAI), vol. 3452, pp. 301–331. Springer, Heidelberg
(2005)
9. Giesl, J., Thiemann, R., Schneider-Kamp, P.: Proving and disproving termina-
tion of higher-order functions. In: Gramlich, B. (ed.) FroCos 2005. LNCS (LNAI),
vol. 3717, pp. 216–231. Springer, Heidelberg (2005)
10. Giesl, J., Schneider-Kamp, P., Thiemann, R.: AProVE 1.2: Automatic termination
proofs in the DP framework. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006.
LNCS (LNAI), vol. 4130, pp. 281–286. Springer, Heidelberg (2006)
11. Haftmann, F.: Code generation from Isabelle/HOL theories (April 2009),
http://isabelle.in.tum.de/doc/codegen.pdf
12. Hirokawa, N., Middeldorp, A.: Automating the dependency pair method. Informa-
tion and Computation 199(1-2), 172–199 (2005)
13. Hirokawa, N., Middeldorp, A.: Tyrolean Termination Tool: Techniques and features.
Information and Computation 205(4), 474–511 (2007)
14. Korp, M., Sternagel, C., Zankl, H., Middeldorp, A.: Tyrolean Termination Tool 2.
In: Proc. RTA 2009. LNCS, vol. 5595, pp. 295–304 (2009)
15. Martelli, A., Montanari, U.: An efficient unification algorithm. ACM Transactions
on Programming Languages and Systems 4(2), 258–282 (1982)
16. Nipkow, T., Paulson, L.C., Wenzel, M.T.: Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002)
17. Peyton Jones, S., et al.: The Haskell 98 language and libraries: The revised report.
Journal of Functional Programming 13(1) (2003)
18. Waldmann, J.: Matchbox: A tool for match-bounded string rewriting. In: van Oost-
rom, V. (ed.) RTA 2004. LNCS, vol. 3091, pp. 85–94. Springer, Heidelberg (2004)
A Formalisation of Smallfoot in HOL

Thomas Tuerk

University of Cambridge Computer Laboratory


William Gates Building, JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom
http://www.cl.cam.ac.uk

Abstract. In this paper a general framework for separation logic inside


the HOL theorem prover is presented. This framework is based on Ab-
stract Separation Logic. It contains a model of an abstract, imperative
programming language as well as an abstract specification logic for this
language. While the formalisation mainly follows the original definition
of Abstract Separation Logic, it contains some additional features. Most
noticeable is the added support for procedures.
As a case study, the framework is instantiated to build a tool that is
able to parse Smallfoot specifications and verify most of them completely
automatically. In contrast to Smallfoot this instantiation can handle the
content of data-structures as well as their shape. This enables it to verify
fully functional specifications. Some noteworthy examples that have been
verified are parallel mergesort and an iterative filter-function for single
linked lists.

1 Motivation
Separation logic is an extension of Hoare logic that allows local reasoning [7, 9].
It is used to reason about mutable data structures in combination with low
level imperative programming languages that use pointers and explicit memory
management. Thanks to local reasoning, it scales better than classical Hoare
logic to the verification of large programs and can easily be used to reason
about parallelism. There are several implementations: Smallfoot [2], SLAyer1
and SpaceInvader [5] are probably some of the best know examples. Moreover,
there are formalisations inside theorem provers [1, 6, 10, 11].
The problem, as I see it, is that all these tools and formalisations focus on one
concrete setting. They fix the programming languages, their exact semantics, the
supported specifications etc. However, there are a lot of different possible design
choices and the tools differ in these. I’m therefore building a general framework
for separation logic in HOL that can be instantiated to a variety of different
separation logics. By building such a framework, I hope to be able to concentrate
on the essence of separation logic as well as keeping the formalisation clean and
easy.
In this paper, the results of these efforts to build a separation logic framework
in HOL are presented. The framework is based on Abstract Separation Logic [4],
1
http://research.microsoft.com/SLAyer/

S. Berghofer et al. (Eds.): TPHOLs 2009, LNCS 5674, pp. 469–484, 2009.
© Springer-Verlag Berlin Heidelberg 2009

an abstract, high level variant of separation logic. It consists of both an abstract,


imperative programming language and an abstract specification logic for this
language. Both the abstract language and the specification logic are designed to
be instantiated to a concrete programming language and a concrete language for
specifications.
As a case study, I instantiated this framework to build a tool similar to Small-
foot [2], one of the oldest and best documented separation logic tools. Smallfoot
is able to automatically prove specifications about programs written in a sim-
ple, low-level imperative language with support for parallelism. The tool, called
Holfoot, combines ideas from Abstract Separation Logic, Variables as Resource
in Hoare Logic [8] and Smallfoot. It is able to parse Smallfoot-specifications
and prove nearly all of them completely automatically inside the HOL theorem
prover. In addition to Smallfoot, specifications can talk about the content of
data-structures as well as their shape. Proving the resulting fully functional spec-
ifications exploits the fact that Holfoot is implemented inside HOL. All existing
libraries and proof tools can be used, while a substantial amount of automation
is still available to reason about the structure of the program.
Reasoning about the data-content as well as the shape of data-structures is
one of the main challenges of separation logic tools at the moment. To help the
communication within the community and in general to further the progress of
the field, a benchmark collection called A Heap of Problems 2 was created. It
collects interesting examples, usually with at least a natural-language descrip-
tion, a C-implementation and some pseudo-code. Often implementations for a
specific separation logic tool are available as well. Moreover, there are proofs of
the examples using different tools and techniques.
Here, I would like to highlight just two of these benchmark examples: merge-
sort, whose verification needs some knowledge about orderings and permutations,
and filtering of a single linked list, whose iterative version uses a very complicated
loop invariant. Both examples can easily be verified using Holfoot. The tool is
able to reason automatically about the shape part of the problem, leaving the
user to reason about properties of the data-content, i. e. about the essence of
these algorithms. Fully functional specifications of simpler algorithms like re-
versing or copying of a single-linked-list, determining its length or a recursive
filter function can even be verified completely automatically. For more exam-
ples and discussions about them, please have a look at the A Heap of Problems
webpage.
It took considerable effort to build this framework and instantiate it. This
work cannot be presented here in detail due to space limitations. Therefore,
the next section, will present a high level view on Holfoot. It is intended to
give a glimpse of the features and power of this tool. Semantic foundations and
implementation details are not discussed. This high level presentation of Holfoot
is followed by a detailed description of the formalisation of Abstract Separation
Logic in HOL. This description explains the semantic background of Holfoot.
However, it is only sketched briefly how the Abstract Separation Logic framework is
2
http://wiki.heap-of-problems.org.

instantiated to build Holfoot. The paper ends with a section about future work
and some conclusions.

2 Formalisation of Smallfoot

Smallfoot [2] is one of the oldest and best documented separation logic tools.
It is able to automatically prove specifications about programs written in a
simple, low-level imperative language, which is designed to resemble C. This
language contains pointers, local and global variables, dynamic memory alloca-
tion/deallocation, conditional execution, while-loops and recursive procedures
with call-by-value and call-by-reference arguments. Moreover, there is support
for parallelism with conditional critical regions that synchronise the access to so-
called resources. Smallfoot-specifications are concerned with the shape of mem-
ory. Common specifications, for example, say that some stack-variable points to
a single linked list in memory. However, nothing is e. g. said about the length of
the list or about its data-content.
Smallfoot comes with a selection of example specifications. There are com-
mon algorithms about single linked lists like copying, reversing or deallocating
them. Another set of examples contains similar algorithms for trees. There is an
implementation of mergesort, some code about queues, circular-lists, buffers and
similar examples. Holfoot3 is able to parse Smallfoot-specifications and prove
most of the mentioned examples completely automatically inside the HOL the-
orem prover.
While some features like local variables or procedures with call-by-value ar-
guments took some effort, and while it turned out to be useful to use explicit
permissions for stack-variables, it was nevertheless possible to formalise Small-
foot based on Abstract Separation Logic in a natural way. As far as I know,
this is the first time Abstract Separation Logic has been used to implement a
separation logic tool. The formalisation of Smallfoot illustrates that Abstract
Separation Logic is powerful and flexible enough to model languages and spec-
ifications used by well-known separation logic tools. Moreover, it demonstrates
that it is possible to automate reasoning in this framework. While Holfoot is
slower than Smallfoot, it provides the additional assurance of a formal proof
inside HOL. That this is really valuable is underlined by the fact that an error
in Smallfoot was detected while building Holfoot. Due to a bug in its implemen-
tation, Smallfoot handles call-by-value parameters like call-by-reference ones.
However, besides a formal foundation and much higher trust in the tool, an-
other advantage of Holfoot is that it is straightforward to use all the libraries
and proof-tools HOL provides. Smallfoot specifications talk about the shape
of data-structures. The Smallfoot-specification of mergesort for example states
that mergesort returns a single linked list. It does not guarantee anything about
the content of this list, much less that mergesort really sorts lists. In fact, to
prove a fully functional specification of mergesort, substantial knowledge about
3
Holfoot as well as a collection of examples can be found in the HOL-repository.

permutations of lists, orderings and sorted lists is needed. Here, the existing
infrastructure of HOL is very useful.
Once the formalisation of the features provided by Smallfoot was completed, it
was straightforward to extend it with support for the content of data-structures.
This allows the verification of fully functional specifications. Holfoot is able to
automatically verify fully functional specifications of simple algorithms like list-
reversal, list-copy or list-length:
list_copy(z;c) [data_list(c,data)] {
  local x,y,w,d;
  if (c == NULL) {z=NULL;}
  else {
    z=new(); z->tl=NULL;
    x = c->dta; z->dta = x;
    w=z;
    y=c->tl;
    while (y != NULL) [data_lseg(c,‘‘_data1++[_cdate]‘‘,y) *
                       data_list(y,_data2) *
                       data_lseg(z,_data1,w) *
                       w |-> tl:0,dta:_cdate *
                       ‘‘data:num list = _data1 ++ _cdate::_data2‘‘] {
      d=new(); d->tl=NULL;
      x=y->dta; d->dta=x;
      w->tl=d; w=d;
      y=y->tl;
    }
  }
} [data_list(c,data) * data_list(z,data)]

list_reverse(i;) [data_list(i,data)] {
  local p, x;
  p = NULL;
  while (i != NULL) [data_list(i,_idata) *
                     data_list(p,_pdata) *
                     ‘‘(data:num list) = (REVERSE _pdata) ++ _idata‘‘] {
    x = i->tl; i->tl = p; p = i; i = x;
  }
  i = p;
} [data_list(i,‘‘REVERSE data‘‘)]

list_length(r;c) [data_list(c,cdata)] {
  local t;
  if (c == NULL) {r = 0;} else {
    t = c->tl;
    list_length(r;t);
    r = r + 1;
  }
} [data_list(c,cdata) * r == ‘‘LENGTH (cdata:num list)‘‘]

The syntax of this pseudo-code used by Smallfoot and Holfoot is intended
to be close to C. However, there are some uncommon features: the arguments
of a procedure before the semicolon are call-by-reference arguments, the others
call-by-value ones. So the argument z of list_copy is a call-by-reference
argument, whereas c is a call-by-value argument. The pre- and postconditions of
procedures are denoted in brackets around the procedure’s body. Similarly, loops
are annotated with their invariant. In specifications, a variable name that starts
with an underscore denotes an existentially quantified variable. For example,
_data1, _data2 and _cdate are existentially quantified in the loop-invariant of
list_copy. This invariant requires that data can somehow be split into these three.
How it is split changes from iteration to iteration. Finally, everything within
quotation marks is regarded as a HOL term. So, REVERSE or LENGTH are not
part of the Smallfoot formalisation but functions from HOL’s list library.
While these simple algorithms can be handled completely automatically, more
complicated ones like the aforesaid mergesort need user interaction. However,
even in these interactive proofs, there is a clear distinction between reasoning
about the content and about the shape. While the shape can mostly be handled
automatically, the user is left to reason about properties of the content. Let’s
consider the following specification of parallel mergesort:
merge(r;p,q) [data_list(p,pdata) *
              data_list(q,qdata) *
              ‘‘(SORTED $<= pdata) /\
                (SORTED $<= qdata)‘‘] {
  local t, q_date, p_date;
  if (q == NULL) r = p;
  else if (p == NULL) r = q;
  else {
    p_date = p->dta;
    q_date = q->dta;
    if (q_date < p_date) {
      t = q; q = q->tl;
    } else {
      t = p; p = p->tl;
    }
    merge(r;p,q);
    t->tl = r; r = t;
  }
} [data_list(r,_rdata) *
   ‘‘(SORTED $<= _rdata) /\
     (PERM (pdata ++ qdata) _rdata)‘‘]

split(r;p) [data_list(p,data)] {
  local t1,t2;
  if (p == NULL) r = NULL;
  else {
    t1 = p->tl;
    if (t1 == NULL) r = NULL;
    else {
      t2 = t1->tl;
      split(r;t2);
      p->tl = t2;
      t1->tl = r;
      r = t1;
    }
  }
} [data_list(p,_pdata) *
   data_list(r,_rdata) *
   ‘‘PERM (_pdata ++ _rdata) data‘‘]

mergesort(r;p) [data_list(p,data)] {
  local q,q1,p1;
  if (p == NULL) r = p;
  else {
    split(q;p);
    mergesort(q1;q) || mergesort(p1;p);
    merge(r;p1,q1);
  }
} [data_list(r,_rdata) *
   ‘‘(SORTED $<= _rdata) /\
     (PERM data _rdata)‘‘]

Holfoot can automatically reduce this fully functional specification of
mergesort to a small set of simple verification conditions. These verification
conditions are just concerned with permutations and sorted lists. The whole
structure of the program and the shape of the data-structures can be handled
automatically. Some of the remaining verification conditions are very simple
as for example SORTED $<= x::xs ==> SORTED xs. Others require some knowl-
edge about permutations like PERM (x::(xs ++ ys)) l ==> PERM (x::(xs ++
y::ys)) (y::l). However, most of them can easily be handled by automated
proof tools for permutations and orderings. The only remaining verification con-
ditions are of the form
SORTED $<= x::xs /\ SORTED $<= y::ys /\ SORTED $<= l /\
y < x /\ PERM l (x::xs++ys) ==> SORTED $<= y::l

Their proof needs a combination of properties of permutations and sorted lists.
Thus, the standard proof tools fail and a tiny manual proof is required. The
following proof-script is sufficient to prove the given specification of mergesort:
val thm = smallfoot_verbose_prove(mergesort-specification-filename,
SMALLFOOT_VC_TAC THEN
ASM_SIMP_TAC (arith_ss++PERM_ss) [SORTED_EQ, SORTED_DEF, transitive_def] THEN
REPEAT STRIP_TAC THEN (
IMP_RES_TAC PERM_MEM_EQ THEN
FULL_SIMP_TAC list_ss [] THEN
RES_TAC THEN ASM_SIMP_TAC arith_ss []
));

After parsing and preprocessing the specification stored in the given file, ver-
ification conditions are generated using SMALLFOOT_VC_TAC. This single call is
sufficient to eliminate the whole program structure and leave just the described
verification conditions. The next line calls some proof-tools for permutations and
sorted lists and is able to discharge most of the verification conditions. The rest
of the proof-script handles the remaining verification conditions which are all of
the aforesaid form.
As this example illustrates, human interaction is often only needed to reason
about the essence of an algorithm, and HOL provides powerful tools to aid this
reasoning. This shows the power of Holfoot and with it the flexibility and power
of the whole framework.

3 Formalisation of Abstract Separation Logic

In the previous section, a high-level view of Holfoot, and with it of the framework
and its capabilities, was presented. In this section its semantic foundations –
Abstract Separation Logic [4] – are explained. This explanation closely follows
the HOL formalisation.4
Abstract Separation Logic abstracts from both the concrete states and the
concrete programming language. Instead of using a concrete model of memory
consisting usually of a stack and a heap, Abstract Separation Logic uses an
abstract set of states Σ. A partial function ◦, called separation combinator, is
used to combine states.

Definition 1 (Separation Combinator). A separation combinator ◦ is a
partially defined function that satisfies the following properties:

– ◦ is partially associative
– ◦ is partially commutative
– ◦ is cancellative, i. e.
  ∀s1, s2, s3. Defined(s1 ◦ s2) ∧ (s1 ◦ s2 = s1 ◦ s3) =⇒ (s2 = s3) holds
– for all states s there exists a neutral element us with us ◦ s = s

Definition 2 (Separateness, Substates). This definition of separation com-
binators induces notions of separateness (#) and substates (⊑):

s1 # s2 iff s1 ◦ s2 is defined        s1 ⊑ s3 iff ∃s2. s3 = s1 ◦ s2

Definition 3 (∗, emp). Predicates are as usual elements of the powerset of
states P(Σ). This allows us to define the spatial conjunction operator ∗ of sepa-
ration logic and its neutral element emp as follows:

P ∗ Q := {s | ∃p, q. (p ◦ q = s) ∧ p ∈ P ∧ q ∈ Q}
emp := {u | ∃s. u ◦ s = s}

∗ forms together with emp a commutative monoid. Other standard separation
logic constructs can be defined in a natural way as well. There is a shallow em-
bedding of the most common constructs available in the framework. Additional
constructs can be added easily.
In order to instantiate the framework, one has to provide a concrete set of
states Σ and a concrete separation combinator ◦.
4
The sources can be found in the HOL repository at Sourceforge in the subdirectory
examples/separationLogic.

Example 4. Heaps, modelled as finite partial functions, are commonly used with
separation logic. In this model, Σ is the set of all heaps and ◦ is given by

h1 ◦ h2 = h1 ⊎ h2   if dom(h1) ∩ dom(h2) = ∅, and is undefined otherwise.

In this setting, two heaps are disjoint (h1 # h2) iff their domains are disjoint.
The combination of two separate heaps (h1 ◦ h2) is their disjoint union. The
empty heap is the neutral element for all heaps.
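To make this instance concrete, the following OCaml fragment sketches heaps as finite maps from locations to values. It is an illustration only, not part of the formalisation; the names heap, separate and combine are ours.

module IntMap = Map.Make (Int)

(* heaps as finite partial functions from locations to values *)
type heap = int IntMap.t

(* h1 # h2: the domains of the two heaps are disjoint *)
let separate (h1 : heap) (h2 : heap) : bool =
  IntMap.for_all (fun loc _ -> not (IntMap.mem loc h2)) h1

(* h1 ◦ h2: disjoint union where defined, None where the combinator
   is undefined *)
let combine (h1 : heap) (h2 : heap) : heap option =
  if separate h1 h2
  then Some (IntMap.union (fun _ v _ -> Some v) h1 h2)  (* merge never fires *)
  else None

The empty map plays the role of the neutral element us for every heap, and combine is partially associative, partially commutative and cancellative, as Definition 1 requires.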

3.1 Actions

The programming language used by Abstract Separation Logic is abstract as
well. Its elementary constructs are actions.

Definition 5 (Action). An action act : Σ → P(Σ) ∪ {⊤} is a function from a state
to a set of states or the special failure state ⊤.

If executing an action act in a state s results in ⊤, then an error may occur
during the execution of the action. Otherwise, if act(s) results in a set of states
S, no error can occur and executing the action will nondeterministically lead to
one of the states in S. The empty set can be used to model actions that do not
terminate. Actions can be combined to form new actions. The most common
combination is consecutive execution:


(act1 ; act2)(s) = ⊤                           if act1(s) = ⊤
                 = ⊤                           if ∃s′. s′ ∈ act1(s) ∧ act2(s′) = ⊤
                 = ⋃ s′ ∈ act1(s). act2(s′)    otherwise

Another common combination is nondeterministic choice:

(+ act ∈ act-set. act)(s) = ⊤                          if ∃ act ∈ act-set. act(s) = ⊤
                          = ⋃ act ∈ act-set. act(s)    otherwise

act1 + act2 = + act ∈ {act1, act2}. act
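As a small executable model of these definitions, the following OCaml sketch restricts result sets to finite lists; it is our illustration, not part of the HOL sources. None plays the role of the failure state ⊤, choice is restricted to two actions, and the simple actions skip, diverge and fail reappear below in the discussion of local actions.

(* a state is mapped to a finite set (list) of successor states; None = ⊤ *)
type 'a action = 'a -> 'a list option

let skip : 'a action = fun s -> Some [ s ]   (* skip(s) = {s} *)
let diverge : 'a action = fun _ -> Some []   (* diverge(s) = the empty set *)
let fail : 'a action = fun _ -> None         (* fail(s) = the failure state *)

(* consecutive execution (act1 ; act2): fails if act1 fails or if act2
   fails on some intermediate state; otherwise unions the results *)
let seq (act1 : 'a action) (act2 : 'a action) : 'a action =
 fun s ->
  match act1 s with
  | None -> None
  | Some states ->
      let results = List.map act2 states in
      if List.exists Option.is_none results then None
      else Some (List.concat_map Option.get results)

(* binary nondeterministic choice act1 + act2 *)
let choice (act1 : 'a action) (act2 : 'a action) : 'a action =
 fun s ->
  match (act1 s, act2 s) with
  | None, _ | _, None -> None
  | Some r1, Some r2 -> Some (r1 @ r2)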

Definition 6 (Semantic Hoare Triples). For predicates P, Q and an action
act, a semantic Hoare triple ⟨P⟩ act ⟨Q⟩ holds, iff for all states p
that satisfy the precondition P the action does not fail, i. e. ∀p ∈ P. act(p) ≠ ⊤,
and leads to a state that satisfies the postcondition Q, i. e. ∀p ∈ P. act(p) ⊆ Q.
Notice that this describes partial correctness, since a Hoare triple is trivially
satisfied if act does not terminate, i. e. if act(s) = ∅ holds.
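In the executable sketch above, such a triple can be checked directly when P is given by a finite list of states and Q by its characteristic function. Again this is our illustration with hypothetical names, not the HOL definition:

(* ⟨P⟩ act ⟨Q⟩: act must not fail on any p ∈ P and may only
   reach Q-states (partial correctness) *)
let hoare_triple (pre : 'a list) (act : 'a action) (post : 'a -> bool) : bool =
  List.for_all
    (fun p ->
       match act p with
       | None -> false
       | Some qs -> List.for_all post qs)
    pre

For example, hoare_triple pre diverge post always returns true, reflecting that a non-terminating action satisfies every triple.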
Local reasoning is an essential feature of separation logic. It allows a specification
to be extended with an arbitrary context:

⟨P⟩ act ⟨Q⟩
──────────────────────
⟨P ∗ R⟩ act ⟨Q ∗ R⟩

In order to provide local reasoning, only those actions are considered whose
specifications can be safely extended using this inference rule. These actions are
called local.

Definition 7 (Local Actions). An action act is called local, iff for all states
s, s1, s2 with s = s1 ◦ s2 and act(s1) ≠ ⊤, the evaluation of the action on the
extended state does not fail (act(s) ≠ ⊤) and act(s) ⊆ act(s1) ∗ {s2} holds.

The skip action defined by skip(s) := {s} is a simple example of a local action.
Other examples are diverge(s) := ∅ and fail(s) := ⊤. Sequential composition and
nondeterministic choice preserve locality. The set of local actions forms together
with the following order a complete lattice.

Definition 8 (Order of Actions). act1 ⊑ act2 iff act2 allows more behaviour
than act1, i. e. iff ∀s. (act2(s) = ⊤) ∨ (act1(s) ⊆ act2(s)) holds. Notice that this
is equivalent to ∀P, Q. ⟨P⟩ act2 ⟨Q⟩ =⇒ ⟨P⟩ act1 ⟨Q⟩.

This lattice of local actions is used to define a best local action as an infimum of
local actions in this lattice. The HOL formalisation contains the corresponding
definitions and theorems. However, here the discussion of this lattice is skipped.
Instead an equivalent, high level characterisation is used.

Definition 9 (Best Local Action). Given a precondition P and a postcondi-
tion Q the best local action bla[P, Q] is the most general local action that satisfies
⟨P⟩ bla[P, Q] ⟨Q⟩. This means:

– bla[P, Q] is a local action
– ⟨P⟩ bla[P, Q] ⟨Q⟩ holds
– bla[P, Q] is more general than any local action act with ⟨P⟩ act ⟨Q⟩,
  i. e. act ⊑ bla[P, Q]

Common uses of the best local action bla are the materialisation and anni-
hilation actions. materialise(P) := bla[emp, P] can be used to materialise some
new part of the state that satisfies the predicate P. Similarly, annihilate(P) :=
bla[P, emp] is used to annihilate some part of the state that satisfies P. Notice
that for certain P the annihilation annihilate(P) behaves unexpectedly. If
there is more than one substate that satisfies P, then annihilate(P) diverges.
Therefore, usually just precise predicates are used with annihilation:

Definition 10 (Precise Predicates). A predicate P is called precise iff for
every state there is at most one substate that satisfies P.

As shown by the examples of materialisation and annihilation, bla is useful to
define local actions. Often it is however necessary to relate the pre- and post-
condition. For example, the postcondition of an action that increments the value
of a variable needs to refer to the old value of this variable. This leads to the
following extension of best local actions:

Definition 11 (Quantified Best Local Action). Given two functions P(·)
and Q(·) that map some argument type to predicates, the quantified best local
action (qbla) is the most general local action that satisfies

∀arg. ⟨Parg⟩ qbla[P(·), Q(·)] ⟨Qarg⟩

Another useful local action is assume. Given a predicate, assume skips if the
predicate holds and diverges if it does not hold. In the next section, assume is
used in combination with nondeterministic choice and Kleene star to model condi-
tional execution and loops. In order to obtain a local action, the predicate has to be
intuitionistic, though.
Definition 12 (Intuitionistic Predicate). A predicate P is called intuition-
istic, iff P ∗ true = P holds. This means that if P holds for a state s, then it
holds for all superstates s′ ⊒ s as well. The intuitionistic negation ¬i P holds in
a state s, if P holds for no superstate s′ ⊒ s. P is called decided in a
set of states S, iff ∀s ∈ S. s ∈ P ∨ s ∈ ¬i P holds.
For an intuitionistic predicate P the local action assume(P) can be defined as

assume(P)(s) = {s}   if s ∈ P
             = ∅     if s ∈ ¬i P
             = ⊤     otherwise

3.2 Programs

This notion of local actions is extended to an abstraction of an imperative pro-
gramming language. The basic constructs of this language are local actions.
Besides local actions, the language contains the usual control structures like
conditional execution and while-loops. Additionally, nondeterminism, concur-
rency and semaphores are supported. The definition of the semantics of this
language follows ideas from Brookes [3] about Concurrent Separation Logic. Pro-
grams are translated to a set of traces that capture all possible interleavings
during concurrent execution. The semantics of a program is given by nondeter-
ministic choice between the semantics of its traces. As an additional layer of
abstraction, proto-traces are used between programs and traces.

Definition 13 (Proto-Trace). The set of proto-traces PTr is inductively de-
fined to be the smallest set with

– act ∈ PTr for all local actions act
– pt1 ; pt2 ∈ PTr (sequential composition) for pt1, pt2 ∈ PTr
– pt1 || pt2 ∈ PTr (parallel composition) for pt1, pt2 ∈ PTr
– proccall(name, arg) ∈ PTr (procedure call) for all procedure-names name
  and all arguments arg
– l.pt ∈ PTr (lock declaration) for a lock l and pt ∈ PTr
– with l do pt ∈ PTr (critical region) for a lock l and pt ∈ PTr

Definition 14 (Program). A program is a set of proto-traces. The set of all
programs is denoted by Prog.

Definition 15 (Atomic Action). An atomic action is either a local action,
a check check(act1, act2) for local actions act1, act2 or a lock operation P(l) or
V(l) for a lock l.

Definition 16 (Trace). A trace is a list of atomic actions. Let ε denote the
empty trace. The concatenation of two traces t1, t2 is denoted as t1 · t2.
To define the traces of a program, an environment is needed that fixes the se-
mantics of procedure calls.

Definition 17 (Procedure Environment). A procedure environment is a
finite partial map penv : procedure-names ⇀ arguments → Prog from procedure-names
to a function from procedure arguments to programs.
Definition 18 (Traces of Proto-traces). Given a procedure environment
penv, the traces of a proto-trace pt after unfolding procedures n times with respect
to penv (denoted as Tⁿpenv(pt)) are given by:

Tⁿpenv(act) = {act}
Tⁿpenv(pt1 ; pt2) = {t1 · t2 | t1 ∈ Tⁿpenv(pt1) ∧ t2 ∈ Tⁿpenv(pt2)}
Tⁿpenv(pt1 || pt2) = ⋃ {t1 zip t2 | t1 ∈ Tⁿpenv(pt1) ∧ t2 ∈ Tⁿpenv(pt2)}
Tⁿpenv(proccall(name, arg)) = {fail}                                 if name ∉ dom(penv)
                            = ∅                                      if name ∈ dom(penv) ∧ n = 0
                            = ⋃ pt ∈ penv(name, arg). Tⁿ⁻¹penv(pt)   otherwise
Tⁿpenv(l.pt) = {remove-locks(l, t) | t ∈ Tⁿpenv(pt) ∧ t is l-synchronised}
Tⁿpenv(with l do pt) = {P(l) · t · V(l) | t ∈ Tⁿpenv(pt)}

In this definition, remove-locks(l, t) removes all atomic actions concerned with
the lock l, i. e. P(l) and V(l), from the trace t. A trace is l-synchronised, iff the
lock-actions P(l) and V(l) are properly aligned. Finally, the auxiliary function
zip builds all interleavings of two traces. It is given by

add-check(a1, a2, t) = check(a1, a2) · t   if a1 and a2 are local actions
                     = t                   otherwise

ε zip t = t zip ε = {t}
(a1 · t1) zip (a2 · t2) = {add-check(a1, a2, t) | t ∈ {a1 · u | u ∈ t1 zip (a2 · t2)} ∪
                                                      {a2 · u | u ∈ (a1 · t1) zip t2}}
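Ignoring the inserted race checks, the interleaving core of zip is easy to render executably. The following sketch (ours, with a hypothetical name) enumerates all interleavings of two finite traces:

(* all interleavings of two traces, without add-check instrumentation *)
let rec interleave (t1 : 'a list) (t2 : 'a list) : 'a list list =
  match (t1, t2) with
  | [], t | t, [] -> [ t ]
  | a1 :: r1, a2 :: r2 ->
      List.map (fun u -> a1 :: u) (interleave r1 t2)
      @ List.map (fun u -> a2 :: u) (interleave t1 r2)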

Finally, the traces of a proto-trace pt and a program p with respect to penv are
defined as

Tpenv(pt) = ⋃ n ∈ ℕ. Tⁿpenv(pt)        Tpenv(p) = ⋃ pt ∈ p. Tpenv(pt)

It remains to define the semantics of traces. Local actions in traces are just inter-
preted by themselves. Checks are added to enforce race-freedom. The semantics
of lock actions is however more complicated.
One central idea behind Concurrent Separation Logic is to split the state into
parts for each thread and each lock: a lock protects a part of the state. If a
thread holds a lock, it can access this state, otherwise it cannot. Therefore, a
precise predicate called lock invariant is associated with each lock. This invariant
abstracts the part of the state that is protected by the lock. materialise and
annihilate actions are used to make this abstracted state accessible/inaccessible.
Definition 19 (Semantics of Atomic Actions). The semantics of an atomic
action with respect to a lock-environment lenv : locks → P(Σ) is given by

⟦act⟧lenv = act
⟦check(act1, act2)⟧lenv(s) = {s}   if ∃s1, s2. s = s1 ◦ s2 ∧
                                   act1(s1) ≠ ⊤ ∧ act2(s2) ≠ ⊤
                           = ⊤     otherwise
⟦P(l)⟧lenv = materialise(lenv(l))
⟦V(l)⟧lenv = annihilate(lenv(l))

Notice that the semantics of an atomic action is a local action.

Definition 20 (Semantics of Traces, Programs). The semantics of a trace
with respect to a lock-environment is the sequential combination of the semantics
of its atomic actions. The semantics of a program is given by the nondeterministic
choice between the semantics of its traces.

⟦ε⟧lenv = skip      ⟦a · t⟧lenv = ⟦a⟧lenv ; ⟦t⟧lenv      ⟦prog⟧(penv,lenv) = + t ∈ Tpenv(prog). ⟦t⟧lenv

Notice that the semantics of a program is always a local action. This allows
concepts for actions to be easily lifted to programs:

Definition 21 (Hoare triple). A Hoare triple ⊢(penv,lenv) {P} prog {Q} holds,
iff ⟨P⟩ ⟦prog⟧(penv,lenv) ⟨Q⟩ holds. If a Hoare triple holds for all
environments, it is written as ⊢ {P} prog {Q}.

Definition 22 (Program Abstractions). A program p2 is an abstraction of
a program p1 with respect to some environment env (denoted as p1 ⊑env p2), iff
⟦p1⟧env ⊑ ⟦p2⟧env holds.

3.3 Programming Constructs

In the previous section a concept of programs has been introduced. However,
these programs hardly resemble the usual programs written in imperative lan-
guages. Common constructs like loops or conditional execution are missing.
However, these can be easily defined.
480 T. Tuerk

Every proto-trace pt can be regarded as the program {pt}. This immediately
enriches the programming language with procedure calls and local actions. In
particular, one can use skip, fail, assume, diverge, bla and qbla as programs. A lot
of instructions can easily be defined using bla or qbla. Given some suitable defini-
tions for a state containing a stack, one could for example define an instruction
that increments a variable as x++ = qbla[λc. x = c, λc. x = (c + 1)].
The HOL-formalisation uses a shallow embedding of local actions. So, any
function f : Σ → P (Σ) can be used as a program. However, to enforce that
just local actions are used, f is implicitly replaced by fail, if it is not local. The
other constructs for proto-traces can be lifted to programs as well:
p1 ; p2 = {pt1 ; pt2 | pt1 ∈ p1 ∪ {diverge} ∧ pt2 ∈ p2 ∪ {diverge}}
p1 || p2 = {pt1 || pt2 | pt1 ∈ p1 ∧ pt2 ∈ p2 }
l.p = {l.pt | pt ∈ p}
with l do p = {with l do pt | pt ∈ p}
Some other constructs that are not available for proto-traces can be defined
using the fact that programs are just sets of proto-traces. A simple example
is nondeterministic choice: p1 + p2 := p1 ∪ p2 . In combination with assume
and sequential composition of programs, this can be used to define conditional
execution:
if B then p1 else p2 = (assume(B); p1 ) + (assume(¬i B); p2 )
This definition of conditional execution might seem odd. Remember however,
that the framework is just interested in partial correctness. Therefore, it is fine
to nondeterministically choose between paths and then diverge, if the wrong choice
has been made.
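In the executable action sketch from Section 3.1, the same construction looks as follows. Here B is modelled as a total boolean predicate, so the ⊤ branch of assume never fires; this is our simplification of the intuitionistic version.

(* assume(B): skip when B holds, diverge otherwise *)
let assume (b : 'a -> bool) : 'a action =
 fun s -> if b s then Some [ s ] else Some []

(* if B then p1 else p2 = (assume(B); p1) + (assume(¬B); p2) *)
let if_then_else (b : 'a -> bool) (p1 : 'a action) (p2 : 'a action) : 'a action =
  choice (seq (assume b) p1) (seq (assume (fun s -> not (b s))) p2)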
Loops can be defined in a similar manner. However, to define loops, the Kleene
star is needed:

p⁰ = skip        pⁿ⁺¹ = p ; pⁿ        p∗ = ⋃ n ∈ ℕ. pⁿ

while B do p = (assume(B); p)∗ ; assume(¬i B)

This time, one chooses nondeterministically how often one needs to go around the
loop. If the wrong number of iterations is picked, the trace is aborted by one of
the assume statements.
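The infinite union over all pⁿ has no finite executable counterpart, but a bounded approximation conveys the idea. The sketch below (ours) unrolls the star up to a fixed depth n:

(* p^n: n-fold sequential composition of p *)
let rec power (p : 'a action) (n : int) : 'a action =
  if n = 0 then skip else seq p (power p (n - 1))

(* p* cut off at depth n: p^0 + p^1 + ... + p^n *)
let rec star_upto (p : 'a action) (n : int) : 'a action =
  if n = 0 then skip else choice (power p n) (star_upto p (n - 1))

(* while B do p, with at most n loop iterations *)
let while_upto (b : 'a -> bool) (p : 'a action) (n : int) : 'a action =
  seq (star_upto (seq (assume b) p) n) (assume (fun s -> not (b s)))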
Notice the definition of Kleene star. It is represented as a shallow embedding
in HOL. Moreover, it uses nondeterministic choice over an infinite set of proto-
traces. This simple example illustrates how flexible and powerful the combina-
tion of shallow and deep embeddings is. Depending on the needs of a concrete
instantiation this power and flexibility can be used to define more constructs.

3.4 Inference Rules

Using the semantics of Abstract Separation Logic as presented above, one can
deduce high-level inference rules. These inference rules are used to verify specifi-
cations at a high level of abstraction instead of breaking every proof down to
the semantic foundations. Some important inference rules that are valid in
Abstract Separation Logic are:
– (consequence) from P2 ⇒ P1, Q1 ⇒ Q2 and ⊢env {P1} p {Q1} infer ⊢env {P2} p {Q2}
– (abstraction) from p1 ⊑ p2 and ⊢env {P} p2 {Q} infer ⊢env {P} p1 {Q}
– (frame) from ⊢env {P} p {Q} infer ⊢env {P ∗ R} p {Q ∗ R}
– (sequential composition) from ⊢env {P} p1 {Q} and ⊢env {Q} p2 {R} infer
  ⊢env {P} p1 ; p2 {R}
– (assume) if B is decided in P, then ⊢env {P} assume(B) {P ∧ B}
– (qbla) ⊢env {Parg} qbla[P(·), Q(·)] {Qarg} for every arg
– (Kleene star) from ⊢env {P} p {P} infer ⊢env {P} p∗ {P}
– (choice) from ⊢env {P} p1 {Q} and ⊢env {P} p2 {Q} infer ⊢env {P} p1 + p2 {Q}
– (procedure call) if name ∈ dom(penv), from ⊢(penv,lenv) {P} penv(name, arg) {Q}
  infer ⊢(penv,lenv) {P} proccall(name, arg) {Q}
– (conditional) if B is decided in P, from ⊢env {B ∧ P} p1 {Q} and
  ⊢env {¬i B ∧ P} p2 {Q} infer ⊢env {P} if B then p1 else p2 {Q}
– (while) if B is decided in P, from ⊢env {B ∧ P} p {P} infer
  ⊢env {P} while B do p {¬i B ∧ P}
– (parallel) from ⊢env {P1} p1 {Q1} and ⊢env {P2} p2 {Q2} infer
  ⊢env {P1 ∗ P2} p1 || p2 {Q1 ∗ Q2}
– (lock declaration) if lenv(l) = r, from ⊢(penv,lenv) {P} p {Q} infer
  ⊢(penv,lenv) {P ∗ r} l.p {Q ∗ r}
– (critical region) if lenv(l) = r, from ⊢(penv,lenv) {P ∗ r} p {Q ∗ r} infer
  ⊢(penv,lenv) {P} with l do p {Q}

These inference rules are very useful. However, the reader might notice that
there is a problem with recursive functions. The inference rule that handles
procedure-calls replaces the call with the definition of the procedure. This is
fine for non-recursive functions. However, an implicit induction is needed for
recursive functions.
Definition 23 (Procedure Specification). A procedure specification consists
of a lock-environment lenv, a procedure-environment penv and specification func-
tions P(·,·,·), Q(·,·,·). It holds, iff all procedures satisfy their specification in the
given environment:

∀f ∈ dom(penv), arg, x. ⊢(penv,lenv) {P(f,arg,x)} proccall(f, arg) {Q(f,arg,x)}

To prove that a procedure specification holds, it is sufficient to show that, as-
suming that all procedures satisfy their specification, their bodies satisfy the
specification. One does not need to show that possible recursions terminate,
since Abstract Separation Logic just talks about partial correctness.

There is tool-support in the HOL-formalisation to handle procedure specifica-
tions. To prove a procedure specification, it is sufficient to prove the specifications
of all procedure bodies, where a procedure call proccall(name, arg) has been re-
placed by qbla[P(name,arg,·), Q(name,arg,·)]. This means that the resulting Hoare
triples do not contain procedure calls any more. Therefore, the resulting Hoare
triples do not depend on the procedure environment.
Making the Hoare triples independent from the lock-environment as well is
not necessary, but often useful. The lock-operations l.p and with l do p can be
eliminated by introducing annihilate(lenv(l)) and materialise(lenv(l)) at appro-
priate places in p. This moves the knowledge about lock invariants from the
environment to the program itself, making the environment redundant.
Loops can be eliminated in a similar manner. Given a loop-invariant I(·)
such that for all x an intuitionistic predicate B is decided in Ix , a while-loop
while B do p can be abstracted by qbla[I(·) , I(·) ∧ ¬i B], if it can be proved that
the body of the loop really satisfies the invariant.
After these preprocessing steps, one usually just needs to reason about
programs consisting of local actions and conditional-execution, for which the
presented inference rules are very useful.

3.5 Holfoot

The instantiation of the Abstract Separation Logic framework to Holfoot con-
sists of two steps. First, the framework is instantiated to use a stack that maps
variables to permissions and values. This instantiation is based on ideas from
Variables as Resource in Hoare Logic [8]. The concrete type of the variables and
values is not specified. Similarly, the stack is just a part of an abstract state.
Nevertheless, this instantiation is sufficient to reason about pure expressions,
assignments, local variables, etc. In a second step, this setting is instantiated to
Holfoot.
Holfoot represents stack-variables with strings and uses natural numbers as
values. Furthermore the abstract component of the state is instantiated to a
heap from locations (represented by natural numbers without zero) and tags
(represented as strings) to values (natural numbers). Using this concrete repre-
sentation of a state, it is easy to define actions on these states. For example, the
field-lookup action v = e->t is defined as

val holfoot_field_lookup_action_def = Define ‘
(holfoot_field_lookup_action v e t) ((st,h):holfoot_state) =
let loc_opt = e st in
if (~(var_res_sl___has_write_permission v st) \/
    IS_NONE loc_opt) then NONE else
let loc = THE loc_opt in (
if (~(loc IN FDOM h) \/ (loc = 0)) then NONE else
SOME {var_res_ext_state_var_update v ((h ’ loc) t) (st,h)})‘;

This action fails, if there is no write permission on the variable v or if the
expression e fails to be evaluated in the current state (for example because a
read permission on a variable it uses is missing). Otherwise, it checks whether
the location pointed to by e is in the heap. If it is, the value of v is updated by
the value found in the heap at that location indexed by tag t. Otherwise, i. e. if
the location is not in the heap, the action fails.
Similarly to actions, it is straightforward to define predicates. For example,
e1 |-> L is defined as:
val holfoot_ap_points_to_def = Define ‘
holfoot_ap_points_to e1 L = \(st,h):holfoot_state.
let loc_opt = (e1 st) in (IS_SOME (loc_opt) /\
let loc = THE loc_opt in (~(loc = 0) /\ ((FDOM h) = {loc}) /\
(FEVERY (\(tag,exp). IS_SOME (exp st) /\ (THE (exp st) = (h ’ loc) tag)) L)))‘;

This definition of |-> is used to define predicates for single linked lists. The data
content of these lists is represented by lists of natural numbers. Therefore HOL’s
list libraries can be used to reason about the data content.
Since actions and predicates are shallowly embedded, it is easy to extend
Holfoot with new actions and predicates. Moreover, the automation has been
designed with extensions in mind.

4 Conclusion and Future Work

The main contribution of this work is the formalisation of Abstract Separation
Logic and the demonstration that Abstract Separation Logic is powerful and flexible
enough to be used as a basis for separation logic tools. The formalisation of
Abstract Separation Logic contains some minor extensions like the addition of
procedures. However, it mainly follows the original definitions [4].
The Smallfoot case study demonstrates the potential of Abstract Separation
Logic. However, it is interesting in its own right as well. The detected bug in
Smallfoot shows that high-assurance implementations of even comparatively sim-
ple tools like Smallfoot are important. Moreover, Holfoot is one of the very few
separation logic tools that can reason about the content of data-structures as
well as the shape. Combining separation logic with reasoning about data-content
is currently one of the main challenges for separation logic tools. As the example
of parallel mergesort demonstrates, Holfoot can answer this challenge by com-
bining the power of the interactive prover HOL with the automation separation
logic provides.
In the future, I will try to improve the level of automation. Moreover, I plan
to add a concept of arrays to Holfoot. This will put my claim that Holfoot is
easily extensible to the test, since it requires adding new actions for allocating /
deallocating blocks of the heap as well as adding predicates for arrays. However,
the main purpose of adding arrays is reasoning about pointer arithmetic. It will
be interesting to see how HOL can help to verify algorithms that use pointer
arithmetic and how much automation is possible.

Acknowledgements
I would like to thank Matthew Parkinson, Mike Gordon, Alexey Gotsman, Mag-
nus Myreen and Viktor Vafeiadis for a lot of discussions, comments and criticism.

References
[1] Appel, A.W., Blazy, S.: Separation logic for small-step Cminor. In: Schneider, K.,
Brandt, J. (eds.) TPHOLs 2007. LNCS, vol. 4732, pp. 5–21. Springer, Heidelberg
(2007)
[2] Berdine, J., Calcagno, C., O’Hearn, P.W.: Smallfoot: Modular automatic assertion
checking with separation logic. In: de Boer, F.S., Bonsangue, M.M., Graf, S.,
de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 115–137. Springer,
Heidelberg (2006)
[3] Brookes, S.: A semantics for concurrent separation logic. Theor. Comput.
Sci. 375(1-3), 227–270 (2007)
[4] Calcagno, C., O’Hearn, P.W., Yang, H.: Local action and abstract separation logic.
In: LICS 2007: Proceedings of the 22nd Annual IEEE Symposium on Logic in
Computer Science, Washington, DC, USA, pp. 366–378. IEEE Computer Society,
Los Alamitos (2007)
[5] Distefano, D., O’Hearn, P.W., Yang, H.: A local shape analysis based on separation
logic. In: Hermanns, H., Palsberg, J. (eds.) TACAS 2006. LNCS, vol. 3920, pp.
287–302. Springer, Heidelberg (2006)
[6] Marti, N., Affeldt, R., Yonezawa, A.: Towards formal verification of memory prop-
erties using separation logic. In: 22nd Workshop of the Japan Society for Soft-
ware Science and Technology, Tohoku University, Sendai, Japan, September 13–15.
Japan Society for Software Science and Technology (2005)
[7] O’Hearn, P.W., Reynolds, J.C., Yang, H.: Local reasoning about programs that
alter data structures. In: Fribourg, L. (ed.) CSL 2001 and EACSL 2001. LNCS,
vol. 2142, pp. 1–19. Springer, Heidelberg (2001)
[8] Parkinson, M., Bornat, R., Calcagno, C.: Variables as resource in hoare logics.
In: LICS 2006: Proceedings of the 21st Annual IEEE Symposium on Logic in
Computer Science, Washington, DC, USA, pp. 137–146. IEEE Computer Society,
Los Alamitos (2006)
[9] Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In:
LICS 2002: Proceedings of the 17th Annual IEEE Symposium on Logic in Com-
puter Science, Washington, DC, USA, pp. 55–74. IEEE Computer Society, Los
Alamitos (2002)
[10] Tuch, H., Klein, G., Norrish, M.: Types, bytes, and separation logic. In: POPL
2007: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on
Principles of programming languages, pp. 97–108. ACM, New York (2007)
[11] Weber, T.: Towards mechanized program verification with separation logic. In:
Marcinkowski, J., Tarlecki, A. (eds.) CSL 2004. LNCS, vol. 3210, pp. 250–264.
Springer, Heidelberg (2004)
Liveness Reasoning with Isabelle/HOL

Jinshuang Wang1,2, Huabing Yang1, and Xingyuan Zhang1

1 PLA University of Science and Technology, Nanjing 210007, China
2 State Key Lab for Novel Software Technology, Nanjing University, Nanjing 210093, China
{wangjinshuang,xingyuanz}@gmail.com

Abstract. This paper describes an extension of Paulson's inductive protocol ver-
ification approach for liveness reasoning. The extension requires no change of
the system model underlying the original inductive approach. Therefore, all the
advantages which make Paulson's approach successful for safety reasoning are
kept, while liveness reasoning becomes possible. To simplify liveness reasoning,
a new fairness notion, named Parametric Fairness, is used instead of the standard
ones. A probabilistic model is established to support this new fairness notion.
Experiments with small examples as well as real world communication protocols
confirm the practicality of the extension. All the work has been formalized with
Isabelle/HOL using Isar.

Keywords: Liveness Proof, Inductive Protocol Verification, Probabilistic Model,
Parametric Fairness.

1 Introduction
Paulson’s inductive approach has been used to verify safety properties of many secu-
rity protocols [1, 2]. The success gives incentives to extend this approach to a general
approach for protocol verification. To achieve this goal, a method for the verification
of liveness properties is needed. According to Manna and Pnueli [3], temporal proper-
ties are classified into three classes: safety properties, response properties and reactivity
properties. The original inductive approach only deals with safety properties. In this pa-
per, proof rules for liveness properties (both response and reactivity) are derived under
the same execution model as the original inductive approach. These liveness proof rules
can be used to reduce the proof of liveness properties to the proof of safety properties,
a task well solved by the original approach.
The proof rules are derived based on a new notion of fairness, parametric fairness,
which is an adaptation of α-fairness [4,5,6] to the setting of HOL. Parametric fairness
is properly stronger than standard fairness notions such as weak fairness and strong
fairness. We will explain why the use of parametric fairness can deliver more liveness
results through simpler proofs.
A probabilistic model is established to show the soundness of our new fairness no-
tion. It is proved that the set of all parametrically fair execution traces is measurable
and has probability 1. Accordingly, the definition of parametric fairness is reasonable.

This research was funded by 863 Program(2007AA01Z409) and NNSFC(60373068) of China.


The practicability of this liveness reasoning approach has been confirmed by exper-
iments of various sizes [7, 8, 9]. All the work has been formalized with Isabelle/HOL
using Isar [10, 11], although the general approach is not necessarily confined to this
particular system.
The paper is organized as follows: Section 2 presents concurrent systems, the
execution model of the inductive approach; Section 3 gives a shallow embedding of LTL
(Linear Temporal Logic); Section 4 gives an informal explanation of parametric fair-
ness; Section 5 explains the liveness proof rules; Section 6 establishes a probabilis-
tic model for parametric fairness; Section 7 describes liveness verification examples;
Section 8 discusses related work; Section 9 concludes.

2 Concurrent Systems

In the inductive approach, system state only changes with the happening of events. Ac-
cordingly, it is natural to represent a system state with the list of events happening so
far, arranged in reverse order. A system is concurrent because its states are nondeter-
ministic, where a state is nondeterministic if more than one event is eligible to
happen under that state. The specification of a concurrent system is just a specification
of this eligible relation. Based on this view, the formal definition of concurrent systems
is given in Fig. 1.

σi ≡ σ i

primrec [[σ]]0 = []
        [[σ]](Suc i) = σ i # [[σ]]i

τ [cs> e ≡ (τ, e) ∈ cs

inductive-set vt :: ('a list × 'a) set ⇒ 'a list set
  for cs :: ('a list × 'a) set where
  vt-nil [intro] : [] ∈ vt cs |
  vt-cons [intro]: [[τ ∈ vt cs; τ [cs> e]] =⇒ (e # τ) ∈ vt cs

consts derivable :: 'a ⇒ 'b ⇒ bool (- ⊢ - [64, 64] 50)
fnt-valid-def: cs ⊢ τ ≡ τ ∈ vt cs
inf-valid-def: cs ⊢ σ ≡ ∀ i. [[σ]]i [cs> σ i

Fig. 1. Definition of Concurrent System

The type of concurrent systems is ('a list × 'a) set, where 'a is the type of events.
Concurrent systems are written as cs. The expression (τ, e) ∈ cs means that event
e is eligible to happen under state τ in concurrent system cs. The notation (τ, e) ∈ cs
is abbreviated as τ [cs> e. The set of reachable states is written as vt cs, and τ ∈ vt cs
is abbreviated as cs ⊢ τ. An execution of a concurrent system is an infinite sequence of
events, represented as a function with type nat ⇒ 'a. The i-th event in execution σ is
abbreviated as σi. The prefix consisting of the first i events is abbreviated as [[σ]]i. For σ
to be a valid execution of cs (written as cs ⊢ σ), σi must be eligible to happen under [[σ]]i.

Infinite execution σ is called nondeterministic if it has infinitely many nondeterministic
prefixes. It is obvious from the definition of cs ⊢ σ that σ represents the choices of
which event happens next at its nondeterministic prefixes.
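This execution model mirrors directly into the OCaml sketch used earlier. The transcription below (ours) represents a concurrent system as its eligibility predicate and checks membership in vt cs for finite prefixes:

(* a concurrent system as an eligibility predicate on
   (history in reverse order, next event) *)
type 'e cs = 'e list -> 'e -> bool

(* τ ∈ vt cs: every event was eligible when it happened *)
let rec valid_trace (cs : 'e cs) (tau : 'e list) : bool =
  match tau with
  | [] -> true
  | e :: rest -> valid_trace cs rest && cs rest e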

3 Embedding LTL

LTL (Linear Temporal Logic), used to represent liveness properties in this paper, is de-
fined in Fig. 2. LTL formulae are written as ϕ, ψ, κ etc. The type of LTL formulae is
defined as 'a tlf. The expression (σ, i) |= ϕ means that LTL formula ϕ is valid at mo-
ment i of the infinite execution σ. The operator |= is overloaded, so that σ |= ϕ can be
defined as the abbreviation of (σ, 0) |= ϕ. The always operator □, eventual operator ♦,
next operator ◯ and until operator U are defined literally. An operator ⌈-⌉ is defined to lift a
predicate on finite executions up to an LTL formula. The temporal operator → is the lift
of logical implication −→ up to the LTL level. For an event e, the term (|e|) is a predicate
on finite executions stating that the last happened event is e. Therefore, the expression
⌈(|e|)⌉ is an LTL formula saying that event e happens at the current moment.

types 'a tlf = (nat ⇒ 'a) ⇒ nat ⇒ bool

consts valid-under :: 'a ⇒ 'b ⇒ bool (- |= - [64, 64] 50)
defs (overloaded) pr |= ϕ ≡ let (σ, i) = pr in ϕ σ i
defs (overloaded) (σ::nat⇒'a) |= (ϕ::'a tlf) ≡ (σ::nat⇒'a, (0::nat)) |= ϕ

□ϕ ≡ λ σ i. ∀ j. i ≤ j −→ (σ, j) |= ϕ
♦ϕ ≡ λ σ i. ∃ j. i ≤ j ∧ (σ, j) |= ϕ

constdefs lift-pred :: ('a list ⇒ bool) ⇒ 'a tlf (⌈-⌉ [65] 65)
⌈P⌉ ≡ λ σ i. P [[σ]]i

constdefs lift-imply :: 'a tlf ⇒ 'a tlf ⇒ 'a tlf (- → - [65, 65] 65)
ϕ → ψ ≡ λ σ i. ϕ σ i −→ ψ σ i

constdefs last-is :: 'a ⇒ 'a list ⇒ bool
last-is e τ ≡ (case τ of [] ⇒ False | (e′ # τ′) ⇒ e′ = e)

syntax -is-last :: 'a ⇒ ('a list ⇒ bool) ((|-|) [64] 1000)
translations (|e|) ⇌ last-is e

Fig. 2. The Embedding of Linear Temporal Logic in Isabelle/HOL

4 The Notion of Parametric Fairness


In this section, parametric fairness is introduced as a natural extension of standard
fairness notions. To show this point, the definition of parametric fairness PF is given in
Fig. 3 as the end of a spectrum of fairness notions starting from standard ones.
A study of Fig. 3 may reveal how the definition of PF is obtained through incremental
modifications, from the standard weak fairness WF to the standard strong fairness SF
and finally through the less standard extreme fairness EF. Each fairness notion ?F has a

constdefs WFα :: ('a list × 'a) set ⇒ 'a ⇒ (nat ⇒ 'a) ⇒ bool
WFα cs e σ ≡ σ |= □(λ σ i. [[σ]]i [cs> e) −→ σ |= □♦(λ σ i. σ i = e)

constdefs WF :: ('a list × 'a) set ⇒ (nat ⇒ 'a) ⇒ bool
WF cs σ ≡ ∀ e. WFα cs e σ

constdefs SFα :: ('a list × 'a) set ⇒ 'a ⇒ (nat ⇒ 'a) ⇒ bool
SFα cs e σ ≡ σ |= □♦(λ σ i. [[σ]]i [cs> e) −→ σ |= □♦(λ σ i. σ i = e)

constdefs SF :: ('a list × 'a) set ⇒ (nat ⇒ 'a) ⇒ bool
SF cs σ ≡ ∀ e. SFα cs e σ

constdefs EFα :: ('a list × 'a) set ⇒ ('a list ⇒ bool) ⇒ ('a list ⇒ 'a) ⇒ 'a seq ⇒ bool
EFα cs P E σ ≡ σ |= □♦(λ σ i. P [[σ]]i ∧ [[σ]]i [cs> E [[σ]]i)
               −→ σ |= □♦(λ σ i. P [[σ]]i ∧ σ i = E [[σ]]i)

constdefs EF :: ('a list × 'a) set ⇒ (nat ⇒ 'a) ⇒ bool
EF cs σ ≡ ∀ P E. EFα cs P E σ

types 'a pe = ('a list ⇒ bool) × ('a list ⇒ 'a)

constdefs PF :: ('a list × 'a) set ⇒ 'a pe list ⇒ (nat ⇒ 'a) ⇒ bool
PF cs pel σ ≡ list-all (λ (P, E). EFα cs P E σ) pel

Fig. 3. Definition of Fairness Notions

corresponding pre-version ?Fα. For example, WF is obtained from WFα by quantifying
over e.
For any nondeterministic execution σ, its progress towards desirable states depends
on the choice of helpful events under the corresponding helpful states (or helpful pre-
fixes). The association between helpful states and helpful events can be specified using
(P, E)-pairs, where P is a predicate on states used to identify helpful states and E is the
choice function used to choose the corresponding helpful events under P-states. A pair
(P, E) is called enabled under state τ if P τ is true, and it is called executed under τ if
E τ is chosen to happen.
An infinite execution σ is said to treat the pair (P, E) fairly, if (P, E) is executed
infinitely often in σ, unless (P, E) is enabled for only finitely many times. In a unified
view, every fairness notion is about how fairly (P, E)-pairs should be treated by infinite
executions. For example, EF requires that any expressible (P, E)-pair be fairly treated,
while PF only requires that (P, E)-pairs in the parameter pel be fairly treated; this
explains why it is called parametric fairness.
Common sense tells us that a fair execution σ should make nondeterministic choices
randomly. However, standard fairness notions such as WF and SF fail to capture this
intuition. Consider the concurrent system defined in Fig. 4 and Fig. 5.
Fig. 4 is a diagram of the concurrent system cs2 formally defined in Fig. 5, where
diagram states are numbered with the value of function F2. Any intuitively fair execu-
tion starting from state 2 should finally get into state 0. If a fairness notion ?F correctly
captures our intuition, the following property should be valid:

[[cs2 ⊢ σ; ?F cs2 σ]] =⇒ σ |= □(⌈λ τ. F2 τ = 2⌉ → ♦⌈λ τ. F2 τ = 0⌉)    (1)

datatype Evt = e0 | e1 | e2 | e3 | e4

fun F2 :: Evt list ⇒ nat
where
  F2 [] = 3 |
  F2 (e0 # τ) = (if (F2 τ = 1) then 0 else
                 if (F2 τ = 2) then 3 else F2 τ) |
  F2 (e1 # τ) = (if (F2 τ = 2) then 1 else F2 τ) |
  F2 (e2 # τ) = (if (F2 τ = 3) then 2 else F2 τ) |
  F2 (e4 # τ) = (if (F2 τ = 1) then 2 else F2 τ)

inductive-set cs2 :: (Evt list × Evt) set
where
  r0 : F2 τ = 1 =⇒ (τ, e0) ∈ cs2 |
  r1 : F2 τ = 2 =⇒ (τ, e1) ∈ cs2 |
  r2 : F2 τ = 3 =⇒ (τ, e2) ∈ cs2 |
  r3 : F2 τ = 2 =⇒ (τ, e0) ∈ cs2 |
  r4 : F2 τ = 1 =⇒ (τ, e4) ∈ cs2

Fig. 4. The diagram of cs2 (states 3, 2, 1, 0, numbered by F2; forward transitions
3 −e2→ 2, 2 −e1→ 1, 1 −e0→ 0, and back transitions 2 −e0→ 3, 1 −e4→ 2)

Fig. 5. The definition of cs2
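For experimentation, cs2 and F2 transcribe directly into the OCaml sketch from the earlier sections. The transcription is ours; since the figure gives no equation for e3, we assume it leaves F2 unchanged:

type evt = E0 | E1 | E2 | E3 | E4

let rec f2 (tau : evt list) : int =
  match tau with
  | [] -> 3
  | E0 :: t -> (match f2 t with 1 -> 0 | 2 -> 3 | n -> n)
  | E1 :: t -> if f2 t = 2 then 1 else f2 t
  | E2 :: t -> if f2 t = 3 then 2 else f2 t
  | E4 :: t -> if f2 t = 1 then 2 else f2 t
  | E3 :: t -> f2 t   (* assumption: e3 does not change F2 *)

(* the eligibility relation of Fig. 5, rules r0-r4 *)
let cs2 (tau : evt list) (e : evt) : bool =
  match (f2 tau, e) with
  | 1, E0 | 2, E1 | 3, E2 | 2, E0 | 1, E4 -> true
  | _ -> false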

Unfortunately, neither WF nor SF serves this purpose. Consider the execution (e2.e1.
e4.e0)ω, which satisfies both WF cs2 and SF cs2 while violating the conclusion of (1).
The deficiency of standard fairness notions such as SF and WF is their failure to
specify the association between helpful states and helpful events explicitly. For exam-
ple, SF only requires any infinitely enabled event to be executed infinitely often. Even
though execution (e2.e1.e4.e0)ω satisfies SF, it is not intuitively random, in that it is
still biased towards avoiding the helpful event e0 under the corresponding helpful state
1. Therefore, (1) is not valid if ?F is instantiated either to SF or WF. Extreme fairness
was proposed by Pnueli [12] to solve this problem. The EF is a direct expression of ex-
treme fairness in HOL, which requires that all expressible (P,E)-pairs be fairly treated.
Execution (e2.e1.e4.e0)ω does not satisfy EF, because the (P, E)-pair (λτ. F2 τ = 1,
λτ. e0) is not fairly treated. In fact, (1) is valid if ?F is instantiated to EF.
Unfortunately, a direct translation of extreme fairness in HOL is problematic, be-
cause the universal quantification over P, E may accept any well-formed expression of
the right type. Given any nondeterministic execution σ, it is possible to construct a pair
(Pσ, Eσ) which is not fairly treated by σ. Accordingly, no nondeterministic execution
σ is EF, as confirmed by the following lemma:

σ |= □♦(λσ i. {e. [[σ]]i [cs> e ∧ e ≠ σi} ≠ {}) =⇒ ¬ EF cs σ    (2)

The premise of (2) formally expresses that σ is nondeterministic. The construction of
(Pσ, Eσ) is a diagonal one which makes a choice different from the one made by σ on
every nondeterministic prefix. A detailed proof of (2) can be found in [13].
Now, since most infinite executions of a concurrent system are nondeterministic, EF
will rule out almost all executions. If EF is used as the fairness premise of a liveness
property, the overall statement is practically meaningless. Parametric fairness PF is
proposed to solve this problem of EF. Instead of requiring that all (P, E)-pairs be fairly
treated, PF only requires that the (P, E)-pairs appearing in its parameter pel be fairly
treated. Since there are only finitely many (P, E)-pairs in pel, most executions of a
concurrent system are kept, even though every (P, E)-pair in pel rules out some of them.
Section 6 will make this argument precise by establishing a probabilistic model for PF.

5 Liveness Rules

According to Manna [3], response properties are of the form σ |= □(⌈P⌉ → ♦⌈Q⌉),
where P and Q are past formulae (in [3]'s terms), obtained by lifting predicates on
finite traces. The conclusion of (1) is of this form. Reactivity properties are of the form
σ |= (□♦⌈P⌉) → (□♦⌈Q⌉), meaning: if P holds infinitely often in σ, then Q holds
infinitely often in σ as well.
The proof rule for response properties is the theorem resp-rule:

[[RESP cs F E N P Q; cs ⊢ σ; PF cs {|F, E, N|} σ]] =⇒ σ |= □(⌈P⌉ → ♦⌈Q⌉)

and the proof rule for reactivity properties is the theorem react-rule:

[[REACT cs F E N P Q; cs ⊢ σ; PF cs {|F, E, N|} σ]] =⇒ σ |= □♦⌈P⌉ → □♦⌈Q⌉

The symbols used in these two theorems are given in Fig. 6.

consts pel-of :: ('a list ⇒ nat) ⇒ ('a list ⇒ 'a) ⇒ nat ⇒
                 (('a list ⇒ bool) × ('a list ⇒ 'a)) list   ({|-, -, -|} [64, 64, 64] 1000)
primrec {|F, E, 0|} = []
        {|F, E, (Suc n)|} = (λ τ. F τ = Suc n, E) # {|F, E, n|}

syntax -drop :: 'a list ⇒ nat ⇒ 'a list (⟨-⟩- [64, 64] 1000)
translations ⟨l⟩n ⇌ drop n l

⟨P −→¬Q ∗⟩ ≡ λ τ. (∃ i ≤ |τ|. P ⟨τ⟩i ∧ (∀ k. 0 < k ∧ k ≤ i −→ ¬ Q ⟨τ⟩k))

locale RESP =
  fixes cs :: ('a list × 'a) set and F :: 'a list ⇒ nat and E :: 'a list ⇒ 'a
    and N :: nat and P :: 'a list ⇒ bool and Q :: 'a list ⇒ bool
  assumes mid: [[cs ⊢ τ; ⟨P −→¬Q ∗⟩ τ; ¬ Q τ]] =⇒ 0 < F τ ∧ F τ < N
  assumes fd: [[cs ⊢ τ; 0 < F τ]] =⇒ τ [cs> E τ ∧ F (E τ # τ) < F τ

locale REACT =
  fixes cs :: ('a list × 'a) set and F :: 'a list ⇒ nat and E :: 'a list ⇒ 'a
    and N :: nat and P :: 'a list ⇒ bool and Q :: 'a list ⇒ bool
  assumes init: [[cs ⊢ τ; P τ]] =⇒ F τ < N
  assumes mid: [[cs ⊢ τ; F τ < N; ¬ Q τ]] =⇒ τ [cs> E τ ∧ F (E τ # τ) < F τ

Fig. 6. Premises of Liveness Rules

Let's explain the resp-rule first. Premise RESP cs F E N P Q expresses constraints on
cs's state transition diagram. RESP is defined as a locale predicate, where assumption
mid requires: for any state τ, after reaching P before reaching Q (characterized by
⟨P −→¬Q ∗⟩ τ), the value of F τ is between 0 and N; assumption fd requires that if F
τ is between 0 and N, event E τ must be eligible to happen under τ and its happening
decreases the value of function F. For any state satisfying ⟨P −→¬Q ∗⟩ τ, by
repeatedly applying fd, a path leading from τ to Q can be constructed. If every pair in
the list [(λ τ. F τ = n, E) | n ∈ {1 . . . N}] is fairly treated by σ, σ will follow this path and
eventually get into a Q state. This fairness requirement is expressed by the premise PF
cs {|F, E, N|} σ, where {|F, E, N|} evaluates to [(λ τ. F τ = n, E) | n ∈ {1 . . . N}].
As an example, when resp-rule is used to prove statement (1), P is instantiated
to (λ τ. F2 τ = 2), and Q to (λ τ. F2 τ = 0). The F is instantiated to F2, and E to the
following E2:

constdefs E2 :: Evt list ⇒ Evt
E2 τ ≡ (if (F2 τ = 3) then e2 else if (F2 τ = 2) then e1 else if (F2 τ = 1) then e0 else e0)

The N is instantiated to 3, so that {|F2, E2, 3|} evaluates to [(λ τ. F2 τ = 3, E2), (λ τ.
F2 τ = 2, E2), (λ τ. F2 τ = 1, E2)].
Now, let's explain rule react-rule. Premise PF cs {|F, E, N|} σ still has the same
meaning. Since now P-states are reached infinitely often, a condition much stronger
than in resp-rule, premise REACT can be weaker: it only requires that there exists a
path leading to Q from every P state, while RESP requires the existence of such a path
at every ⟨P −→¬Q ∗⟩ state.

6 Probabilistic Model for PF

6.1 Some General Measure Theory

The definition of probability space given in Fig. 7 is rather standard, where U is the base
set, F the measurable sets, and Pr the measure function. The definition of measure space
uses standard notions such as σ-algebra, positivity and countable additivity. In the
definition of countable additivity, we use the Isabelle library function sums, where
'f sums c' stands for Σ∞n=0 f(n) = c.
Carathéodory's extension theorem [14,15] is the standard way to construct probability
spaces. The theorem is proved using Isabelle/HOL:

[[algebra (U, F); positive (F, Pr); countably-additive (F, Pr)]]
=⇒ ∃ P. (∀ A. A ∈ F −→ P A = Pr A) ∧ measure-space (U, sigma (U, F), P)    (3)

where the definition of sigma is in Fig. 8.

6.2 Probability Space on Infinite Executions

Elements used to construct a measure space for infinite executions are defined in
locale RCS, which accepts a parameter R, where R(τ, e) is the probability
of choosing event e to happen under state τ. The definition of RCS is given in Fig.
9. The purpose of RCS is to define a measure space (Path, PA, μ) on infinite exe-
cutions in terms of the parameter R. The underlying concurrent system is given by CS.
Set N τ contains all events eligible to happen under state τ. Function π is a mea-
sure function on finite executions. Path is the set of valid infinite executions of CS.

consts algebra :: ('a set × 'a set set) ⇒ bool
algebra (U, F) = (F ⊆ Pow(U) ∧ {} ∈ F ∧ (∀ a∈F. (U − a) ∈ F) ∧
                  (∀ a b. a ∈ F ∧ b ∈ F −→ a ∪ b ∈ F))

consts sigma-algebra :: ('a set × 'a set set) ⇒ bool
sigma-algebra (U, F) = (F ⊆ Pow(U) ∧ U ∈ F ∧ (∀ a ∈ F. U − a ∈ F) ∧
                        (∀ a. (∀ i::nat. a(i) ∈ F) −→ (⋃ i. a(i)) ∈ F))

consts positive :: ('a set set × ('a set ⇒ real)) ⇒ bool
positive (F, Pr) = (Pr {} = 0 ∧ (∀ A. A ∈ F −→ 0 ≤ Pr A))

consts countably-additive :: ('a set set × ('a set ⇒ real)) ⇒ bool
countably-additive (F, Pr) = (∀ f::(nat ⇒ 'a set). range(f) ⊆ F ∧
                              (∀ m n. m ≠ n −→ f(m) ∩ f(n) = {}) ∧ (⋃ i. f(i)) ∈ F
                              −→ (λn. Pr(f(n))) sums Pr (⋃ i. f(i)))

consts measure-space :: ('a set × 'a set set × ('a set ⇒ real)) ⇒ bool
measure-space (U, F, Pr) = (sigma-algebra (U, F) ∧ positive (F, Pr) ∧
                            countably-additive (F, Pr))

consts prob-space :: ('a set × 'a set set × ('a set ⇒ real)) ⇒ bool
prob-space (U, F, Pr) = (measure-space (U, F, Pr) ∧ Pr U = 1)

Fig. 7. Definition of Probability Space

Set palgebra-embed([τ0,...,τn]) contains all infinite executions prefixed by some τ in
[τ0, τ1,...,τn], and palgebra-embed([τ0,...,τn]) is said to be supported by [τ0,...,τn].
The measure μ S is defined in terms of measures on supporting sets for S.
It is proved that algebra (Path, PA), positive (PA, μ) and countably-additive (PA, μ)
hold. Applying Carathéodory's extension theorem to these gives a measure space
(Path, sigma (Path, PA), P) on infinite executions of CS. It can be further derived that
this measure space is a probability space.

inductive-set sigma :: ('a set × 'a set set) ⇒ 'a set set for M :: ('a set × 'a set set) where
  basic: (let (U, A) = M in (a ∈ A)) =⇒ a ∈ sigma M |
  empty: {} ∈ sigma M |
  complement: a ∈ sigma M =⇒ (let (U, A) = M in U − a) ∈ sigma M |
  union: (∀ i::nat. a i ∈ sigma M) =⇒ (⋃ i. a i) ∈ sigma M

Fig. 8. Definition of the sigma Operator

6.3 The Probabilistic Meaning of PF

In Fig. 10, locale BTS is proposed as a refinement of RCS. The motivation of BTS is to
provide a low bound bnd for R. Function P is the probability function on execution sets.
Based on BTS, the following theorem can be proved, and it gives a meaning to PF:

Theorem 1. [[BTS R bnd; set pel ≠ {}]]
            =⇒ BTS.P R {σ ∈ RCS.Path R. PF {(τ, e). 0 < R (τ, e)} pel σ} = 1

locale RCS =
  fixes R :: ('a list × 'a) ⇒ real
  assumes Rrange: 0 ≤ R(τ, e) ∧ R(τ, e) ≤ 1
  fixes CS :: ('a list × 'a) set
  defines CS-def: CS ≡ {(τ, e). 0 < R(τ, e)}
  fixes N :: 'a list ⇒ 'a set
  defines N-def: N τ ≡ {e. 0 < R(τ, e)}
  assumes Rsum1: CS ⊢ τ =⇒ (Σ e ∈ N τ. R(τ, e)) = 1
begin

fun π :: 'a list ⇒ real where
  π [] = 1 |
  π (e # τ) = R(τ, e) ∗ π τ

definition Path :: 'a seq set where
  Path ≡ {σ. (∀ i. π [[σ]]i > 0)}

definition palg-embed :: 'a list ⇒ 'a seq set where
  palg-embed τ ≡ {σ ∈ Path. [[σ]]|τ| = τ}

fun palgebra-embed :: 'a list list ⇒ 'a seq set where
  palgebra-embed [] = {} |
  palgebra-embed (τ # l) = (palg-embed τ) ∪ palgebra-embed l

definition PA :: 'a seq set set where
  PA ≡ {S. ∃ l. palgebra-embed l = S ∧ S ⊆ Path}

definition palg-measure :: 'a list list ⇒ real (μ0) where
  μ0 l ≡ (Σ τ ∈ set l. π τ)

definition palgebra-measure :: 'a list list ⇒ real (μ1) where
  μ1 l ≡ inf (λr. ∃ l′. palgebra-embed l = palgebra-embed l′ ∧ μ0 l′ = r)

definition μ :: 'a seq set ⇒ real where
  μ S ≡ sup (λr. ∃ b. μ1 b = r ∧ (palgebra-embed b) ⊆ S)
end

Fig. 9. Formalization of Probabilistic Execution

Theorem 1 shows that almost all executions are fair in the sense of PF: the set
of valid executions satisfying PF has probability 1. The underlying concurrent system
is {(τ, e). 0 < R(τ, e)}, which is the expansion of CS in RCS. The probability function
BTS.P R is the one derived from R using BTS. The set RCS.Path R contains all valid
infinite executions of the underlying CS.

6.4 The Proof of Theorem 1

For the proof of Theorem 1, a generalized notion of fairness GF is introduced in
Fig. 11. GF is defined as a locale accepting three parameters: cs, a concurrent system;
L, a countable set of labels; and l, a labeling function. A label ι is said to be enabled
in state τ (written as enabled ι τ) if there exists some event e eligible to happen under
state τ such that ι belongs to l(τ, e). A label ι is said to be taken in state τ if ι is assigned

locale BTS = RCS +
  fixes bnd :: real
  assumes system-bound: 0 < bnd ∧ bnd ≤ 1
  and bound-imply: R(τ, e) > 0 =⇒ R(τ, e) ≥ bnd

definition P :: ('a seq set ⇒ real) where
  P ≡ (SOME P′ :: ('a seq set ⇒ real).
       (∀ A. A ∈ PA −→ P′ A = μ A) ∧ measure-space (Path, sigma (Path, PA), P′))

Fig. 10. The definition of locale BTS

to the last execution step of τ, which is denoted by hd(τ); the state before the last
step is denoted by tl(τ). Label ι is said to be fairly treated by an infinite execution σ, if ι
is taken infinitely many times whenever it is enabled infinitely many times in σ. An infinite
execution is fair in the sense of GF, if all labels in the set L are fairly treated. By instantiat-
ing the label set L with the set of elements of pel (denoted by set pel), and the label function l
with the function lf where lf(τ, e) = {(Q, E) | (Q, E) ∈ (set pel) ∧ Q τ ∧ e = (E τ)}, it
can be shown that PF is just an instance of GF:

PF cs pel σ = GF.fair cs (set pel) (lf pel) σ    (4)

It is proved that the probability of GF fair executions equals 1:

BTS.P R {σ ∈ RCS.Path R. GF.fair {(τ, e). 0 < R (τ, e)} L l σ} = 1    (5)

The combination of (4) and (5) gives rise to Theorem 1. The intuition behind (4)
is that (P, E)-pairs in PF can be seen as labels in GF. To prove (5), it is sufficient to
prove that the probability of unfair executions equals 0. In turn, it is sufficient to prove
that the probability of executions unfair with respect to any one label ι equals 0, because

locale GF =
  fixes cs :: ('a list × 'a) set
  fixes L :: 'b set
  assumes countable-L: countable L
  and non-empty-L: L ≠ {}
  fixes l :: 'a list × 'a ⇒ 'b set
  assumes subset-l: l(τ, e) ⊆ L
begin

definition enabled :: 'b ⇒ 'a list ⇒ bool where
  enabled ι τ ≡ (∃ e. (ι ∈ l(τ, e) ∧ (τ, e) ∈ cs))

definition taken :: 'b ⇒ 'a list ⇒ bool where
  taken ι τ ≡ ι ∈ l(tl(τ), hd(τ))

definition fairι :: 'b ⇒ (nat ⇒ 'a) ⇒ bool where
  fairι ι σ ≡ σ |= □♦(λσ i. enabled ι [[σ]]i) −→ σ |= □♦(λσ i. taken ι [[σ]]Suc(i))

definition fair :: (nat ⇒ 'a) ⇒ bool where
  fair σ ≡ ∀ ι ∈ L. fairι ι σ
end

Fig. 11. A general fairness definition



the number of labels is countable. If label ι is treated unfairly by execution σ, σ must
avoid ι infinitely many times, with every avoidance reducing the probability by a certain
proportion. This series of avoidances finally reduces the probability to 0.
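A back-of-the-envelope bound makes this step concrete (our sketch, using the low bound bnd provided by locale BTS):

% Each time ι is enabled but avoided, at most a (1 - bnd)-fraction of the
% remaining measure survives, so after k avoidances:
\mu\bigl(\{\sigma \mid \sigma \text{ avoids } \iota \text{ at its first } k
        \text{ enabled points}\}\bigr) \le (1 - bnd)^k
\quad\text{and}\quad
\lim_{k \to \infty} (1 - bnd)^k = 0 .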

7 Liveness Proof Experiments


7.1 Elevator Control System
In [7], Yang developed a formal verification of an elevator control system. The events of
the elevator control system are: Arrive p m n: User p arrives at floor m, planning to go
to floor n; Enter p m n: User p enters the elevator at floor m, planning to go to floor n; Exit
p m n: User p gets off the elevator at floor n, where m is the floor at which user p entered
the elevator; Up n: A press of the ↑-button on floor n; Down n: A press of the ↓-button
on floor n; To n: A press of button n on the elevator's control panel; StartUp n: The
elevator starts moving upward from floor n; StartDown n: The elevator starts moving
down from floor n; Stop n m: The elevator stops at floor m; before stopping, the elevator
is moving from floor n to floor m; Pass n m: The elevator passes floor m without stopping;
before passing, the elevator is moving from floor n to floor m.
The proved liveness property for the elevator control system is:

[[elev-cs ⊢ σ;
  PF elev-cs ({|F p m n, E p m n, 4 ∗ H + 3|} @ {|FT p m n, ET p m n, 4 ∗ H + 3|}) σ]]
=⇒ σ |= □(⌈(|Arrive p m n|)⌉ → ♦⌈(|Exit p m n|)⌉)
The conclusion says: if user p arrives at floor m and wants to go to floor n (represented
by the happening of event Arrive p m n), then he will eventually get there (represented
by the happening of event Exit p m n).
The liveness proof splits into two stages. The first is to prove once user p arrives,
it will eventually get into elevator, the second is to prove once p gets into elevator, it
will eventually get out of elevator at its destination. Both stages use rule resp-rule. The
definition of cs, P and Q is obvious. The difficult part is to find a proper definition of F
and E so that premise RESP cs F E N P Q can be proved. We explain the finding of F
and E by example of a 5-floor elevator system, the state diagram of which is shown in
Fig. 12. Helpful transitions are represented using normal lines while unhelpful transi-
tions with dotted lines.
The arrival of p (event Arrive p 2 4) brings the system from a state with Arrive p 2 4 ∉ arr-set τ
to one with Arrive p 2 4 ∈ arr-set τ ∧ 2 ∉ up-set τ. To call the elevator, p will press the ↑-button
(event Up 2), and this will bring the system into the macro-state Arrive p 2 4 ∈ arr-set τ ∧ 2 ∈
up-set τ, which contains many sub-states characterized by the status of the elevator. Status
(n, m) means moving from floor n to m, while d(n,n) means stopped on floor n in a down
pass, and u(n,n) in an up pass. It can be seen that the helpful transitions in Fig. 12 form a
spanning tree rooted at the desirable state. Premise RESP cs F E N P Q is valid if F τ
evaluates to the height of τ in the spanning tree and E τ evaluates to the helpful event
leading out of τ. Fig. 12 shows that such F and E can be defined; details of the definition
can be found in [7]. Once F and E are given, the rest of the proof is straightforward.
Statistics in Table 1 may give some idea of the amount of work required.
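To make the spanning-tree reading of F and E concrete, the following Haskell sketch (our
own illustration; all names are hypothetical) computes, for a spanning tree of macro-states
rooted at the desirable state, the height function F and the helpful-event function E:

import qualified Data.Map as Map

-- A spanning tree over states; each edge is labelled by a helpful event.
data Tree st ev = Node st [(ev, Tree st ev)]

-- Map each non-root state to the helpful event on its parent edge,
-- together with its parent state.
edges :: Ord st => Tree st ev -> Map.Map st (ev, st)
edges (Node root kids) =
  Map.unions [ Map.insert s (e, root) (edges t)
             | (e, t@(Node s _)) <- kids ]

-- F: the height of a state, i.e. its distance to the desirable root.
heightF :: Ord st => Tree st ev -> st -> Int
heightF t s = case Map.lookup s (edges t) of
                Nothing     -> 0               -- the desirable root state
                Just (_, p) -> 1 + heightF t p

-- E: the helpful event leading out of a state (none at the root).
helpfulE :: Ord st => Tree st ev -> st -> Maybe ev
helpfulE t s = fst <$> Map.lookup s (edges t)

Every helpful transition decreases heightF by exactly one, which is what makes a premise of
the form RESP cs F E N P Q provable.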

[Figure: state transition diagram of the 5-floor elevator for Arrive p 2 4. Event Arrive p 2 4
leads from a state with Arrive p 2 4 ∉ arr-set τ to Arrive p 2 4 ∈ arr-set τ ∧ 2 ∉ up-set τ;
event Up 2 then leads into the macro-state Arrive p 2 4 ∈ arr-set τ ∧ 2 ∈ up-set τ, whose
sub-states (n, m), u(n, n), and d(n, n) are connected by Enter, StartUp, StartDown, Stop,
and Pass transitions; the helpful (solid) transitions form a spanning tree rooted at the
desirable state, from which Enter p 2 4 occurs.]

Fig. 12. The state transition diagram of elevator for Arrive p 2 4

Table 1. Summary of Isabelle/HOL proof scripts

Contents                                                   Nr. of lines  Nr. of lemmas  Nr. of working days
Definitions of the elevator control system, F, E, FT, ET   ≤ 200         –              *
Safety properties                                          579           12             2
First stage of the proof                                   2012          10             5
Second stage of the proof                                  1029          8              3

7.2 Mobile Ad Hoc Secure Routing Protocols

In mobile ad hoc networks, nodes keep moving around in a certain area while communicating
over wireless channels. The communication is in a peer-to-peer manner, with no central
control. This makes the network very susceptible to malicious attacks. Therefore, security
is an important issue in mobile ad hoc networks. Secure routing protocols are chosen as
our verification target.

To counteract attacks, specialized procedures are used by secure routing protocols together
with cryptographic mechanisms such as public keys and hashing. The goal of verification is
to make sure that these procedures, combined with the cryptographic mechanisms, do not
collapse even in highly adversarial environments.
Attackers are modeled using the same method as Paulson's [1]. The correctness of the
routing protocol is verified by combining it with an upper transfer layer. It is proved that
the transfer protocol can deliver every user packet with the support of the underlying
routing protocol. This is a typical liveness property, which says that if a user packet gets
into the transfer layer at one end, it will eventually get out at the other end, just like an
elevator passenger.
Because nodes are moving around, existing routes between nodes constantly become
obsolete, in which case routing protocols have to initiate a route-finding procedure; this
procedure may be interfered with by attacks aimed at slowing down or even collapsing the
network.
The verification is like the one for the elevator, but on a much larger scale. We have managed
to find definitions of F and E, which essentially represent spanning trees covering all
relevant states like the one in Fig. 12. The proof is laborious but doable, suggesting
that specialized tactics may be needed.
We have verified two secure routing protocols; details can be found in [8, 9]. The
work in [8, 9] confirmed the practicability of our approach on the one hand and the resilience
of the security protocols on the other.

8 Related Work

Approaches for verification of concurrent systems can roughly be divided into theorem
proving and model checking. The work in this paper belongs to the theorem proving
category, which can deal with infinite state systems.
The purpose of this paper is to extend Paulson’s inductive protocol verification ap-
proach [1, 2] to deal with general liveness properties, so that it can be used as a general
protocol verification approach. The notion of PF and the corresponding proof rules
were first proposed by Zhang in [13]. At the same time, Yang developed a benchmark
verification for elevator control system to show the practicality of the approach [7].
Later, Yang used the approach in earnest to verify the liveness properties of mobile ad
hoc network protocols [8, 9]. These works confirm the practicality of the extension.
The liveness rules in this paper rely on a novel fairness notion PF, an adaptation
of α-fairness [4, 5, 16] to suit the setting of HOL. Using PF, more liveness properties
can be derived, and the proofs are usually simpler than with the standard WF and SF.
According to Baier's work [5], PF should have a sensible probabilistic meaning. To
confirm this, Wang established a probabilistic model for PF, to show that the measure
of PF executions equals 1 [17]. The formalization of this probabilistic model is deeply
influenced by Hurd and Richter [14, 15]; however, due to type restrictions different from
[14], and the need for an extension theorem which is absent from [15], most proofs had to
be done from scratch. Wang's work established the soundness of PF.

9 Conclusion
The advantage of theorem proving over model checking is its ability to deal with infinite
state systems. Unfortunately, relatively little work has been done on verifying liveness
properties using theorem proving. This paper improves the situation by proposing an extension
of Paulson’s inductive approach for liveness verification and showing the soundness,
feasibility and practicality of the approach.
The level of automation in our work is still low compared to model checking. One
direction for further research is to develop specialized tactics for liveness proof. Addi-
tionally, the verification described in this paper is still at abstract model level. Another
important direction for further research is to extend it to code level verification.

References
1. Paulson, L.C.: The inductive approach to verifying cryptographic protocols. Journal of Com-
puter Security 6(1-2), 85–128 (1998)
2. Paulson, L.C.: Inductive analysis of the Internet protocol TLS. ACM Transactions on Com-
puter and System Security 2(3), 332–351 (1999)
3. Manna, Z., Pnueli, A.: Completing the temporal picture. Theor. Comput. Sci. 83(1), 91–130
(1991)
4. Pnueli, A., Zuck, L.D.: Probabilistic verification. Information and Computation 103(1), 1–29
(1993)
5. Baier, C., Kwiatkowska, M.: On the verification of qualitative properties of probabilistic
processes under fairness constraints. Information Processing Letters 66(2), 71–79 (1998)
6. Jaeger, M.: Fairness, computable fairness and randomness. In: Proc. 2nd International Work-
shop on Probabilistic Methods in Verification (1999)
7. Yang, H., Zhang, X., Wang, Y.: Liveness proof of an elevator control system. In: The ‘Emerg-
ing Trend’ of TPHOLs, Oxford University Computing Lab. PRG-RR-05-02, pp. 190–204
(2005)
8. Yang, H., Zhang, X., Wang, Y.: A correctness proof of the srp protocol. In: 20th International
Parallel and Distributed Processing Symposium (IPDPS 2006), Proceedings, Rhodes Island,
Greece, April 25-29 (2006)
9. Yang, H., Zhang, X., Wang, Y.: A correctness proof of the dsr protocol. In: Cao, J.,
Stojmenovic, I., Jia, X., Das, S.K. (eds.) MSN 2006. LNCS, vol. 4325, pp. 72–83. Springer,
Heidelberg (2006)
10. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL — A Proof Assistant for Higher-Order
Logic. LNCS, vol. 2283. Springer, Heidelberg (2002)
11. Wenzel, M.: Isar - a generic interpretative approach to readable formal proof documents. In:
Nipkow, T., Paulson, L.C., Wenzel, M.T. (eds.) Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002)
12. Pnueli, A.: On the extremely fair treatment of probabilistic algorithms. In: Proceedings of
the fifteenth annual ACM symposium on Theory of computing, pp. 278–290. ACM Press,
New York (1983)
13. Zhang, X., Yang, H., Wang, Y.: Liveness reasoning for inductive protocol verification. In:
The ‘Emerging Trend’ of TPHOLs, Oxford University Computing Lab. PRG-RR-05-02, pp.
221–235 (2005)
14. Hurd, J.: Formal Verification of Probabilistic Algorithms. Ph.D thesis, University of Cam-
bridge (2002)

15. Richter, S.: Formalizing integration theory with an application to probabilistic algorithms. In:
Slind, K., Bunker, A., Gopalakrishnan, G.C. (eds.) TPHOLs 2004. LNCS, vol. 3223, pp.
271–286. Springer, Heidelberg (2004)
16. Pnueli, A.: On the extremely fair treatment of probabilistic algorithms. In: ACM (ed.) Pro-
ceedings of the 15th annual ACM Symposium on Theory of Computing, Boston, Mas-
sachusetts, April 25–27, pp. 278–290. ACM Press, New York (1983)
17. Wang, J., Zhang, X., Zhang, Y., Yang, H.: A probabilistic model for parametric fairness
in isabelle/hol. Technical Report 364/07, Department of Computer Science, University of
Kaiserslautern (2007)
Mind the Gap
A Verification Framework for Low-Level C

Simon Winwood1,2, Gerwin Klein1,2, Thomas Sewell1, June Andronick1,
David Cock1, and Michael Norrish1,3

1 NICTA, Australia
2 School of Computer Science and Engineering, UNSW, Sydney, Australia
3 Computer Sciences Laboratory, ANU, Canberra, Australia
{first-name.last-name}@nicta.com.au

Abstract. This paper presents the formal Isabelle/HOL framework we


use to prove refinement between an executable, monadic specification and
the C implementation of the seL4 microkernel. We describe the refinement
framework itself, the automated tactics it supports, and the connection to
our previous C verification framework. We also report on our experience
in applying the framework to seL4. The characteristics of this microkernel
verification are the size of the target (8,700 lines of C code), the treatment
of low-level programming constructs, the focus on high performance, and
the large subset of the C programming language addressed, which includes
pointer arithmetic and type-unsafe code.

1 Introduction
The seL4 kernel [10] is a high-performance microkernel in the L4 family [18], tar-
geted at secure, embedded devices. In verifying such a complex and large – 8,700
lines of C – piece of software, scalability and separation of concerns are of the ut-
most importance. We show how to achieve both for low-level, manually optimised,
real-world C code.
Fig. 1 shows the layers and proofs involved in the verification of seL4. The top layer is
an abstract, operational specification of seL4; the middle layer is an executable
specification derived automatically [8, 11] from a working Haskell prototype of the kernel;
the bottom layer is a hand-written and hand-optimised C implementation. The aim is to
connect the three layers by formal proof in Isabelle/HOL [21].

[Figure: in Isabelle/HOL, the Abstract Specification is connected by refinement proof RA
to the Executable Specification (obtained from the Haskell Prototype by automatic
translation), which is connected by refinement proof RC to the High-Performance C
Implementation.]

Fig. 1. Refinement steps in L4.verified

NICTA is funded by the Australian Government as represented by the Department of
Broadband, Communications and the Digital Economy and the Australian Research
Council through the ICT Centre of Excellence program.


Previously, we presented a verification framework [5] for proving refinement be-


tween the abstract and executable specifications. This paper presents the frame-
work for the second refinement step: the formal, machine-checked proof that the
high-performance C implementation of seL4 correctly implements the executable
specification.
With these two refinement steps, we manage to isolate two aspects of the
verification of seL4. In the first refinement step, which we call RA , we dealt
mostly with semantic concepts such as relationships between data structures
and system-global conditions for safe execution. We estimate that 80% of the
effort in RA was spent on such invariants. In the second refinement step, RC ,
the framework we present in this paper allows us to reduce our proof effort and
to reuse the properties shown in RA . The first refinement step established that
the kernel design works, the second closes the gap to C.

Paper Structure. We begin with an example that sketches the details of a typ-
ical kernel function. We then explain how the components of the verification
framework fit together, summarising relevant details of our earlier work on
the monadic, executable specification [5], and on our C semantics and mem-
ory model [25,26,27]. In particular, we describe the issues involved in converting
the C implementation into Isabelle/HOL. The main part of the paper shows the
refinement framework with its fundamental definitions, rules, and automated
tactics. We demonstrate the framework’s performance by reporting on our expe-
rience so far in applying it to the verification of substantial parts of the seL4 C
implementation (474 out of 518 functions, 91%).

2 Example

The seL4 kernel [10] provides the following operating system kernel services: inter-
process communication, threads, virtual memory, access control, and interrupt
control. In this section we present a typical function, cteMove, with which we
will illustrate the verification framework.
Access control in seL4 is based on capabilities. A capability contains an object
reference along with access rights. A capability table entry (CTE) is a kernel
data structure with two fields: a capability and an mdbNode. The latter is book-
keeping information and contains a pair of pointers which form a doubly linked
list.
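In Haskell-like terms, the layout just described might be sketched as follows (a model of
ours for orientation only; apart from the field names cap, cteMDBNode, mdbPrev, and mdbNext
taken from the text, all constructors are invented):

import Data.Word (Word32)

-- Access rights carried by a capability (constructors invented).
data AccessRight = AllowRead | AllowWrite | AllowGrant
  deriving Show

-- A capability: an object reference along with access rights.
data Capability
  = NullCap
  | Capability { objectRef :: Word32, capRights :: [AccessRight] }
  deriving Show

-- Book-keeping node: the pair of pointers forming the doubly linked list.
data MDBNode = MDBNode { mdbPrev :: Word32, mdbNext :: Word32 }
  deriving Show

-- A capability table entry: a capability and an mdbNode.
data CTE = CTE { cap :: Capability, cteMDBNode :: MDBNode }
  deriving Show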
The cteMove operation, shown in Fig. 2, moves a capability table entry from
src to dest. The left-hand side of the figure shows the executable specification in
Isabelle/HOL, while the right-hand side shows the corresponding C code.
The first 6 lines in Fig. 2 initialise the destination entry and clear the source
entry; the remainder of the function updates the pointers in the doubly linked
list. During the move, the capability in the entry may be diminished in access
rights. Thus, the argument cap is this possibly diminished capability, previously
retrieved from the entry at src.

Executable specification:

cteMove cap src dest ≡ do
  cte ← getCTE src;
  mdb ← return (cteMDBNode cte);
  updateCap dest cap;
  updateCap src NullCap;
  updateMDB dest (const mdb);
  updateMDB src (const nullMDBNode);
  updateMDB (mdbPrev mdb) (λm. m (| mdbNext := dest |));
  updateMDB (mdbNext mdb) (λm. m (| mdbPrev := dest |))
od

C implementation:

void cteMove (cap_t newCap, cte_t *srcSlot, cte_t *destSlot) {
  mdb_node_t mdb; uint32_t prev_ptr, next_ptr;
  mdb = srcSlot->cteMDBNode;
  destSlot->cap = newCap;
  srcSlot->cap = cap_null_cap_new();
  destSlot->cteMDBNode = mdb;
  srcSlot->cteMDBNode = nullMDBNode;
  prev_ptr = mdb_node_get_mdbPrev(mdb);
  if (prev_ptr)
    mdb_node_ptr_set_mdbNext(&CTE_PTR(prev_ptr)->cteMDBNode,
                             CTE_REF(destSlot));
  next_ptr = mdb_node_get_mdbNext(mdb);
  if (next_ptr)
    mdb_node_ptr_set_mdbPrev(&CTE_PTR(next_ptr)->cteMDBNode,
                             CTE_REF(destSlot));
}

Fig. 2. cteMove: executable specification and C implementation

In this example, the C source code is structurally similar to the executable


specification. This similarity is not accidental: the executable specification de-
scribes the low-level design with a high degree of detail. Most of the kernel
functions exhibit this property. Even so, the implementation here makes a small
optimisation: in the specification, updateMDB always checks that the given
pointer is not NULL. In the implementation this check is done for prev_ptr
and next_ptr – which may be NULL – but omitted for srcSlot and destSlot.
In verifying cteMove we will have to prove these checks are not required.

3 The Executable Specification Environment


Operations in the executable specification of seL4, such as cteMove, are written
in a monadic style inspired by Haskell. The type constructor 'a kernel is a
monad representing computations returning a value of type 'a; such values can
be injected into the monad using the return :: 'a ⇒ 'a kernel operation. The
composition operator, bind :: 'a kernel ⇒ ('a ⇒ 'b kernel) ⇒ 'b kernel, evaluates
the first operation and makes the return value available to the second operation.
The ubiquitous do . . . od syntax seen in Fig. 2 is syntactic sugar for a sequence
of operations composed using bind. There are also operations for accessing and
mutating k-state, the underlying state.
The type 'a kernel is isomorphic to k-state ⇒ ('a × k-state) set × bool. The
motivation for, and formalisation of, this monad are detailed in earlier work [5].
In summary, we take a conventional state monad and add nondeterminism and
a failure flag. Nondeterminism, required to model some interactions between
kernel and hardware, is modelled by allowing a set of possible outcomes in the
return type. The boolean failure flag is used to indicate unrecoverable errors and
invalid assertions, and is set only by the fail :: 'a kernel operation. The destructors
mResults and mFailed access, respectively, the set of outcomes and the failure flag
of a monadic operation evaluated at a state.
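As an illustration, here is a minimal Haskell sketch of this monad — our own model, not
the Isabelle definition: the set of outcomes is approximated by a list, and failK, mResults,
and mFailed merely mirror the operations named above.

import Control.Monad (ap, liftM)

-- k-state ⇒ ('a × k-state) set × bool, with the outcome set as a list.
newtype Kernel s a = Kernel { runKernel :: s -> ([(a, s)], Bool) }

instance Functor (Kernel s) where
  fmap = liftM

instance Applicative (Kernel s) where
  pure x = Kernel (\s -> ([(x, s)], False))
  (<*>)  = ap

instance Monad (Kernel s) where
  -- bind: run the first operation, feed each outcome to the second,
  -- collect all resulting outcomes, and join the failure flags.
  m >>= f = Kernel $ \s ->
    let (outcomes, failed) = runKernel m s
        rest = [ runKernel (f a) s' | (a, s') <- outcomes ]
    in (concatMap fst rest, failed || any snd rest)

-- The fail operation: no outcomes, failure flag set.
failK :: Kernel s a
failK = Kernel (\_ -> ([], True))

-- Destructors corresponding to mResults and mFailed.
mResults :: Kernel s a -> s -> [(a, s)]
mResults m = fst . runKernel m

mFailed :: Kernel s a -> s -> Bool
mFailed m = snd . runKernel m

Under this reading, the non-failure assumption used later in RC simply says that the
abstract run has no failing outcome at the given state.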
The specification environment provides a verification condition generator
(VCG) for judgements of the form {|P |} a {|R|}, and a refinement calculus for

the monadic model. One feature of this calculus is that the refinement property
cannot hold if the failure flag is set by the executable specification, thus RA im-
plies non-failure of the executable level. In particular, this allows all assertions
in the executable specification to be taken as assumptions in the proof of RC .

4 Embedding C
In this section we describe our infrastructure for parsing C into Isabelle/HOL and
for reasoning about the result. The seL4 kernel is implemented almost entirely
in C99 [16]. Direct hardware accesses are encapsulated in machine interface func-
tions, some of which are implemented in ARMv6 assembly. In the verification,
we axiomatise the assembly functions using Hoare triples.
Fig. 3 gives an overview of the components involved. The right-hand side shows our
instantiation of SIMPL [23], a generic, imperative language inside Isabelle/HOL. The
SIMPL framework provides a program representation, a semantics, and a VCG. This
language is generic in its expressions and state space. We instantiate both components
to form C-SIMPL, with a precise C memory model and C expressions, generated by a
parser. The left-hand side of Fig. 3 shows this process: the parser takes a C program
and produces a C-SIMPL program.

[Figure: C code is translated by the parser into C-SIMPL code; C-SIMPL instantiates the
generic imperative SIMPL framework (program representation, operational semantics, VCG)
inside Isabelle/HOL with C expressions, guards, and the C memory model.]

Fig. 3. C language framework

4.1 The SIMPL Framework


SIMPL provides a data type and semantics for statement forms; expressions
are shallowly embedded. The following is a summary of the relevant SIMPL
syntactic forms, where e represents an expression

c ::= SKIP | ´v :== e | c1; c2 | IF e THEN c1 ELSE c2 FI | WHILE e DO c OD
    | TRY c1 CATCH c2 END | THROW | Call f | Guard F P c
The semantics are canonical for an imperative language. The Guard F P c
statement throws the fault F if the condition P is false and executes c otherwise.
Program states in SIMPL are represented by Isabelle records. The record con-
tains a field for each local variable in the program, and a field globals containing
all global variables and the heap. Variables are then simply functions on the
state. SIMPL includes syntactic sugar for dealing with such functions: the term
´srcSlot refers to the local variable srcSlot in the current state. For example, the
set of program states where srcSlot is NULL is described by {|´srcSlot = NULL|}.
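For intuition, the statement forms above might be deeply embedded along the following
lines — a Haskell approximation of ours, not SIMPL's actual Isabelle datatype. Expressions
and guard conditions are shallow, i.e. plain functions on a state type s, and f is the type
of faults:

-- Deep embedding of statements; expressions are shallow functions on s.
data Com s f
  = Skip                                   -- SKIP
  | Basic (s -> s)                         -- ´v :== e, as a state update
  | Seq (Com s f) (Com s f)                -- c1; c2
  | Cond (s -> Bool) (Com s f) (Com s f)   -- IF e THEN c1 ELSE c2 FI
  | While (s -> Bool) (Com s f)            -- WHILE e DO c OD
  | Try (Com s f) (Com s f)                -- TRY c1 CATCH c2 END
  | Throw                                  -- THROW
  | Call String                            -- Call f
  | Guard f (s -> Bool) (Com s f)          -- fault f if the condition fails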

The semantics are represented by judgements of the form Γ ⊢ ⟨c, x⟩ ⇒ x', which
means that executing statement c in state x terminates and results in state x';
the parameter Γ maps function names to function bodies. Both x and x' are
extended states: for normal program states, Normal s, the semantics are as expected;
abrupt termination states (Abrupt s) are propagated until a surrounding TRY . . .
CATCH . . . END statement is reached; and Stuck and Fault u states, generated
by calls to non-existent procedures and failed Guard statements respectively, are
passed through unchanged. Abrupt states are generated by THROW statements
and are used to implement the C statements return, break, and continue.
The SIMPL environment also provides a VCG for partial correctness triples;
Hoare triples are represented by judgements of the form Γ ⊢/F P c C, A, where
P is the precondition, C is the postcondition for normal termination, A is the
postcondition for abrupt termination, and F is the set of ignored faults; if F is
U, the universal set, then all Guard statements are effectively ignored. Both A
and F may be omitted if empty.

4.2 The Memory Model


Our C subset allows type-unsafe operations including casts. To achieve this
soundly, the underlying heap model is a function from addresses to bytes. This
allows, for example, the C function memset, which sets each byte in a region of
the heap to a given value. We use the abbreviation H for the heap in the current
state; the expression H p reads the object at pointer p, while H(p → v) updates
the heap at pointer p with value v.
While this model is required for such low-level memory accesses, it is too
cumbersome for routine verification. By extending the heap model with typing
information and using tagged pointers we can lift bytes in the heap into Isabelle
terms. Pointers, terms of type 'a ptr, are raw addresses wrapped by the polymorphic
constructor Ptr; the phantom type 'a carries the type information. Pointers
may be unwrapped via the ptr-val function, which simply extracts the enclosed
address. Struct field addressing is also supported: the pointer &(p→[f]) refers to
the field f at the address associated with pointer p. The details of this memory
model are described by Tuch et al. [27, 26].
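A small Haskell model of ours may help convey the idea; the CType class is a hypothetical
stand-in for the typed lifting, not the formalisation's actual interface:

import Data.Word (Word8, Word32)
import qualified Data.Map as Map

-- The raw heap: a map from addresses to bytes.
type Heap = Map.Map Word32 Word8

-- A raw address with a phantom type parameter carrying type information.
newtype Ptr a = Ptr { ptrVal :: Word32 }

-- Types that know their size and how to decode themselves from bytes.
class CType a where
  sizeOf :: Ptr a -> Int
  decode :: [Word8] -> a

-- Typed read H p: fetch sizeOf bytes at the pointer and decode them.
readHeap :: CType a => Heap -> Ptr a -> a
readHeap h p@(Ptr addr) =
  decode [ Map.findWithDefault 0 (addr + fromIntegral i) h
         | i <- [0 .. sizeOf p - 1] ]

Byte-level operations such as memset remain expressible directly on Heap, while
verification of type-safe code can stay at the level of readHeap.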

4.3 From C to C-SIMPL


The parser translates the C kernel into a C-SIMPL program. This process gen-
erally results in a C-SIMPL program that resembles the input. Here we describe
the C subset we translate, and discuss those cases where translation produces a
result that is not so close to the input.

Our C Subset. As mentioned above, local variables in SIMPL are represented


by record fields. It is therefore not meaningful to take their address in the frame-
work, and so the first restriction of our C subset is that local variables may not
have their addresses taken. Global variables may, however, have their addresses

taken. As we translate all of the C source at once, the parser can determine ex-
actly which globals do have their addresses taken, and these variables are then
given addresses in the heap. Global variables that do not have their addresses
taken are, like locals, simply fields in the program state. The restriction on local
variables could be relaxed at the cost of higher reasoning overhead.
The other significant syntactic omissions in our C subset are union types, bit-
fields, goto statements, and switch statements that allow cases to fall-through.
We handle union types and bitfields with an automatic code generator [4], de-
scribed in Sect. 6, that implements these types with structs and casts. Further-
more, we do not allow function calls through function pointers and take care
not to introduce a more deterministic evaluation order than C prescribes. For
instance, we translate the side-effecting C expressions ++ and -- as statements.

Internal Function Calls and Automatic Modifies Proofs. SIMPL does


not permit function calls within expressions. If a function call appears within
an expression in the input C, we lift it out and transform it into a function call
that will occur before the expression is evaluated. For example, given a global
variable x, the statement z = x + f(y) becomes tmp = f(y); z = x + tmp,
where tmp is a new temporary variable.
This translation is only sound when the lifted functions are side-effect free:
evaluation of the functions within the original expression is linearised, making
the translated code more deterministic than warranted by the C semantics. The
parser thus generates a VCG “modifies” proof for each function, stating which
global variables are modified by the function. Any function required to be side-
effect free, but not proved as such, is flagged for the verification team’s attention.

Guards and Short-Circuit Expressions. Our parser uses Guard statements


to force verifiers to show that potentially illegal conditions are avoided. For
example, expressions involving pointer dereferences are enclosed by guards which
require the pointer to be aligned and non-zero.
Guards are statement-level constructors, so whole expressions accumulate
guards for their sub-expressions. However, C’s short-circuiting expression forms
(&&, || and ?:) mean that sub-expressions are not always evaluated. We trans-
late such expressions into a sequence of if-statements, linearising the evaluation
of the expression. When no guards are involved, the expression in C can become
a C-SIMPL expression, using normal, non-short-circuiting, boolean operators.
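The following Haskell sketch of ours shows the flavour of this linearisation on a toy
condition language (not the parser's actual data structures): a short-circuit expression
is evaluated into a temporary variable using statement-level control flow only.

-- Toy C-like conditions and statements.
data CExpr = Atom String | And CExpr CExpr | Or CExpr CExpr

data Stmt = Assign String CExpr        -- tmp = e (atomic e only)
          | If CExpr [Stmt] [Stmt]     -- if (e) { ... } else { ... }

-- Evaluate e into variable tmp, preserving short-circuit behaviour:
-- the right operand is only evaluated when it has to be.
linearise :: String -> CExpr -> [Stmt]
linearise tmp (Atom s)  = [Assign tmp (Atom s)]
linearise tmp (And a b) = linearise tmp a
                          ++ [If (Atom tmp) (linearise tmp b) []]
linearise tmp (Or a b)  = linearise tmp a
                          ++ [If (Atom tmp) [] (linearise tmp b)]

For example, linearise "tmp" (And (Atom "p") (Atom "p->f")) corresponds to
tmp = p; if (tmp) tmp = p->f;, after which guards can be attached to each
statement individually.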

Example. While we have shown the C implementation in the example Fig. 2,


refinement is proven between the executable specification and the imported
C-SIMPL code. For instance, the assignment mdb = srcSlot->cteMDBNode in
Fig. 2 is translated into the following statement in C-SIMPL
MemGuard &(´srcSlot→[cteMDBNode-C])
(´mdb :== H &(´srcSlot→[cteMDBNode-C]))

The MemGuard constructor abbreviates the alignment and non-NULL conditions


for pointers.

5 Refinement
Our verification goal is to prove refinement between the executable specification
and the C implementation. Specifically, this means showing that the C kernel
entry points for interrupts, page faults, exceptions, and system calls refine the
executable specification’s top-level function callKernel. We show refinement us-
ing a variation of forward simulation [7] we call correspondence: evaluation of
corresponding functions takes related states to related states.
In previous work [5], while proving RA , we found it useful to divide the proof
along the syntactic structure of both programs as far as possible, and then prove
the resulting subgoals semantically. Splitting the proof has two main benefits:
firstly, it is a convenient unit of proof reuse, as the same pairing of abstract
and concrete functions recurs frequently for low-level functions; and secondly, it
facilitates proof development by multiple people. One important feature of this
approach is that preconditions are discovered lazily à la Dijkstra [9]. Rules for
showing correspondence typically build preconditions from those of the premises.
In this section we describe the set of tools and techniques we developed to
ease the task of proving correspondence in RC . First, we give our definition of
correspondence, followed by a discussion of the use of the VCG. We then de-
scribe techniques for reusing proofs from RA to solve proof obligations from the
implementation. Next, we present our approach for handling operations with no
corresponding analogue. Finally, we describe our splitting approach and sketch
the proof of the example.

5.1 The Correspondence Statement


In practice, the definition of correspondence is more complex than simply linking
related states, as: (1) verification typically requires preconditions to hold of the
initial states; (2) we allow early returns from functions and breaks from loops; and
(3) function return values must be related.

[Figure: the monadic operation takes a state s satisfying P to (s', rv); the C operation
takes a state t satisfying P' to t'; s and t, as well as s' and t', are linked by the
state relation S, and rv is linked to xf t' by the return relation r.]

Fig. 4. Correspondence

To deal with early return, we extend the semantics to lists of statements, using the
judgement Γ ⊢ ⟨c·hs, s⟩ ⇒ x. The statement sequence hs is a handler stack; it collects
the CATCH handlers which surround usages of the statements return, continue,
and break. If c terminates abruptly, each statement in hs is executed in sequence
until one terminates normally.
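A deterministic toy model of ours may help picture this discipline; statements are
simplified here to functions from states to extended outcomes:

-- Extended outcomes of running a statement.
data Xstate s = Normal s | Abrupt s

-- Run the body; on abrupt termination, run the handlers in sequence
-- until one terminates normally (or the stack is exhausted).
runWithHandlers :: (s -> Xstate s) -> [s -> Xstate s] -> s -> Xstate s
runWithHandlers body hs s = go (body s) hs
  where
    go r          []         = r
    go (Normal t) _          = Normal t
    go (Abrupt t) (h : rest) = go (h t) rest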
Relating the return values of functions is dealt with by annotating the cor-
respondence statement with a return value relation r. Although evaluating a
monadic operation results in both a new state and a return value, functions in
C-SIMPL return values by updating a function-specific local variable; because

local variables are fields in the state record, this is a function from the state. We
thus annotate the correspondence statement with an extraction function xf, a
function which extracts the return value from a program state.
The correspondence statement is illustrated in Fig. 4 and defined below
ccorres r xf P P' hs a c ≡
  ∀(s, t) ∈ S. ∀t'. s ∈ P ∧ t ∈ P' ∧ ¬ mFailed (a s) ∧ Γ ⊢ ⟨c·hs, t⟩ ⇒ t'
    −→ ∃(s', rv) ∈ mResults (a s).
          ∃t_N. t' = Normal t_N ∧ (s', t_N) ∈ S ∧ r rv (xf t_N)

The definition can be read as follows: given related states s and t with the
preconditions P and P' respectively, if the abstract specification a does not fail
when evaluated at state s, and the concrete statement c evaluates under handler
stack hs in extended state t to extended state t', then the following must hold:

1. evaluating a at state s returns some value rv and new abstract state s';
2. the result of the evaluation of c is some extended state Normal t_N, that is,
   not Abrupt, Fault, or Stuck;
3. states s' and t_N are related by the state relation S; and
4. values rv and xf t_N – the extraction function applied to the final state of c
   – are related by r, the given return value relation.

Note that a is non-deterministic: we may pick any suitable rv and s'. As men-


tioned in Sect. 3, the proof of RA entails that the executable specification does
not fail. Thus, in the definition of ccorres, we may assume ¬ mFailed (a s).
In practice, this means assertions and other conditions for (non-)failure in the
executable specification become known facts in the proof. For example, the op-
eration getCTE srcSlot in the example in Fig. 2 will fail if there is no CTE at
srcSlot. We can therefore assume in the refinement proof that such an object
exists. Of course, these facts are only free because we have already proven them
in RA .

Example. To prove correspondence for cteMove, we must, after unfolding the


function bodies, show the statement in Fig. 5. The cteMove operation has no
return value, so our extraction function (xf in the definition of ccorres above)
and return relation (r above) are trivial. The specification precondition (P above)
is the system invariant invs, while the implementation precondition (P' above)
relates the formal parameters destSlot, srcSlot, and newCap to the specification
arguments dest, src, and cap respectively. As all functions are wrapped in a TRY
. . . CATCH SKIP block to handle return statements, the handler stack is the
singleton list containing SKIP.

5.2 Proving Correspondence via the VCG


Data refinement predicates can, in general [7], be rephrased and solved as Hoare
triples. We do this in our framework by using the VCG after applying the fol-
lowing rule

ccorres (λ- -. True) (λ-. ()) invs
  {|´destSlot = Ptr dest ∧ ´srcSlot = Ptr src ∧ ccap-relation cap ´newCap|} [SKIP]
  (do cte ← getCTE src;
      mdb ← return (cteMDBNode cte);
      updateCap dest cap;
      ...
   od)                                                  (cteMove spec.)
  (MemGuard &(´srcSlot→[cteMDBNode-C])
     (´mdb :== H &(´srcSlot→[cteMDBNode-C]));
   MemGuard &(´destSlot→[cap-C])
     (´globals :== H(&(´destSlot→[cap-C]) → ´newCap));
   ...)                                                 (cteMove impl.)

Fig. 5. The cteMove correspondence statement

∀s. Γ ⊢ {t | s ∈ P ∧ t ∈ P' ∧ (s, t) ∈ S}
          c
        {t' | ∃(rv, s') ∈ mResults (a s). (s', t') ∈ S ∧ r rv (xf t')}
──────────────────────────────────────────────────────────────────────
ccorres r xf P P' hs a c

In essence, this rule states that to show correspondence between a and c, for a
given initial specification state s, it is sufficient to show that executing c results
in normal termination where the final state is related to the result of evaluating
a at s. The VCG precondition can assume that the initial states are related and
satisfy the correspondence preconditions.
Use of this rule in verifying correspondence is limited by two factors. Firstly,
the verification conditions produced by the VCG may be excessively large or
complex. Our experience is that the output of a VCG step usually contains a
separate term for every possible path through the target code, and that the
complexity of these terms tends to increase with the path length. Secondly, the
specification return value and result state are existential, and thus outside the
range of our extensive automatic support for showing universal properties of
specification fragments. Fully expanding the specification is always possible, and
in the case of deterministic operations will yield a single state/return value pair,
but the resulting term structure may also be large.
In the case of our example, the goal produced by the VCG has 377 lines
before unfolding the specification and 800 lines afterward. Verifying such non-
trivial functions is made practical by the approach described in the remainder
of this section.

5.3 Local Variable Lifting


The feasibility of proving RC depends heavily on proof reuse from RA . Consider
the following rule for dealing with a guard introduced by the parser (see Sect. 4.3)
ccorres r xf G G' hs a c
──────────────────────────────────────────────────────────────────
ccorres r xf (G ∩ cte-at' (ptr-val p)) G' hs a (MemGuard (λs. p) c)

This states that the proof obligation introduced by MemGuard at the CTE
pointer p can be discharged, assuming that there exists a CTE object on the

specification side (denoted cte-at' (ptr-val p)); this rule turns a proof obligation
from the implementation into an assumption of the specification. There is, how-
ever, one major problem: the pointer p cannot depend on the C state, because
it is also used on the specification side.
To see why this is such a problem, recall that local variables in C-SIMPL are
fields in the state record; any pointer, apart from constants, in the program will
always refer to the state, making the above rule inapplicable; in the example,
the first guard refers to the local variable ´srcSlot.
All is not lost, however: the values in local variables generally correspond to
some value available in the specification. We have developed an approach that
automatically replaces such local variables with new HOL variables representing
their value. Proof obligations which refer to the local variable can then be solved
by facts about the related value from the specification precondition. We call this
process lifting.

Example. If we examine the preconditions to the example proof statement in


Fig. 5, we note the assumption ´srcSlot = Ptr src and observe that srcSlot
depends on the C state. By lifting this local variable and substituting the as-
sumption, we get the following implementation fragment

MemGuard &(Ptr src→[cteMDBNode-C])


(´mdb :== H &(Ptr src→[cteMDBNode-C]));
...

The pointer Ptr src no longer depends on the C state and is a value from the
specification side, so the MemGuard can be removed with the above rule.

Lifting is only sound if the behaviour of the lifted code fragment is indistinguishable
from that of the original code; the judgement d' ∼ d[v/f] states that
replacing applications of the function f in statement d with value v results in the
equivalent statement d'. This condition is defined as follows

d' ∼ d[v/f] ≡ ∀t t'. f t = v −→ Γ ⊢ ⟨d', Normal t⟩ ⇒ t' = Γ ⊢ ⟨d, Normal t⟩ ⇒ t'

This states that d and d' must be semantically equivalent, assuming f has
the value v in the initial state. In practice, d' depends on a locally bound HOL
variable; in such cases, it will appear as d' v.
Lifting is accomplished through the following rule

∀v. d' v ∼ d[v/f]        ∀v. P v −→ ccorres r xf G G' hs a (d' v)
──────────────────────────────────────────────────────────────────
ccorres r xf G (G' ∩ {s | P (f s)}) hs a d

Note that d', the lifted fragment, appears only in the assumptions; proving the
first premise involves inventing a suitable candidate. We have developed tactic
support for automatically calculating the lifted fragment and discharging such
proof obligations, based on a set of syntax-directed proof rules.

5.4 Symbolic Execution


The specification and implementation do not always match: there may be frag-
ments on either side that are artefacts of the particular model. In our example,
it is clear that the complex function getCTE has no direct analogue; the imple-
mentation accesses the heap directly.
In both cases we have rules to symbolically execute the code using the appro-
priate VCG, although we must also show that the fragment preserves the state re-
lation. On the implementation side this case occurs frequently; in the example we
have the cap null cap new, mdb node get mdbNext, and mdb node get mdbPrev
functions. We have developed a tactic which can symbolically execute any side-
effect free function which has a VCG specification. This tactic also takes advan-
tage of variable lifting: the destination local variable is replaced by a new HOL
variable and we gain the assumption that the variable satisfies the function’s
postcondition.

5.5 Splitting
If we examine our example, there is a clear match between most lines. Split-
ting allows us to take advantage of this structural similarity by considering each
match in isolation; formally, given the specification fragment do rv ← a; b rv
od and the implementation fragment c; d, splitting entails proving a first corre-
spondence between a and c and a second between b and d.
In the case where we can prove that c terminates abruptly, we discard d.
Otherwise, the following rule is used
ccorres r' xf' P P' hs a c        ∀v. d' v ∼ d[v/xf']
∀rv rv'. r' rv rv' −→ ccorres r xf (Q rv) (Q' rv rv') hs (b rv) (d' rv')
{|R|} a {|Q|}        Γ ⊢/U R' c {s | ∀rv. r' rv (xf' s) −→ s ∈ Q' rv (xf' s)}
──────────────────────────────────────────────────────────────────────────
ccorres r xf (P ∩ R) (P' ∩ R') hs (do rv ← a; b rv od) (c; d)

In the second correspondence premise, d' is the result of lifting xf' in d; this


enables the proof of the second correspondence to use the result relation from
the first correspondence. To calculate the final preconditions, the rule includes
VCG premises to move the preconditions from the second correspondence across
a and c. In the C-SIMPL VCG obligation, we may ignore any guard faults as
their absence is implied by the first premise. In fact, in most cases the C-SIMPL
VCG step can be omitted altogether, because the postcondition collapses to
true after simplifications.
We have developed a tactic which assists in splitting: C-SIMPL’s encoding
of function calls and struct member updates requires multiple specialised rules.
The tactic symbolically executes and moves any guards if required, determines
the correct splitting rule to use, instantiates the extraction function, and lifts
the second correspondence premise.

Example. After lifting, moving the guard, and symbolically executing the getCTE
function, applying the above rule to the example proof statement in Fig. 5 gives
the following as the first proof obligation

ccorres cmdb-relation mdb (. . .) {. . .} [SKIP]
  (return (cteMDBNode cte))
  (´mdb :== H &(Ptr src→[cteMDBNode-C]))

This goal, proved using the VCG approach from Sect. 5.2, states that, apart
from the state correspondence, the return value from the specification side
(cteMDBNode cte) and implementation side (H &(Ptr src→[cteMDBNode-C])
stored in mdb) are related through cmdb-relation, that is, the linked list pointers
in the returned specification node are equal to those in the implementation.

5.6 Completing the Example

The proof of the example, cteMove, is 25 lines of Isabelle/HOL tactic style proof.
The proof starts by weakening the preconditions (here abbreviated P and P')
with new Isabelle schematic variables; this allows preconditions to be calculated
on demand in the correspondence proofs.
We then lift the function arguments and proceed to prove by splitting; the
leaf goals are proved as separate lemmas using the C-SIMPL VCG. Next,
the correspondence preconditions are moved back through the statements us-
ing the two VCGs on specification and implementation. The final step is to solve
the proof obligation generated by the initial precondition weakening: the stated
preconditions (our P and P') must imply the calculated preconditions.
The lifting and splitting phase takes 9 lines, the VCG stage takes 1 line, using
tactic repetition, while the final step takes 15 lines and is typically the trickiest
part of any correspondence proof.

6 Experience

In this section we explore how our C subset influenced the kernel implementation
and performance. We then discuss our experience in applying the framework.
We chose to implement the C kernel manually, rather than synthesising it from
the executable specification. Initial investigations had shown that generated C
code would not meet the performance requirements of a real-world microkernel.
Message-passing (IPC) performance, even in the first hand-written version, which
was completed after two person months, was slow, on the order of the Mach microkernel.
After optimisation, this operation is now comparable to that of the modern,
commercially deployed, OKL4 2.1 [22] microkernel: we measured 206 cycles for
OKL4’s hand-crafted assembly IPC path, and 756 cycles for its non-optimised C
version on the ARMv6 Freescale i.MX31 platform. On the same hardware, our C
kernel initially took over 3000 cycles and, after optimisation, 299. The fastest other
IPC implementation for ARMv6 in C we know of is 300 cycles.
The C subset and the implementation developed in parallel, influencing each
other. We extended the subset with new features such as multiple side-effect
free function calls in expressions, but we also needed to make trade-offs such
as for references to local variables. We avoided passing large structures on the

Table 1. Code and proof statistics. Changes for step RC.

                            Lines                            Changes
                            Haskell/C  Isabelle    Proof     Bugs  Convenience
Executable specification        5,700    13,000    117,000      8           10
Implementation                  8,700    15,000    50,000^a    97           34

^a With 474 of 518 (91%) of the functions verified.

C stack across function boundaries. Instead, we stored these in global variables


and accessed them through pointers. Whilst the typical pattern was of conflicting
pressures from implementation and verification, in a few cases both sides could be
neatly satisfied by a single solution. We developed a code-generation tool [4] for
efficient, packed bitfields in tagged unions with a clean, uniform interface. This
tool not only generates the desired code, but also the associated Isabelle/HOL
proofs and specifications that integrate directly into our refinement framework.
The resulting compiled code is faster, more predictable, and more compact than
the bitfield code emitted by GCC on ARM.
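To give the flavour of such generated accessors, here is a hand-written Haskell analogue;
the layout — a 4-bit tag in the top bits of a 32-bit word — is invented for illustration,
whereas the real tool emits C together with matching Isabelle/HOL specifications and proofs:

import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Word (Word32)

-- A tagged-union word: 4-bit tag in bits 31..28, 28-bit payload below.
newtype CapWord = CapWord Word32

capTag :: CapWord -> Word32
capTag (CapWord w) = w `shiftR` 28

capPayload :: CapWord -> Word32
capPayload (CapWord w) = w .&. 0x0FFFFFFF

-- Constructor packs tag and payload; out-of-range payload bits are masked.
capNew :: Word32 -> Word32 -> CapWord
capNew tag payload =
  CapWord ((tag `shiftL` 28) .|. (payload .&. 0x0FFFFFFF))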
Code and proof statistics are shown in Table 1. Of the 50,000 lines of proof in
RC , approximately 5,000 lines are framework related, 7,700 lines are automat-
ically generated by our bitfield tool, and the remaining 37,300 lines are hand-
written. We also have about 1,000 lines of tactic code. We spent just over 2
person years in 6 months of main activity on this proof and estimate another
two months until completion. We prove an average of 3–4 functions per person
per week.
One important aspect of the verification effort was our ability to change both
the specification and the implementation. These changes, included in Table 1,
fell into two categories: true bug fixes and proof convenience changes. In the
specification, bug fixes were not related to safety — the proof of RA guaranteed
this. Rather, they export implementation restrictions such as the number of bits
used for a specific argument encoding. Although both versions were safe, refine-
ment was only possible with the changed version. The implementation had not
been intensively tested, because it was scheduled for formal verification. It had,
however, been used in a number of student projects and was being ported to the
x86 architecture when verification started. The former activities found 16 bugs
in the ARMv6 code; the verification has so far found 97. Once the verification is
complete, the only reason to change the code will be for performance and new
features: C implementation defects will no longer exist.
A major aim in developing the framework presented in this paper was the
avoidance of invariant proofs on the implementation. We achieved this primarily
through proof reuse from RA : the detailed nature of the executable specifica-
tion’s treatment of kernel objects meant that the state relation fragment for
kernel objects was quite simple; this simplicity allowed proof obligations from
the implementation to be easily solved with facts from the specification. Further-
more, when new invariants were required we could prove them on the executable

specification. For example, the encoding of Isabelle’s option type using a default
value in C (such as NULL) required us to show that these default values never
occurred as valid values.
We discovered that the difficulty of verifying any given function in RC was
determined by the degree of difference between the function in C and its exe-
cutable specification, arising either from the control structures of C or its impure
memory model. Unlike the proof of RA , the semantic complexity of the function
seems mostly irrelevant. For instance, the operation which deletes a capability —
by far the most semantically complex operation in seL4 — was straightforward
to verify in RC . On the other hand, a simpler operation which employs an indis-
criminate memset over a number of objects was comparatively difficult to verify.
It is interesting to note that, even here, proofs from RA were useful in proving
facts about the implementation.
An important consequence of the way we split up proofs is that local reasoning
becomes possible. No single person needed a full, global understanding of the
whole kernel implementation.

7 Related Work

Earlier work on OS verification includes PSOS [12] and UCLA Secure Unix [28].
Later, Bevier [3] describes verification of process isolation properties down to
object code level, but for an idealised kernel (KIT) far simpler than modern
microkernels. We use the same general approach — refinement — as KIT and
UCLA Secure Unix, however the scale, techniques for each refinement step, and
level of detail we treat are significantly different.
The Verisoft project [24] is working towards verifying a whole system stack, in-
cluding hardware, compiler, applications, and a simplified microkernel VAMOS.
The VFiasco [15] project is attempting to verify the Fiasco kernel, another vari-
ant of L4 directly on the C++ level. For a comprehensive overview of operating
system verification efforts, we refer to Klein [17].
Deductive techniques to prove annotated C programs at the source code level
include KeY-C [20], VCC [6], and Caduceus [13], recently integrated into the
Frama-C framework [14]. KeY-C only focuses on a type-safe subset of C. VCC,
which also supports concurrency, appears to be heavily dependent on large ax-
iomatisations; even the memory model [6] axiomatises a weaker version of what
Tuch proves [26]. Caduceus supports a large subset of C, with extensions to han-
dle certain kinds of unions and casts [1, 19]. These techniques are not directly
applicable to refinement, although Caduceus has at least been used [2] to extract
a formal Coq specification for verifying security and safety properties.
We directly use the SIMPL verification framework [23] from the Verisoft
project, but we instantiate it differently. While Verisoft’s main implementation
language is fully formally defined from the ground up, with well-defined Pascal-
like semantics and C-style syntax, we treat a true, large subset of C99 [16] on
ARMv6 with all the realism and ugliness this implies. Our motivation for this is
our desire to use standard tool-chains and compilers for real-world deployment

of the kernel. Verisoft instead uses its own non-optimising compiler, which in
exchange is formally verified. Another difference is the way we exploit structural
similarities between our executable specification and C implementation. Verisoft
uses the standard VCG-based methodology for implementation verification. Our
framework allows us to transport invariant properties and Hoare-triples from
our existing proof on the executable specification [5] down to the C level. This
allowed us to avoid invariants on the C level, speeding up the overall proof effort
significantly.

8 Conclusion
We have presented a formal framework for verifying the refinement of a
large, monadic, executable specification into a low-level, manually performance-
optimised C implementation. We have demonstrated that the framework
performs well by applying it to the verification of the seL4 microkernel in Is-
abelle/HOL, and by completing a large part of this verification in a short time.
The framework allows us to take advantage of the large number of invariants
proved on the specification level, thus saving significant amounts of work. We
were able to conduct the semantic reasoning on the more pleasant monadic, shal-
lowly embedded specification level, and leave essentially syntactic decomposition
to the C level.
We conclude that our C verification framework achieves both the scalability
in terms of size, as well as the separation of concerns that is important for
distributing such a large proof over multiple people.

Acknowledgements. We thank Timothy Bourke and Philip Derrin for reading


and commenting on drafts of this paper.

References
1. Andronick, J.: Modélisation et Vérification Formelles de Systèmes Embarqués dans
les Cartes à Microprocesseur—Plate-Forme Java Card et Système d’Exploitation.
Ph.D thesis, Université Paris-Sud (March 2006)
2. Andronick, J., Chetali, B., Paulin-Mohring, C.: Formal verification of security prop-
erties of smart card embedded source code. In: Fitzgerald, J.S., Hayes, I.J., Tarlecki,
A. (eds.) FM 2005. LNCS, vol. 3582, pp. 302–317. Springer, Heidelberg (2005)
3. Bevier, W.R.: Kit: A study in operating system verification. IEEE Transactions on
Software Engineering 15(11), 1382–1396 (1989)
4. Cock, D.: Bitfields and tagged unions in C: Verification through automatic gen-
eration. In: Beckert, B., Klein, G. (eds.) Proc. 5th VERIFY, Sydney, Australia,
August 2008. CEUR Workshop Proceedings, vol. 372, pp. 44–55 (2008)
5. Cock, D., Klein, G., Sewell, T.: Secure microkernels, state monads and scalable
refinement. In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS,
vol. 5170, pp. 167–182. Springer, Heidelberg (2008)
6. Cohen, E., Moskal, M., Schulte, W., Tobies, S.: A precise yet efficient memory
model for C (2008),
http://research.microsoft.com/apps/pubs/default.aspx?id=77174

7. de Roever, W.-P., Engelhardt, K.: Data Refinement: Model-Oriented Proof Meth-


ods and their Comparison. Cambridge Tracts in Theoretical Computer Science,
vol. 47. Cambridge University Press (1998)
8. Derrin, P., Elphinstone, K., Klein, G., Cock, D., Chakravarty, M.M.T.: Running
the manual: An approach to high-assurance microkernel development. In: Proc.
ACM SIGPLAN Haskell WS, Portland, OR, USA (September 2006)
9. Dijkstra, E.W.: Guarded commands, nondeterminacy and formal derivation of pro-
grams. CACM 18(8), 453–457 (1975)
10. Elphinstone, K., Klein, G., Derrin, P., Roscoe, T., Heiser, G.: Towards a practical,
verified kernel. In: Proc. 11th Workshop on Hot Topics in Operating Systems (2007)
11. Elphinstone, K., Klein, G., Kolanski, R.: Formalising a high-performance micro-
kernel. In: Leino, R. (ed.) VSTTE, Microsoft Research Technical Report MSR-TR-
2006-117, Seattle, USA, August 2006, pp. 1–7 (2006)
12. Feiertag, R.J., Neumann, P.G.: The foundations of a provably secure operating
system (PSOS). In: AFIPS Conf. Proc., 1979 National Comp. Conf., New York,
NY, USA, June 1979, pp. 329–334 (1979)
13. Filliâtre, J.-C., Marché, C.: Multi-prover verification of C programs. In: Davies, J.,
Schulte, W., Barnett, M. (eds.) ICFEM 2004. LNCS, vol. 3308, pp. 15–29. Springer,
Heidelberg (2004)
14. Frama-C (2008), http://frama-c.cea.fr/
15. Hohmuth, M., Tews, H.: The VFiasco approach for a verified operating system. In:
Proc. 2nd ECOOP-PLOS Workshop, Glasgow, UK (October 2005)
16. Programming languages—C, ISO/IEC 9899:1999 (1999)
17. Klein, G.: Operating system verification—An overview. Sādhanā 34(1), 27–69
(2009)
18. Liedtke, J.: On μ-kernel construction. In: Proc. 15th SOSP (December 1995)
19. Moy, Y.: Union and cast in deductive verification. In: Proc. C/C++ Verification
Workshop, Technical Report ICIS-R07015. Radboud University Nijmegen (2007)
20. Mürk, O., Larsson, D., Hähnle, R.: KeY-C: A tool for verification of C programs. In:
Pfenning, F. (ed.) CADE 2007. LNCS, vol. 4603, pp. 385–390. Springer, Heidelberg
(2007)
21. Nipkow, T., Paulson, L.C., Wenzel, M.T.: Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002)
22. Open Kernel Labs. OKL4 v2.1 (2008), http://www.ok-labs.com
23. Schirmer, N.: Verification of Sequential Imperative Programs in Isabelle/HOL.
Ph.D thesis, Technische Universität München (2006)
24. Schirmer, N., Hillebrand, M., Leinenbach, D., Alkassar, E., Starostin, A., Tsyban,
A.: Balancing the load — leveraging a semantics stack for systems verification.
JAR, special issue on Operating System Verification 42(2-4), 389–454 (2009)
25. Tuch, H.: Formal Memory Models for Verifying C Systems Code. Ph.D thesis,
School Comp. Sci. & Engin., University NSW, Sydney 2052, Australia (August
2008)
26. Tuch, H.: Formal verification of C systems code: Structured types, separation logic
and theorem proving. JAR, special issue on Operating System Verification 42(2–4),
125–187 (2009)
27. Tuch, H., Klein, G., Norrish, M.: Types, bytes, and separation logic. In: Hofmann,
M., Felleisen, M. (eds.) Proc. 34th POPL, pp. 97–108. ACM, New York (2007)
28. Walker, B., Kemmerer, R., Popek, G.: Specification and verification of the UCLA
Unix security kernel. CACM 23(2), 118–131 (1980)
Author Index

Andronick, June 500
Asperti, Andrea 84
Basin, David 1
Bengtson, Jesper 99
Benton, Nick 115
Berghofer, Stefan 131, 147
Bove, Ana 73
Brown, Chad E. 164
Bulwahn, Lukas 131
Capkun, Srdjan 1
Cock, David 500
Cohen, Ernie 23
Dabrowski, Frédéric 212
Dahlweid, Markus 23
Dawson, Jeremy E. 180
de Dios, Javier 196
Dybjer, Peter 73
Garillot, François 327
Gonthier, Georges 327
Gordon, Michael J.C. 359
Haftmann, Florian 131
Harrison, John 43, 60
Hasan, Osman 228
Hillebrand, Mark 23
Homeier, Peter V. 244
Huffman, Brian 260
Julien, Nicolas 408
Kennedy, Andrew 115
Khan Afshar, Sanaz 228
Klein, Gerwin 276, 500
Kolanski, Rafal 276
Kornilowicz, Artur 67
Leinenbach, Dirk 23
Le Roux, Stéphane 293
Lochbihler, Andreas 310
Mahboubi, Assia 327
McCreight, Andrew 343
Merz, Stephan 424
Moskal, Michal 23
Myreen, Magnus O. 359
Nakata, Keiko 375
Naumowicz, Adam 67
Norell, Ulf 73
Norrish, Michael 500
Owens, Scott 391
Parrow, Joachim 99
Paşca, Ioana 408
Peña, Ricardo 196
Pichardie, David 212
Reiter, Markus 147
Ricciotti, Wilmer 84
Rideau, Laurence 327
Sacerdoti Coen, Claudio 84
Santen, Thomas 23
Sarkar, Susmit 391
Schaller, Patrick 1
Schimpf, Alexander 424
Schmidt, Benedikt 1
Schulte, Wolfram 23
Schürmann, Carsten 79
Sewell, Peter 391
Sewell, Thomas 500
Smaus, Jan-Georg 424
Smolka, Gert 164
Sternagel, Christian 452
Swierstra, Wouter 440
Tahar, Sofiène 228
Tassi, Enrico 84
Thiemann, René 452
Tiu, Alwen 180
Tobies, Stephan 23
Tuerk, Thomas 469
Uustalu, Tarmo 375
Varming, Carsten 115
Wang, Jinshuang 485
Winwood, Simon 500
Yang, Huabing 485
Zhang, Xingyuan 485
