Formal Verification of Quantum Algorithms Using Quantum Hoare Logic
Computer Aided
Verification
31st International Conference, CAV 2019
New York City, NY, USA, July 15–18, 2019
Proceedings, Part II
Lecture Notes in Computer Science 11562
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editors
Isil Dillig, University of Texas, Austin, TX, USA
Serdar Tasiran, Amazon Web Services, New York, NY, USA
© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
It was our privilege to serve as the program chairs for CAV 2019, the 31st International
Conference on Computer-Aided Verification. CAV 2019 was held in New York, USA,
during July 15–18, 2019. The tutorial day was on July 14, 2019, and the pre-conference
workshops were held during July 13–14, 2019. All events took place at The New
School in New York City.
CAV is an annual conference dedicated to the advancement of the theory and
practice of computer-aided formal analysis methods for hardware and software sys-
tems. The primary focus of CAV is to extend the frontiers of verification techniques by
expanding to new domains such as security, quantum computing, and machine
learning. This puts CAV at the cutting edge of formal methods research, and this year’s
program is a reflection of this commitment.
CAV 2019 received a very high number of submissions (258). We accepted 13 tool
papers, two case studies, and 52 regular papers, which amounts to an acceptance rate of
roughly 26%. The accepted papers cover a wide spectrum of topics, from theoretical
results to applications of formal methods. These papers apply or extend formal methods
to a wide range of domains such as concurrency, learning, and industrially deployed
systems. The program featured invited talks by Dawn Song (UC Berkeley), Swarat
Chaudhuri (Rice University), and Ken McMillan (Microsoft Research) as well as
invited tutorials by Emina Torlak (University of Washington) and Ranjit Jhala (UC San
Diego). Furthermore, we continued the tradition of Logic Lounge, a series of discus-
sions on computer science topics targeting a general audience.
In addition to the main conference, CAV 2019 hosted the following workshops: The
Best of Model Checking (BeMC) in honor of Orna Grumberg, Design and Analysis of
Robust Systems (DARS), Verification Mentoring Workshop (VMW), Numerical
Software Verification (NSV), Verified Software: Theories, Tools, and Experiments
(VSTTE), Democratizing Software Verification, Formal Methods for ML-Enabled
Autonomous Systems (FoMLAS), and Synthesis (SYNT).
Organizing a top conference like CAV requires a great deal of effort from the
community. The Program Committee for CAV 2019 consisted of 79 members; a
committee of this size ensures that each member has to review a reasonable number of
papers in the allotted time. In all, the committee members wrote over 770 reviews while
investing significant effort to maintain and ensure the high quality of the conference
program. We are grateful to the CAV 2019 Program Committee for their outstanding
efforts in evaluating the submissions and making sure that each paper got a fair chance.
Like last year’s CAV, we made artifact evaluation mandatory for tool submissions
and optional but encouraged for the rest of the accepted papers. The Artifact Evaluation
Committee consisted of 27 reviewers who put in significant effort to evaluate each
artifact. The goal of this process was to provide constructive feedback to tool devel-
opers and help make the research published in CAV more reproducible. The Artifact
Evaluation Committee was generally quite impressed by the quality of the artifacts,
and, in fact, all accepted tools passed the artifact evaluation. Among regular papers,
65% of the authors submitted an artifact, and 76% of these artifacts passed the eval-
uation. We are also very grateful to the Artifact Evaluation Committee for their hard
work and dedication in evaluating the submitted artifacts.
CAV 2019 would not have been possible without the tremendous help we received
from several individuals, and we would like to thank everyone who helped make CAV
2019 a success. First, we would like to thank Yu Feng and Ruben Martins for chairing
the Artifact Evaluation Committee and Zvonimir Rakamaric for maintaining the CAV
website and social media presence. We also thank Oksana Tkachuk for chairing the
workshop organization process, Peter O’Hearn for managing sponsorship, and Thomas
Wies for arranging student fellowships. We also thank Loris D’Antoni, Rayna
Dimitrova, Cezara Dragoi, and Anthony W. Lin for organizing the Verification
Mentoring Workshop and working closely with us. Last but not least, we would like to
thank Kostas Ferles, Navid Yaghmazadeh, and members of the CAV Steering
Committee (Ken McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for
helping us with several important aspects of organizing CAV 2019.
We hope that you will find the proceedings of CAV 2019 scientifically interesting
and thought-provoking!
Program Chairs
Isil Dillig The University of Texas at Austin, USA
Serdar Tasiran Amazon, USA
Workshop Chair
Oksana Tkachuk Amazon, USA
Publicity Chair
Zvonimir Rakamaric University of Utah, USA
Sponsorship Chair
Peter O’Hearn Facebook, USA
Fellowship Chair
Thomas Wies NYU, USA
Program Committee
Aws Albarghouthi University of Wisconsin-Madison, USA
Jade Alglave University College London, UK
Rajeev Alur University of Pennsylvania, USA
Christel Baier TU Dresden, Germany
Gilles Barthe Max Planck Institute for Security and Privacy,
Germany; IMDEA Software Institute, Spain
Osbert Bastani University of Pennsylvania, USA
Josh Berdine Facebook, USA
Per Bjesse Synopsys Inc., USA
Nikolaj Bjorner Microsoft, USA
Roderick Bloem Graz University of Technology, Austria
Steering Committee
Ken McMillan Microsoft, USA
Aarti Gupta Princeton, USA
Orna Grumberg Technion, Israel
Daniel Kroening University of Oxford, UK
1 Introduction
Mission-time LTL (MLTL) [34] has the syntax of Linear Temporal Logic with the option
of integer bounds on the temporal operators. It was created as a generalization of the vari-
ations [3, 14, 25] on finitely-bounded linear temporal logic, ideal for specification of mis-
sions carried out by aircraft, spacecraft, rovers, and other vehicular or robotic systems.
MLTL provides the readability of LTL [32], while assuming, when a different duration is
not specified, that all requirements must be upheld during the (a priori known) length of
a given mission, such as during the half-hour battery life of an Unmanned Aerial System
(UAS). Using integer bounds instead of real-number or real-time bounds leads to more
generic specifications that are adaptable to model checking at different levels of abstrac-
tion, or runtime monitoring on different platforms (e.g., in software vs in hardware).
Integer bounds should be read as generic time units, referring to the basic temporal res-
olution of the system, which can generically be resolved to units such as clock ticks or
seconds depending on the mission. Integer bounds also allow generic specification with
respect to different granularities of time, e.g., to allow easy updates to model-checking
models, and re-usable specifications for the same requirements on different embedded
systems that may have different resource limits for storing runtime monitors. MLTL has
been used in many industrial case studies [18, 28, 34, 37, 42–44], and was the official logic
of the 2018 Runtime Verification Benchmark Competition [1]. Many specifications from
other case studies, in logics such as MTL [3] and STL [25], can be represented in MLTL.
We intuitively relate MLTL to LTL and MTL-over-naturals as follows: (1) MLTL formulas
are LTL formulas with bounded intervals over temporal operators, and interpreted over
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 3–22, 2019.
https://doi.org/10.1007/978-3-030-25543-5_1
4 J. Li et al.
finite traces. (2) MLTL formulas are MTL-over-naturals formulas without any unbounded
intervals, and interpreted over finite traces.
Despite the practical utility of MLTL, no model checker currently accepts this logic
as a specification language. The model checker nuXmv encodes a related logic for
use in symbolic model checking, where the □ and ♦ operators of an LTLSPEC can
have integer bounds [21], though bounds cannot be placed on the U or V (the Release
operator of nuXmv) operators.
We also critically need an MLTL satisfiability checker to enable specification debug-
ging. Specification is a major bottleneck to the formal verification of mission-based,
especially autonomous, systems [35], with a key part of the problem being the avail-
ability of good tools for specification debugging. Satisfiability checking is an integral
tool for specification debugging: [38, 39] argued that for every requirement ϕ we need to
check ϕ and ¬ϕ for satisfiability; we also need to check the conjunction of all require-
ments to ensure that they can all be true of the same system at the same time. Spec-
ification debugging is essential to model checking [39–41]: if the specification is valid,
a positive answer may not mean there is no bug, and if the specification is unsatisfiable,
a negative answer may not mean there is a bug. Specification debugging is critical
for synthesis and runtime verification (RV) since in these cases there is no model;
synthesis and RV are both entirely dependent on the specification. For synthesis, sat-
isfiability checking is the best-available specification-debugging technique, since other
techniques, such as vacuity checking (cf. [6, 10]) reference a model in addition to the
specification. While there are artifacts one can use in RV, specification debugging is
still limited outside of satisfiability checking yet central to correct analysis. A false pos-
itive due to RV of an incorrect specification can have disastrous consequences, such as
triggering an abort of an (otherwise successful) mission to Mars. Arguably, the biggest
challenge to creating an RV algorithm or tool is the dearth of benchmarks for checking
correctness or comparatively analyzing these tools [36], where a benchmark consists of
some runtime trace, a temporal logic formula reasoning about that trace, and some ver-
dict designating whether the trace at a given time satisfies the requirement formula. An
MLTL satisfiability solver is useful for RV benchmark generation [22].
Despite the critical need for an MLTL satisfiability solver, no such tool currently
exists. To the best of our knowledge, there is only one available solver (zot [8]) for check-
ing the satisfiability of MTL-over-naturals formulas, interpreted over infinite traces.
Since MLTL formulas are interpreted over finite traces and there is no trivial reduction
from one to the other, zot cannot be directly applied to MLTL satisfiability checking.
Our approach is inspired by satisfiability-checking algorithms from other logics.
For LTL satisfiability solving, we observe that there are multiple efficient translations
from LTL satisfiability to model checking, using nuXmv [40]; we therefore consider
here translations to nuXmv model checking, both indirectly (as a translation to LTL),
and directly using the new KLIVE [13] back-end and the BMC back-end, taking advan-
tage of the bounded nature of MLTL. The bounded nature of MLTL enables us to also
consider a direct encoding at the word-level, suitable as input to an SMT solver. Our
contribution is both theoretical and experimental. We first consider the complexity of such
translations. We prove that the MLTL satisfiability checking problem is NEXPTIME-
complete and that satisfiability checking for MLTL0 , the variant of MLTL where all intervals
start at 0, is PSPACE-complete. Secondly, we introduce translation algorithms
for MLTL-to-LTLf (LTL over finite traces [14]), MLTL-to-LTL, MLTL-to-SMV, and
Satisfiability Checking for Mission-Time LTL 5
MLTL-to-SMT, thus creating four options for MLTL satisfiability checking. Our results
show that the MLTL-to-SMT transition with the Z3 SMT solver offers the most scal-
able performance, though the MLTL-to-SMV translation with an SMV model checker
can offer the best performance when the intervals in the MLTL formulas are restricted
to small ranges less than 100.
2 Preliminaries
– π |= p iff p ∈ π[0];
– π |= ¬ϕ iff π ⊭ ϕ;
– π |= ϕ ∧ ψ iff π |= ϕ and π |= ψ;
– π |= ϕ U[a,b] ψ iff |π| > a and there exists i ∈ [a, b], i < |π|, such that πi |= ψ and
for every j ∈ [a, b] with j < i it holds that πj |= ϕ;
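The finite-trace clauses above translate directly into code. The following evaluator is our own illustrative sketch, not an artifact of the paper; the tuple-based formula representation and the function name `holds` are our choices:

```python
# Evaluator for the MLTL finite-trace semantics given above.
# Formulas are tuples: ("ap", p), ("not", f), ("and", f, g),
# ("until", a, b, f, g).  A trace is a list of sets of atoms.

def holds(trace, phi):
    op = phi[0]
    if op == "ap":                      # pi |= p  iff  p in pi[0]
        return len(trace) > 0 and phi[1] in trace[0]
    if op == "not":                     # pi |= !f  iff  not (pi |= f)
        return not holds(trace, phi[1])
    if op == "and":
        return holds(trace, phi[1]) and holds(trace, phi[2])
    if op == "until":                   # pi |= f U[a,b] g
        a, b, f, g = phi[1:]
        if len(trace) <= a:             # the clause requires |pi| > a
            return False
        return any(
            holds(trace[i:], g)         # some i in [a,b], i < |pi|: pi_i |= g
            and all(holds(trace[j:], f) for j in range(a, i))
            for i in range(a, min(b, len(trace) - 1) + 1))
    raise ValueError(f"unknown operator: {op}")

# Example: p U[0,2] q holds on the trace {p}, {p}, {q}
trace = [{"p"}, {"p"}, {"q"}]
print(holds(trace, ("until", 0, 2, ("ap", "p"), ("ap", "q"))))  # True
```

Note that, as in the Until clause above, the witness position i must lie both in [a, b] and strictly below |π|, while ϕ is only required on positions of [a, b] before i.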
In the above reduction, ϕ is in BNF. Since the reduction is linear in the size of the
original LTLf formula and LTL-SAT is PSPACE-complete [45], LTLf -SAT is also a
PSPACE-complete problem [14].
3 Complexity of MLTL-SAT
It is known that the complexity of MITL (Metric Interval Temporal Logic) satisfiabil-
ity is EXPSPACE-complete, and the satisfiability complexity of the fragment of MITL
named MITL0,∞ is PSPACE-complete [2]. MLTL (resp. MLTL0 ) can be viewed as a
variant of MITL (resp. MITL0,∞ ) that is interpreted over the naturals. We show that
MLTL satisfiability checking is NEXPTIME-complete, via a reduction from MLTL to
LTLf .
Lemma 1. Let ϕ be an MLTL formula, and K be the maximal natural appearing in the
intervals of ϕ (K is set to 1 if there are no intervals in ϕ). There is an LTLf formula θ
that recognizes the same language as ϕ. Moreover, the size of θ is in O(K · |cl(ϕ)|).
Proof (Sketch). For an MLTL formula ϕ, we define the LTLf formula f (ϕ) recursively
as follows:
For the case ϕ = ξ U[a,b] ψ:

  f (ϕ) = X (f (ξ U[a−1,b−1] ψ)),                  if 0 < a ≤ b;
  f (ϕ) = f (ψ) ∨ (f (ξ) ∧ X (f (ξ U[a,b−1] ψ))),  if a = 0 and 0 < b;
  f (ϕ) = f (ψ),                                   if a = 0 and b = 0.
X represents the neXt operator in LTLf . Let θ = f (ϕ); we can prove by induction
that ϕ and θ accept the same language. Moreover, the size of θ is at most linear in
K · |cl(ϕ)|, i.e., in O(K · |cl(ϕ)|), based on the aforementioned construction.
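The recursive unfolding of a bounded Until into nested X operators can be sketched as follows; this is our own illustration of the construction in Lemma 1, emitting formulas as strings ('|' and '&' standing for disjunction and conjunction), not the paper's implementation. Note how the result grows linearly in the interval bound, matching the O(K · |cl(ϕ)|) size estimate:

```python
# Unfold  xi U[a,b] psi  into an LTLf formula over X only, per Lemma 1.
# Operands are passed as already-translated formula strings.

def unfold_until(a, b, xi, psi):
    if 0 < a <= b:                       # shift the interval, prefix an X
        return f"X({unfold_until(a - 1, b - 1, xi, psi)})"
    if a == 0 and 0 < b:                 # psi now, or xi now and recurse
        return f"({psi} | ({xi} & X({unfold_until(0, b - 1, xi, psi)})))"
    if a == 0 and b == 0:
        return psi
    raise ValueError("requires 0 <= a <= b")

print(unfold_until(1, 2, "p", "q"))      # X((q | (p & X(q))))
```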
We use the construction shown in Lemma 1 to explore several useful properties of
MLTL. For instance, the LTLf formula translated from an MLTL formula contains only
the X temporal operator or its dual N , which represents weak Next [19, 23], and the
number of these operators is strictly smaller than K · |cl(ϕ)|. Every X or N subformula
in the LTLf formula corresponds to some temporal formula in cl∗ (ϕ). Notably, because
the natural-number intervals in ϕ are written in base 10 (decimal) notation, the blow-up
in the translation of Lemma 1 is exponential.
The next lower bound is reminiscent of the NEXPTIME-lower bound shown in [31]
for a fragment of Metric Interval Temporal Logic (MITL), but is different in the details
of the proof as the two logics are quite different.
Proof (Sketch). By Lemma 1, there is an LTLf formula θ that accepts the same traces
as MLTL formula ϕ, and the size of θ is in O(K · |cl(ϕ)|). The only temporal connec-
tives used in θ are X and N , since the translation to LTLf reduces all MLTL temporal
connectives in ϕ to nested X ’s or N ’s (produced by simplifying ¬X ). Thus, if θ is
satisfiable, then it is satisfiable by a trace whose length is bounded by the length of θ.
Thus, we can just guess a trace π of length at most |θ| (exponential in |ϕ|) and check that it satisfies
ϕ. As a result, the upper bound for MLTL-SAT is NEXPTIME.
Before proving the NEXPTIME lower bound, recall the PSPACE-lower bound
proof in [45] for LTL satisfiability. The proof reduces the acceptance problem for a
linear-space bounded Turing machine M to LTL satisfiability. Given a Turing machine
M and an integer k, we construct a formula ϕM such that ϕM is satisfiable iff M
accepts the empty tape using k tape cells. The argument is that we can encode such a
space-bounded computation of M by a trace π of length c^k for some constant c, and
then use ϕM to force π to encode an accepting computation of M . The formula ϕM
has to match corresponding points in successive configurations of M , which can be
expressed using O(k)-nested X ’s, since such points are O(k) points apart.
To prove a NEXPTIME-lower bound for MLTL, we reduce the acceptance problem
for exponentially bounded non-deterministic Turing machines to MLTL satisfiability.
Given a non-deterministic Turing machine M and an integer k, we construct an MLTL
formula ϕM of length O(k) such that ϕM is satisfiable iff M accepts the empty tape in
time 2^k . Note that such a computation of a 2^k -time-bounded Turing machine consists of
2^k many configurations of length 2^k each, so the whole computation is of exponential
length – 4^k – and can be encoded by a trace π of length 4^k , where every point of π
encodes one cell in the computation of M . Unlike the reduction in [45], in the encoding
here corresponding points in successive configurations are exponentially far (2^k ) from
each other, because each configuration has 2^k cells, so the relationship between such
successive points cannot be expressed in LTL. Because, however, the constants in the
intervals of MLTL are written in base-10 (decimal) notation, we can write formulas of
size O(k), e.g., formulas of the form p U[0,2^k] q, that relate points that are 2^k apart.
The key is to express the fact that one Turing machine configuration is a proper
successor of another configuration using a formula of size O(k). In the PSPACE-lower-
bound proof of [45], LTL formulas of size O(k) relate successive configurations of
k-space-bounded machines. Here MLTL formulas of size O(k) relate successive
configurations of 2^k -time-bounded machines. Thus, we can write a formula ϕM of length
O(k) that forces trace π to encode a computation of M of length 2^k .
Now we consider MLTL0 formulas, and prove that the complexity of checking the
satisfiability of MLTL0 formulas is PSPACE-complete. We first introduce the following
lemma to show an inherent feature of MLTL0 formulas.
Lemma 2. The conjunction of identical MLTL0 U-rooted formulas is equivalent to the
conjunct with the smallest interval range: (ξ U[0,a] ψ) ∧ (ξ U[0,b] ψ) ≡ (ξ U[0,a] ψ),
where b > a.
(ξ U[0,k+1] ψ) ∧ (ξ U[0,k+2] ψ)
≡ (f (ψ) ∨ (f (ξ) ∧ X (ξ U[0,k] ψ))) ∧ (f (ψ) ∨ (f (ξ) ∧ X (ξ U[0,k+1] ψ)))
≡ f (ψ) ∨ (f (ξ) ∧ X (ξ U[0,k] ψ ∧ ξ U[0,k+1] ψ))
≡ f (ψ) ∨ (f (ξ) ∧ X (ξ U[0,k] ψ))
≡ (ξ U[0,k+1] ψ).
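The equivalence of Lemma 2 can also be sanity-checked semantically by brute force over short traces. The following sketch is ours (it hard-codes atomic operands for brevity) and checks the instance a = 1, b = 3 on every trace over atoms {p, q} up to length 5:

```python
# Brute-force check of the Lemma 2 instance:
# (p U[0,1] q) & (p U[0,3] q)  is equivalent to  p U[0,1] q.
from itertools import product

def until(trace, a, b, xi, psi):
    # MLTL semantics of  xi U[a,b] psi  for atomic operands xi, psi
    if len(trace) <= a:
        return False
    return any(psi in trace[i] and all(xi in trace[j] for j in range(a, i))
               for i in range(a, min(b, len(trace) - 1) + 1))

states = [set(), {"p"}, {"q"}, {"p", "q"}]
for n in range(1, 6):                          # all traces up to length 5
    for tr in product(states, repeat=n):
        lhs = until(tr, 0, 1, "p", "q") and until(tr, 0, 3, "p", "q")
        assert lhs == until(tr, 0, 1, "p", "q")
print("Lemma 2 instance verified on all traces up to length 5")
```

The check passes because any witness for the smaller interval [0, 1] is also a witness for [0, 3], so the conjunction collapses to the smaller-interval conjunct, exactly as the unfolding argument above shows.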
Proof. According to [45], the satisfiability checking of X -free LTL formulas is still
PSPACE-complete. This also applies to the satisfiability checking of X -free LTLf for-
mulas. Given an X -free LTLf formula ϕ, we construct the corresponding MLTL formula
m(ϕ) recursively as follows:
Notably, for the Until LTLf formula, we bound it with the interval [0, 2^|ϕ| ], where
ϕ is the original X -free LTLf formula, in the corresponding MLTL formula, which is
motivated by the fact that every satisfiable LTLf formula has a finite model whose length
is less than 2^|ϕ| [14]. The above translation has linear blow-up, because the integers in
intervals use the decimal notation. Now we prove by induction over the type of ϕ that
ϕ is satisfiable iff m(ϕ) is satisfiable. That is, we prove that (⇒) π |= ϕ implies
π |= m(ϕ) and (⇐) π |= m(ϕ) implies π |= ϕ, for some finite trace π.
We consider the Until formula η = ξ U ψ (noting that ϕ is fixed to the original
LTLf formula), and the proofs are trivial for the other types. (⇒) η is satisfiable implies
there is a finite trace π such that π |= η and |π| ≤ 2^|ϕ| [14]. Moreover, π |= η holds
iff there is 0 ≤ i such that πi |= ψ and for every 0 ≤ j < i, πj |= ξ is true (from
the LTLf semantics). By the induction hypothesis, πi |= ψ implies πi |= m(ψ) and πj |= ξ
implies πj |= m(ξ). Also, i ≤ 2^|ϕ| is true because |π| ≤ 2^|ϕ| . As a result, π |= η
implies that there is 0 ≤ i ≤ 2^|ϕ| such that πi |= m(ψ) and for every 0 ≤ j < i,
πj |= m(ξ) is true. According to the MLTL semantics, π |= m(η) is true. (⇐) m(η)
is satisfiable implies there is a finite trace π such that π |= m(η). According to the MLTL
semantics, there is 0 ≤ i ≤ 2^|ϕ| such that πi |= m(ψ) and for every 0 ≤ j < i it
holds that πj |= m(ξ). By the induction hypothesis, πi |= m(ψ) implies πi |= ψ and
πj |= m(ξ) implies πj |= ξ. Also, 0 ≤ i ≤ 2^|ϕ| implies 0 ≤ i. As a result, π |= m(η)
implies that there is 0 ≤ i such that πi |= ψ and for every 0 ≤ j < i it holds that
πj |= ξ. From the LTLf semantics, it is true that π |= η.
Proof. Since Lemma 3 shows a linear reduction from X -free LTLf -SAT to MLTL0 -
SAT and X -free LTLf -SAT is PSPACE-complete [14], it directly implies that the lower
bound of MLTL0 -SAT is PSPACE-hard.
For the upper bound, recall from the proof of Theorem 1 that an MLTL formula ϕ is
translated to an LTLf formula θ of length K ·|cl(ϕ)|, which, as we commented, involved
an exponential blow-up in the notation for K. Following the automata-theoretic app-
roach for satisfiability, one would translate θ to an NFA and check its non-emptiness
[14]. Normally, such a translation would involve another exponential blow-up. We show
that this is not the case for MLTL0 . Recalling from the automaton construction in [14]
that every state of the automaton is a set of subformulas of θ, the size of a state is at
most K · |cl(ϕ)|. In the general case, if ψ1 , ψ2 are two subformulas of θ corresponding
to the MLTL formulas ξ UI1 ψ and ξ UI2 ψ, ψ1 and ψ2 can be in the same state of the
automaton, which implies that the size of the state can be at most K · |cl(ϕ)|. When the
formula ϕ is restricted to MLTL0 , we show that the exponential blow-up can be avoided.
Lemma 2 shows that keeping either ψ1 or ψ2 in the state is enough: assuming I1 ⊆ I2 ,
we have (ψ1 ∧ ψ2 ) ≡ ψ1 by Lemma 2. So the size of a state in the automaton for an
MLTL0 formula ϕ is at most |cl(ϕ)|. For each subformula in the state, there can be K
possible values (e.g., for ♦I ξ in the state, we can have ♦[0,1] ξ, ♦[0,2] ξ, etc.). Therefore
the size of the automaton is in O(2^|cl(ϕ)| · K^|cl(ϕ)| ) ≈ 2^O(|cl(ϕ)|) . Therefore, MLTL0
satisfiability checking is a PSPACE-complete problem.
4 Implementation of MLTL-SAT
We first show how to reduce MLTL-SAT to the well-explored LTLf -SAT and LTL-
SAT. Then we introduce two new satisfiability-checking strategies based on the inherent
properties of MLTL formulas, which are able to leverage the state-of-art model-checking
and SMT-solving techniques.
For a formula ϕ from one logic, and ψ from another logic, we say ϕ and ψ are equi-
satisfiable when ϕ is satisfiable under its semantics iff ψ is satisfiable under its seman-
tics. Based on Lemma 1 and Theorem 1, we have the following corollary:
Corollary 1 (MLTL-SAT to LTLf -SAT). MLTL-SAT can be reduced to LTLf -SAT
with an exponential blow-up.
From Corollary 1, MLTL-SAT is reducible to LTLf -SAT, enabling use of the off-
the-shelf LTLf satisfiability solvers, cf. aaltaf [23]. It is also straightforward to consider
MLTL-SAT via LTL-SAT; LTL-SAT has been studied for more than a decade, and
many off-the-shelf LTL solvers are available, cf. [24, 38, 40].
Theorem 3 (MLTL to LTL). For an MLTL formula ϕ, there is an LTL formula θ such
that ϕ and θ are equi-satisfiable, and the size of θ is in O(K · |cl(ϕ)|), where K is the
maximal integer in ϕ.
Proof. Lemma 1 provides a translation from the MLTL formula ϕ to an equivalent
LTLf formula ϕ′ , with a blow-up of O(K · |cl(ϕ)|). As shown in Sect. 2, there is a
linear translation from the LTLf formula ϕ′ to its equi-satisfiable LTL formula θ [14].
Therefore, the blow-up from ϕ to θ is in O(K · |cl(ϕ)|).
Corollary 2 (MLTL-SAT to LTL-SAT). MLTL-SAT can be reduced to LTL-SAT with
an exponential blow-up.
Since MLTL-SAT is reducible to LTL-SAT, MLTL-SAT can also benefit from the
power of LTL satisfiability solvers. Moreover, the reduction from MLTL-SAT to LTL-
SAT enables leveraging modern model-checking techniques to solve the MLTL-SAT
problem, due to the fact that LTL-SAT has been shown to be reducible to model check-
ing with a linear blow-up [38, 39].
Corollary 3 (MLTL-SAT to LTL-Model-checking). MLTL-SAT can be reduced to
LTL model checking with an exponential blow-up.
In our implementation, we choose the model checker nuXmv [12] for LTL sat-
isfiability checking, as it allows an LTL formula to be directly input as the temporal
specification together with a universal model as described in [38, 39].
– A Boolean formula e(ψ) is used to represent the formula ψ in cl∗ (ϕ) in the SMV
model, which is defined recursively as follows.
1. e(ψ) = ψ, if ψ is a Boolean atom;
2. e(ψ) = ¬e(ψ1 ), if ψ = ¬ψ1 ;
3. e(ψ) = e(ψ1 ) ∧ e(ψ2 ), if ψ = ψ1 ∧ ψ2 ;
4. e(ψ) = T ψ, if ψ is a U formula.
– Let the initial Boolean formula of the system Sys be e(ϕ).
– For each temporary variable T ψ, create a DEFINE statement according to the type
and interval of ψ, as follows.
  Tψ1 U[a,b] ψ2 = X (ψ1 U[a−1,b−1] ψ2 ),                   if 0 < a ≤ b;
  Tψ1 U[a,b] ψ2 = e(ψ2 ) ∨ (e(ψ1 ) ∧ X (ψ1 U[0,b−1] ψ2 )),  if a = 0 and 0 < b;
  Tψ1 U[a,b] ψ2 = e(ψ2 ),                                    if a = 0 and b = 0.
– Create the Boolean formula (X ψ ↔ (¬Tail ∧ next(e(ψ)))) for each X ψ in the
VAR list (the set V in Sys) of the SMV model.
– Finally, designate the LTL formula ¬Tail as the temporal specification of the SMV
model Mϕ (which implies that a counterexample trace satisfies ♦Tail).
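A generator for the DEFINE bodies above can be sketched as follows. This is our own illustration, not the paper's MLTLconverter: variable names such as T_u_a_b and X_u_a_b are invented, and each X_u_* variable would still need the VAR declaration and next() constraint described in the list above:

```python
# Generate DEFINE bodies for T_{psi1 U[a,b] psi2}, following the three
# cases above; one entry per interval reached by the unrolling.

def defines_for_until(a, b, e1, e2, out=None):
    if out is None:
        out = {}
    key = f"T_u_{a}_{b}"
    if key in out:
        return out
    if 0 < a <= b:                         # case 0 < a <= b
        out[key] = f"X_u_{a - 1}_{b - 1}"
        defines_for_until(a - 1, b - 1, e1, e2, out)
    elif a == 0 and 0 < b:                 # case a = 0 and 0 < b
        out[key] = f"({e2} | ({e1} & X_u_0_{b - 1}))"
        defines_for_until(0, b - 1, e1, e2, out)
    else:                                  # case a = 0 and b = 0
        out[key] = e2
    return out

for name, body in sorted(defines_for_until(1, 2, "p", "q").items()):
    print(f"DEFINE {name} := {body};")
```

For p U[1,2] q this emits three DEFINE entries, one per derived interval ([1,2], [0,1], [0,0]), mirroring the interval-shifting of the case split.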
Encoding Heuristics for MLTL0 Formulas. We also encode the rules shown in Lemma
2 to prune the state space for checking the satisfiability of MLTL0 formulas. These rules
are encoded using the INVAR constraint in the SMV model. Taking the U formula
as an example, we encode T (ψ1 U[0,a] ψ2 ) ∧ T (ψ1 U[0,a−1] ψ2 ) ↔ T (ψ1 U[0,a−1] ψ2 )
(a > 0) for each ψ1 U[0,a] ψ2 in cl∗ (ϕ). Similar encodings also apply to the R formulas
in cl∗ (ϕ). Theorem 4 below guarantees the correctness of the translation, and it can be
proved by induction over the type of ϕ and the construction of the SMV model.
Theorem 4. The MLTL formula ϕ is satisfiable iff the corresponding SMV model Mϕ
violates the LTL property ¬Tail.
There are different techniques that can be used for LTL model checking. Based
on the latest evaluation of LTL satisfiability checking [24], the KLIVE [13] back-end
implemented in the SMV model checker nuXmv [12] produces the best performance.
We thus choose KLIVE as our model-checking technique for MLTL-SAT.
Bounded MLTL-SAT. Although MLTL-SAT is reducible to the satisfiability problem of
other well-explored logics, with established off-the-shelf satisfiability solvers, a dedicated
solution based on inherent properties of MLTL may be superior. One intuition is
that, since all intervals in MLTL formulas are bounded, the satisfiability of the formula
can be reduced to Bounded Model Checking (BMC) [9].
Theorem 5. Given an MLTL formula ϕ with K as the largest natural in the intervals
of ϕ, ϕ is satisfiable iff there is a finite trace π with |π| ≤ K · |cl(ϕ)| such that π |= ϕ.
Theorem 5 states that the satisfiability of a given MLTL formula can be reduced to
checking for the existence of a satisfying trace. To apply the BMC technique in nuXmv,
we compute and set the maximal depth of BMC to be the value of K · |cl(ϕ)| for a given
MLTL formula ϕ. The input SMV model for BMC is still Mϕ , as described in Sect. 4.2.
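The bound K · |cl(ϕ)| of Theorem 5 is easy to compute. The sketch below is ours, reusing a simple tuple representation of formulas and taking cl(ϕ) to be the set of distinct subformulas:

```python
# Compute the BMC unrolling bound K * |cl(phi)| from Theorem 5.
# Formulas are tuples: ("ap", p), ("not", f), ("and", f, g),
# ("until", a, b, f, g).

def closure(phi, seen=None):
    # cl(phi): set of distinct subformulas of phi
    if seen is None:
        seen = set()
    if phi not in seen:
        seen.add(phi)
        for child in phi[1:]:
            if isinstance(child, tuple):
                closure(child, seen)
    return seen

def max_interval(phi):
    # K: the maximal natural in the intervals (1 if none, as in Lemma 1)
    ks = [phi[1], phi[2]] if phi[0] == "until" else []
    kids = [max_interval(c) for c in phi[1:] if isinstance(c, tuple)]
    return max(ks + kids + [1])

def bmc_bound(phi):
    return max_interval(phi) * len(closure(phi))

phi = ("until", 0, 10, ("ap", "p"), ("not", ("ap", "q")))
print(bmc_bound(phi))                      # 10 * 4 = 40
```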
Proof. Let the alphabet of ϕ be Σ, and π ∈ (2Σ )∗ be a finite trace. For each p ∈ Σ,
we define the function fp : Int → Bool as follows: fp (k) = true iff p ∈ π[k], for
0 ≤ k < |π|. We now prove by induction over the type of ϕ and the construction
of fol(ϕ, k, len) with respect to ϕ that πk |= ϕ holds iff {fp |p ∈ Σ} is a model of
fol(ϕ, k, |π|): here |π| is the length of π. The cases when ϕ is true or false are trivial.
Theorem 7. The First-Order Logic formula ∃len. fol(ϕ, 0, len) is satisfiable iff the
SMT solver returns SAT with the input SMT(ϕ).
An inductive proof for the theorem can be conducted according to the construc-
tion of SMT(ϕ). Notably, there is no difference between the SMT encoding for MLTL
formulas and that for MLTL0 formulas, as the SMT-based encoding does not require
unrolling the temporal operators in the formula.
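Because the intervals are bounded, fol(ϕ, k, len) can be unrolled into a quantifier-free disjunction. The following sketch (ours, not the paper's encoder, which it follows only in spirit) emits SMT-LIB 2 text for a bounded Until over atoms p and q, with one uninterpreted function f_p : Int → Bool per atom as described above; the helper name and exact term shapes are our own:

```python
# Emit SMT-LIB 2 text for the unrolled encoding of  p U[a,b] q  at
# position k: the trace is long enough (len > k+a), and some position
# i in [k+a, k+b] with i < len satisfies f_q while f_p holds at every
# earlier position from k+a on.

def fol_until(a, b, p, q, k):
    disjuncts = []
    for i in range(k + a, k + b + 1):
        conj = [f"(< {i} len)", f"(f_{q} {i})"]
        conj += [f"(f_{p} {j})" for j in range(k + a, i)]
        disjuncts.append("(and " + " ".join(conj) + ")")
    return f"(and (> len {k + a}) (or " + " ".join(disjuncts) + "))"

script = "\n".join([
    "(declare-fun f_p (Int) Bool)",        # one f_p : Int -> Bool per atom
    "(declare-fun f_q (Int) Bool)",
    "(declare-const len Int)",             # the trace-length variable
    f"(assert {fol_until(0, 2, 'p', 'q', 0)})",
    "(check-sat)",
])
print(script)
```

Feeding the printed script to an SMT solver such as Z3 then answers the satisfiability question directly, with no unrolling of temporal operators beyond the fixed interval bounds.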
5 Experimental Evaluations
All experiments were executed on Rice University’s NOTS cluster,4 running RedHat 5,
with 226 dual socket compute blades housed within HPE s6500, HPE Apollo
2000, and Dell PowerEdge C6400 chassis. All the nodes are interconnected with 10
GigE network. Each satisfiability check over one MLTL formula and one solver was
executed with exclusive access to one CPU and 8 GB RAM with a timeout of one hour,
as measured by the Linux time command. We assigned a time penalty of one hour to
benchmarks that triggered a segmentation fault or timed out.
Experimental Goals. We evaluate performance along three metrics. (1) Each satisfia-
bility check has two parts: the encoding time (consumed by MLTLconverter) and the
solving time (consumed by solvers). We evaluate how each encoding affects the per-
formance of both stages of MLTL-SAT. (2) We comparatively analyze the performance
and scalability of end-to-end MLTL-SAT via LTL-SAT, LTLf -SAT, LTL model check-
ing, and our new SMT-based approach. (3) We evaluate the performance and scalability
for MLTL0 satisfiability checking using MLTL0 -SAT encoding heuristics (Lemma 2).
Benchmarks. There are few MLTL (or even MTL-over-naturals) benchmarks available
for evaluation. Previous works on MTL-over-naturals [2–4] mainly focus on the
theoretical exploration of the logic. To enable rigorous experimental evaluation, we develop
three types of benchmarks, motivated by the generation of LTL benchmarks [38].5
(1) Random MLTL Formulas (R): We generated 10,000 R formulas, varying the formula
length L (20, 40, 60, 80, 100), the number of variables N (1, 2, 3, 4, 5), and the prob-
ability of the appearance of the U operator P (0.33, 0.5, 0.7, 0.95); for each (L, N, P )
we generated 100 formulas. For every U operator, we randomly chose an interval [i, j]
where i ≥ 0 and j ≤ 100.
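A generator in the spirit of the R benchmarks can be sketched as follows; this is our own approximation (the paper's exact length and probability accounting may differ):

```python
# Random MLTL formula generator in the style of the R benchmarks:
# `length` budgets operator applications, p_until is the probability
# of choosing U, and every U gets an interval [i, j] with j <= 100.
import random

def rand_mltl(length, atoms, p_until, max_bound=100):
    if length <= 0:
        return random.choice(atoms)
    half = (length - 1) // 2
    if random.random() < p_until:          # emit a bounded Until
        i = random.randint(0, max_bound)
        j = random.randint(i, max_bound)   # i >= 0 and j <= 100
        left = rand_mltl(half, atoms, p_until, max_bound)
        right = rand_mltl(length - 1 - half, atoms, p_until, max_bound)
        return f"({left} U[{i},{j}] {right})"
    if random.random() < 0.5:              # otherwise negation ...
        return f"(! {rand_mltl(length - 1, atoms, p_until, max_bound)})"
    left = rand_mltl(half, atoms, p_until, max_bound)   # ... or conjunction
    right = rand_mltl(length - 1 - half, atoms, p_until, max_bound)
    return f"({left} & {right})"

random.seed(0)
print(rand_mltl(10, ["a", "b", "c"], 0.5))
```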
(Plots omitted; x-axes: Number of Formulas; y-axis of Fig. 2: Accumulated Encoding Time (min); y-axis of Fig. 3: accumulated solving time.)
Fig. 2. Cactus plot for different MLTL encodings on R formulas: LTL-SAT and LTLf-SAT lines overlap; SMV and SMT lines overlap. Fig. 3. Cactus plot for different MLTL solving approaches on R formulas: LTL-SAT and LTLf-SAT lines overlap.
4 https://docs.rice.edu/confluence/display/CD/NOTS+Overview.
5 All experimental materials are at http://temporallogic.org/research/CAV19/. The plots are best viewed online.
Satisfiability Checking for Mission-Time LTL 17
(2) NASA-Boeing MLTL Formulas (NB): We use challenging benchmarks [15] created
from projects at NASA [17, 26] and Boeing [11]. We extract 63 real-life LTL require-
ments from the SMV models of the benchmarks, and then randomly generate an interval
for each temporal operator. (We replace each X with its [1,1]-bounded counterpart.) We create 3 groups of such
formulas (63 in each) to test the scalability of different approaches, by restricting the
maximal number of the intervals to be 1,000, 10,000, and 100,000 respectively.
(3) Random MLTL0 Formulas (R0): We generated 500 R0 formulas in the same way as the R formulas, except that every generated interval was restricted to start from 0; we generated sets of five for each (L, N, P). This small set of R0 benchmarks serves to compare performance on MLTL0 formulas whose SMV encodings were created with and without heuristics.
Correctness Checking. We compared the verdicts from all solvers for every test
instance and found no inconsistencies, excluding segmentation faults. This exercise
aided with verification of our implementations of the translators, including diagnosing
the need for including FAIRNESS TRUE in BMC models.
Experimental Results. Figure 2 compares encoding times for the R benchmark formulas. We find that (1) encoding MLTL as either LTL or LTLf is not scalable even when the intervals in the formula are small; (2) the cost of MLTL-to-SMV encoding is comparable to that of MLTL-to-SMT-LIB v2 encoding. Although the costs of encoding MLTL as LTL/LTLf and as SMV are both in O(K · |cl(ϕ)|), where K is the maximal interval length in ϕ, the practical gap between the LTL/LTLf encodings and the SMV encoding affirms our conjecture that the SMV model is in general more compact than the corresponding LTL/LTLf formulas. Also, because K is kept small in the R formulas, the encoding costs of SMV and SMT-LIB v2 are comparable.
Figure 3 shows total satisfiability-checking times for the R benchmarks. Recall that the inputs of both the BMC and KLIVE approaches are SMV models. MLTL-SAT via KLIVE is the fastest solving strategy for MLTL formulas with interval ranges of less than 100. The ratio of satisfiable to unsatisfiable formulas in this benchmark is approximately 4:1. Although BMC is known to be good at detecting short counterexamples, it does not perform as well as the KLIVE and SMT approaches on checking satisfiable formulas, since only longer counterexamples (with length greater than 1000) exist for most of these formulas. While nuXmv successfully checked all such models, Fig. 4 shows that increasing the interval-range constraint results in segmentation faults; more than half of our benchmarks produced this outcome for formulas with allowed interval ranges of up to 600. Meanwhile, the solving solutions via LTL-SAT/LTLf-SAT are not competitive for any interval range.
18 J. Li et al.
(Plots omitted; Fig. 5 x-axis: Number of Formulas, y-axis: Accumulated Total Time (min), series BMC-1000, KLIVE-1000, Z3-1000, Z3-10000, Z3-100000; Fig. 6 x-axis: Total Time without Encoding Heuristics (min), y-axis: Total Time with Encoding Heuristics (min), series BMC, KLIVE.)
Fig. 5. Cactus plot for BMC, KLIVE, and SMT-solving approaches on the NB benchmarks; BMC and KLIVE overlap. Fig. 6. Scatter plot for both the BMC and KLIVE approaches to checking MLTL0 formulas with/without encoding heuristics.
We summarize with three conclusions. (1) For satisfiability checking of MLTL formulas, the new SMT-based approach performs best. (2) For satisfiability checking of MLTL formulas with interval ranges of less than 100, the MLTL-SAT via KLIVE approach is fastest. (3) The dedicated encoding heuristics for MLTL0 do not significantly improve the satisfiability-checking time of MLTL0-SAT over MLTL-SAT, nor do they solve the nuXmv scalability problem.
Acknowledgment. We thank anonymous reviewers for their helpful comments. This work is
supported by NASA ECF NNX16AR57G, NSF CAREER Award CNS-1552934, NSF grants IIS-
1527668, IIS-1830549, and by NSF Expeditions in Computing project “ExCAPE: Expeditions in
Computer Augmented Program Engineering.”
References
1. Runtime Verification Benchmark Competition (2018). https://www.rv-competition.org/2018-2/
2. Alur, R., Feder, T., Henzinger, T.A.: The benefits of relaxing punctuality. J. ACM 43(1), 116–146 (1996)
3. Alur, R., Henzinger, T.A.: Real-time logics: complexity and expressiveness. In: LICS, pp. 390–401. IEEE (1990)
4. Alur, R., Henzinger, T.A.: A really temporal logic. J. ACM 41(1), 181–204 (1994)
5. Alur, R., Henzinger, T.A.: Reactive modules. In: Proceedings of the 11th IEEE Symposium on Logic in Computer Science, pp. 207–218 (1996)
6. Armoni, R., Fix, L., Flaisher, A., Grumberg, O., Piterman, N., Vardi, M.Y.: Enhanced vacuity detection in linear temporal logic. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 368–380. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_35
7. Barrett, C., Stump, A., Tinelli, C.: The SMT-LIB standard: version 2.0. In: Workshop on Satisfiability Modulo Theories (2010)
8. Bersani, M., Rossi, M., San Pietro, P.: An SMT-based approach to satisfiability checking of MITL. Inf. Comput. 245(C), 72–97 (2015)
9. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193–207. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0_14
10. Bloem, R., Chockler, H., Ebrahimi, M., Strichman, O.: Synthesizing non-vacuous systems. In: Bouajjani, A., Monniaux, D. (eds.) VMCAI 2017. LNCS, vol. 10145, pp. 55–72. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52234-0_4
11. Bozzano, M., et al.: Formal design and safety analysis of AIR6110 wheel brake system. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015, Part I. LNCS, vol. 9206, pp. 518–535. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_36
12. Cavada, R., et al.: The nuXmv symbolic model checker. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 334–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_22
13. Claessen, K., Sörensson, N.: A liveness checking algorithm that counts. In: FMCAD, pp. 52–59. IEEE (2012)
14. De Giacomo, G., Vardi, M.: Linear temporal logic and linear dynamic logic on finite traces. In: IJCAI, pp. 2000–2007. AAAI Press (2013)
15. Dureja, R., Rozier, K.Y.: More scalable LTL model checking via discovering design-space dependencies (D3). In: Beyer, D., Huisman, M. (eds.) TACAS 2018, Part I. LNCS, vol. 10805, pp. 309–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89960-2_17
16. Furia, C.A., Spoletini, P.: Tomorrow and all our yesterdays: MTL satisfiability over the integers. In: Fitzgerald, J.S., Haxthausen, A.E., Yenigun, H. (eds.) ICTAC 2008. LNCS, vol. 5160, pp. 126–140. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85762-4_9
17. Gario, M., Cimatti, A., Mattarei, C., Tonetta, S., Rozier, K.Y.: Model checking at scale: automated air traffic control design space exploration. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016, Part II. LNCS, vol. 9780, pp. 3–22. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41540-6_1
18. Geist, J., Rozier, K.Y., Schumann, J.: Runtime observer pairs and Bayesian network reasoners on-board FPGAs: flight-certifiable system health management for embedded systems. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 215–230. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11164-3_18
19. De Giacomo, G., Vardi, M.: Synthesis for LTL and LDL on finite traces. In: IJCAI, pp. 1558–1564 (2015)
20. Hustadt, U., Ozaki, A., Dixon, C.: Theorem proving for metric temporal logic over the naturals. In: de Moura, L. (ed.) CADE 2017. LNCS (LNAI), vol. 10395, pp. 326–343. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63046-5_20
21. Fondazione Bruno Kessler: nuXmv 1.1.0 (2016-05-10) Release Notes (2016). https://es-static.fbk.eu/tools/nuxmv/downloads/NEWS.txt
22. Li, J., Rozier, K.Y.: MLTL benchmark generation via formula progression. In: Colombo, C., Leucker, M. (eds.) RV 2018. LNCS, vol. 11237, pp. 426–433. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03769-7_25
23. Li, J., Zhang, L., Pu, G., Vardi, M.Y., He, J.: LTLf satisfiability checking. In: ECAI, pp. 91–98 (2014)
24. Li, J., Zhu, S., Pu, G., Vardi, M.Y.: SAT-based explicit LTL reasoning. In: Piterman, N. (ed.) HVC 2015. LNCS, vol. 9434, pp. 209–224. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26287-1_13
25. Maler, O., Nickovic, D.: Monitoring temporal properties of continuous signals. In: Lakhnech, Y., Yovine, S. (eds.) FORMATS/FTRTFT 2004. LNCS, vol. 3253, pp. 152–166. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30206-3_12
26. Mattarei, C., Cimatti, A., Gario, M., Tonetta, S., Rozier, K.Y.: Comparing different functional allocations in automated air traffic control design. In: Proceedings of Formal Methods in Computer-Aided Design (FMCAD 2015), Austin, Texas, USA. IEEE/ACM, September 2015
27. McMillan, K.: Symbolic model checking: an approach to the state explosion problem. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA (1992). UMI Order No. GAX92-24209
28. Moosbrugger, P., Rozier, K.Y., Schumann, J.: R2U2: monitoring and diagnosis of security threats for unmanned aerial systems. In: FMSD, pp. 1–31, April 2017
29. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
30. Ouaknine, J., Worrell, J.: Some recent results in metric temporal logic. In: Cassez, F., Jard, C. (eds.) FORMATS 2008. LNCS, vol. 5215, pp. 1–13. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85778-5_1
31. Pandya, P.K., Shah, S.S.: The unary fragments of metric interval temporal logic: bounded versus lower bound constraints. In: Chakraborty, S., Mukund, M. (eds.) ATVA 2012. LNCS, pp. 77–91. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33386-6_8
32. Pnueli, A.: The temporal logic of programs. In: IEEE FOCS, pp. 46–57 (1977)
33. Pradella, M., Morzenti, A., San Pietro, P.: Bounded satisfiability checking of metric temporal logic specifications. ACM Trans. Softw. Eng. Methodol. 22(3), 20:1–20:54 (2013)
34. Reinbacher, T., Rozier, K.Y., Schumann, J.: Temporal-logic based runtime observer pairs for system health management of real-time systems. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 357–372. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_24
35. Rozier, K.Y.: Specification: the biggest bottleneck in formal methods and autonomy. In: Blazy, S., Chechik, M. (eds.) VSTTE 2016. LNCS, vol. 9971, pp. 8–26. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48869-1_2
36. Rozier, K.Y.: On the evaluation and comparison of runtime verification tools for hardware and cyber-physical systems. In: RV-CUBES, vol. 3, pp. 123–137. Kalpa Publications (2017)
37. Rozier, K.Y., Schumann, J., Ippolito, C.: Intelligent hardware-enabled sensor and software safety and health management for autonomous UAS. Technical Memorandum NASA/TM-2015-218817, NASA Ames Research Center, Moffett Field, CA 94035, May 2015
38. Rozier, K.Y., Vardi, M.Y.: LTL satisfiability checking. In: Bošnački, D., Edelkamp, S. (eds.) SPIN 2007. LNCS, vol. 4595, pp. 149–167. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73370-6_11
39. Rozier, K.Y., Vardi, M.Y.: LTL satisfiability checking. Int. J. Softw. Tools Technol. Transf. 12(2), 123–137 (2010)
40. Rozier, K.Y., Vardi, M.Y.: A multi-encoding approach for LTL symbolic satisfiability checking. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 417–431. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21437-0_31
41. Rozier, K.Y., Vardi, M.Y.: Deterministic compilation of temporal safety properties in explicit state model checking. In: Biere, A., Nahir, A., Vos, T. (eds.) HVC 2012. LNCS, vol. 7857, pp. 243–259. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39611-3_23
42. Schumann, J., Moosbrugger, P., Rozier, K.Y.: R2U2: monitoring and diagnosis of security threats for unmanned aerial systems. In: Bartocci, E., Majumdar, R. (eds.) RV 2015. LNCS, vol. 9333, pp. 233–249. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23820-3_15
43. Schumann, J., Moosbrugger, P., Rozier, K.Y.: Runtime analysis with R2U2: a tool exhibition report. In: Falcone, Y., Sánchez, C. (eds.) RV 2016. LNCS, vol. 10012, pp. 504–509. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46982-9_35
44. Schumann, J., Rozier, K.Y., Reinbacher, T., Mengshoel, O.J., Mbaya, T., Ippolito, C.: Towards real-time, on-board, hardware-supported sensor and software health management for unmanned aerial systems. IJPHM 6(1), 1–27 (2015)
45. Sistla, A.P., Clarke, E.M.: The complexity of propositional linear temporal logic. J. ACM 32, 733–749 (1985)
High-Level Abstractions for Simplifying
Extended String Constraints in SMT
1 Introduction
Some string solvers are fully integrated in Satisfiability Modulo Theories (SMT) solvers [4,12]; some are built (externally) on top of such solvers [9,16,19]; and others are independent of SMT solvers [23].
A major challenge in developing solvers for unbounded string constraints is
the complex semantics of extended string functions beyond the basic operations
of string concatenation and equality. Extended functions include replace, which
replaces a string in another string, and indexof, which returns the position of
a string in another string. Another challenge is that constraints using extended
functions are often combined with constraints over other theories, e.g. integer
constraints over string lengths or applications of indexof, which requires the
involvement of solvers for those theories. Current string solvers address these
challenges by reducing constraints with extended string functions to typically more verbose constraints over basic functions. As with every reduction, some of the higher-level structure of the problem may be lost, with negative repercussions on performance and scalability.
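For concreteness, the intended behavior of these two extended functions, replace-first and index-from-offset as in the SMT-LIB theory of strings, can be mirrored with Python built-ins (reference semantics only, not solver internals):

```python
def str_replace(t, s, r):
    """Replace the first occurrence of s in t with r (t unchanged if absent)."""
    return t.replace(s, r, 1)

def str_indexof(t, s, v):
    """Position of the first occurrence of s in t at or after offset v, else -1."""
    if v < 0 or v > len(t):
        return -1
    return t.find(s, v)

assert str_replace("abcab", "ab", "x") == "xcab"   # only the first match is rewritten
assert str_replace("abc", "zz", "x") == "abc"      # no occurrence: unchanged
assert str_indexof("abcab", "ab", 1) == 3
assert str_indexof("abc", "d", 0) == -1
```

Such executable reference semantics are useful for differential testing of a solver's rewrites against concrete string values.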
To address this issue, we have developed new techniques that reason about
constraints with extended string operators before they are reduced to simpler
ones. This analysis of complex terms can often eliminate the need for expen-
sive reductions. The techniques are based on reasoning about relationships over
strings with high-level abstractions, such as their arithmetic relationships (e.g.,
reasoning about their length), their string containment relationships, and their
relationships as multisets of characters. We have implemented these techniques in cvc4, an SMT solver with native support for string reasoning. An experimental evaluation with benchmarks from various applications shows that our new techniques allow cvc4 to significantly outperform other state-of-the-art solvers that target extended string constraints.
Our main contributions are:
bounded strings such as HAMPI [9]. Bjørner et al. [5] proposed native support
for extended string operators in string solvers for scaling symbolic execution
of .NET code. They reduce extended string functions to basic ones after get-
ting bounds for string lengths from an integer solver. They also showed that
constraints involving unbounded strings and replace are undecidable. PASS [11]
reduces string constraints over extended functions to arrays. Z3-str and its suc-
cessors [4,24,25] reduce extended string functions to basic functions eagerly dur-
ing preprocessing. S3 [18] reduces recursive functions such as replace incremen-
tally by splitting and unfolding. Its successor S3P [19] refines this reduction
by pruning the resulting subproblems for better performance. cvc4 [3] reduces
constraints with extended functions lazily and leverages context-dependent sim-
plifications to simplify the reductions [15]. Trau [1] reduces certain extended
functions, such as replace, to context-free membership constraints. Ostrich [7]
implements a decision procedure for a subset of constraints that include extended
string functions. The simplification techniques presented in this paper are agnos-
tic to the underlying solving procedure, so they can be combined with all of these
approaches.
2 Preliminaries
We work in the context of many-sorted first-order logic with equality and assume
the reader is familiar with the notions of signature, term, literal, formula, and
formal interpretation of formulas. We review a few relevant definitions in the
following. A theory is a pair T = (Σ, I) where Σ is a signature and I is a class of Σ-interpretations, the models of T. We assume Σ contains the equality predicate ≈, interpreted as the identity relation, and the predicates ⊤ (for true) and ⊥ (for false). A Σ-formula ϕ is satisfiable (resp., unsatisfiable) in T if it is satisfied by some (resp., no) interpretation in I. We write ⊨T ϕ to denote that the Σ-formula ϕ is T-valid, i.e., is satisfied in every model of T. Two Σ-terms t1 and t2 are equivalent in T if ⊨T t1 ≈ t2.
We consider an extended theory TS of strings and length equations, whose
signature ΣS is given in Fig. 1 and whose models differ only on how they inter-
pret variables.1 We assume a fixed finite alphabet A of characters which includes
the digits {0, . . . , 9}. The signature includes the sorts Bool, Int, and Str denot-
ing the Booleans, the integers (Z), and Kleene closure of A (A∗ ), respectively.
The top half of Fig. 1 includes the usual symbols of linear integer arithmetic,
interpreted as expected, a string literal l for each word/string of A∗, a variadic function symbol con, interpreted as word concatenation, and a function symbol len, interpreted as the word-length function. We write ϵ for the empty word and abbreviate len(s) as |s|. We use words over the characters a, b, and c, as in abca, as concrete examples of string literals.
We refer to the function symbols in the bottom half of the figure as extended
functions and refer to terms containing them as extended terms. A position in
1 Our implementation supports a larger set of symbols, but for brevity, we only show the subset of the symbols used throughout this paper.
26 A. Reynolds et al.
Fig. 1. Functions in signature ΣS . Str and Int denote strings and integers respectively.
Various efficient solvers have been designed for the satisfiability problem for
quantifier-free TS -constraints, including cvc4 [3], s3# [20] and z3str3 [4]. In
this section, we give an overview of how these solvers process extended functions
in practice.
Generally speaking, constraints involving extended functions are converted to
basic ones through a series of reductions performed in an incremental fashion by
the solver. Operators whose reduction requires universal quantification are dealt
with by guessing upper bounds on the lengths of input strings or by lazily adding
constraints that block models that do not satisfy extended string constraints.
for a particular s and t if it can be inferred that |s| is strictly greater than |t|.
This section defines an inference system for such arithmetic relationships and
the simplifications that it enables.
We are interested in proving the TS-validity of formulas of the form u ≥ 0, where u is a ΣS-term of integer type. We describe an inference system as a set of rules for deriving judgments of the form ⊢ u ≥ 0 and a specific rule application strategy we have implemented. The inference system is sound in the sense that ⊨TS u ≥ 0 whenever ⊢ u ≥ 0 is derivable in it. It is, however, incomplete, as it may fail to derive ⊢ u ≥ 0 in some cases when ⊨TS u ≥ 0. This incompleteness is by design, since proving the TS-validity of inequalities is generally expensive due to the NP-hardness of linear integer arithmetic. Without loss of generality, we require that the term u be in a simplified form, where terms of the form |l| with l a string literal of n characters are rewritten to n, terms of the form |con(t1, . . . , tn)| are rewritten to |t1| + · · · + |tn|, and like monomials in arithmetic terms are combined in the usual way (e.g., 2·|x| + |x| is rewritten to 3·|x|).
Definition 1 (Polynomial Form). An arithmetic term u is in polynomial form if u = m1·u1 + · · · + mn·un + m, where m1, . . . , mn are non-zero integer constants, m is an integer constant, and each u1, . . . , un is a unique term and one of the following:
1. an integer variable,
2. an application of length to a string variable, e.g. |x|,
3. an application of length to an extended function, e.g. |substr(t, v, w)|, or
4. an application of an extended function of integer type, e.g. indexof(t, s, v).
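A minimal sketch of such a polynomial-form representation (our own illustrative encoding, with base terms as opaque labels) keeps a coefficient map and combines like monomials:

```python
from collections import defaultdict

def poly(*monomials, const=0):
    """Build a polynomial form from (coefficient, base-term) pairs plus a
    constant; like monomials are combined and zero coefficients dropped,
    e.g. 2*|x| + |x| becomes 3*|x|."""
    p = defaultdict(int)
    for coeff, base in monomials:
        p[base] += coeff
    return {b: c for b, c in p.items() if c != 0}, const

def poly_add(p1, p2):
    """Add two polynomial forms, again dropping cancelled monomials."""
    (m1, c1), (m2, c2) = p1, p2
    out = defaultdict(int, m1)
    for b, c in m2.items():
        out[b] += c
    return {b: c for b, c in out.items() if c != 0}, c1 + c2

# |con(x, x, "ab")| simplifies to |x| + |x| + 2 = 2*|x| + 2:
assert poly((1, "|x|"), (1, "|x|"), const=2) == ({"|x|": 2}, 2)
# adding -2*|x| cancels the monomial entirely:
assert poly_add(poly((2, "|x|")), poly((-2, "|x|"), const=1)) == ({}, 1)
```

Base-term labels here ("|x|") are opaque strings; a real implementation would key on hash-consed term nodes instead.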
Given u in polynomial form, our inference system uses a set of over- and under-approximations for showing that u ≥ 0 holds in all models of TS. We define two auxiliary rewrite systems, denoted →O and →U. If u rewrites to v (in zero or more steps) in →O, written u →∗O v, we say that v is an over-approximation of u. We can prove in that case that ⊨TS v ≥ u. Dually, if u rewrites to v in →U, written u →∗U v, we say that v is an under-approximation of u and can prove that ⊨TS u ≥ v. Based on these definitions, the core of our inference system can be summarized by the single inference rule schema provided in Fig. 2, together with the conditional rewrite systems →O and →U, which are defined inductively in terms of the inference system and each other.
A majority of the rewrite rules have side conditions requiring the derivability of certain judgments in the same inference system. To improve their readability, we take some liberties with the notation and write u1 ≥ u2, say, instead of u1 − u2 ≥ 0. For example, |substr(t, v, w)| is under-approximated by w if it can be inferred that the interval from v to v + w is a valid range of positions in string t, which is expressed by the side conditions v ≥ 0 and |t| ≥ v + w. Note that some arithmetic terms, such as |substr(t, v, w)|, can be approximated in multiple ways; hence the need for a strategy for choosing the best approximation for arithmetic string terms, described later. The rules for polynomials are written modulo associativity of + and state that a monomial m·v in them can be over- or under-approximated based on the sign of the coefficient m. For simplicity,
Fig. 2. Rules for arithmetic entailment based on under- and over-approximations computed for arithmetic terms containing extended string operators. We write t, s, r to denote string terms, u, u′, v, w to denote integer terms, and m, n to denote integer constants.
we silently assume in the figure that basic arithmetic simplifications are applied
after each rewrite step to put the right-hand side in polynomial form.
Example 3. Let u be |replace(x, aa, b)|. Because |aa| ≥ |b|, the first case of the over-approximation rule for replace applies, and we get that u →O |x|. This reflects the fact that the result of replacing the first occurrence, if any, of aa in x with b is no longer than x.
Example 4. Let u be the same as in the previous example and let v be −1·u + 2·|x|. Since u →O |x| and the coefficient of u in v is negative, we have that v →U −1·|x| + 2·|x|, which simplifies to |x|; moreover, |x| →U 0. Thus, v →∗U 0, and so ⊢ v ≥ 0. In other words, we can use the approximations to show that u is at most 2·|x|.
Example 5. Let u be 1 + |t1| + |t2| − |x1|, where t1 is substr(x2, 1, |x2| + |x4|) and t2 is replace(x1, x2, x3). Step 1 of Str-Arith-Approx considers the possible approximations |t1| →U |x2| − 1 and |t2| →U |x1| − |x2|. Note that under-approximations are needed because the coefficients of |t1| and |t2| are positive. The first approximation is an instance of the third rule in Fig. 2, noting that both 1 ≥ 0 and 1 + |x2| + |x4| ≥ |x2| are derivable by a basic strategy that, wherever applicable, under-approximates string length terms as zero. Our strategy chooses the first approximation since it introduces no new negative-coefficient terms, thus obtaining u →U |x2| + |t2| − |x1|. We now choose the approximation |t2| →U |x1| − |x2|, noting that it introduces no new negative-coefficient terms and cancels an existing one, |x1|. After arithmetic simplification, we have derived u →∗U 0, and hence ⊢ u ≥ 0.
One can show that our strategy is sound, terminating, and deterministic. This means that applying Str-Arith-Approx to completion produces a unique rewrite chain of the form t →U u1 →U · · · →U un for a finite n, where each step is an application of one of the rewrite rules from Fig. 2.
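A toy re-creation of one such chain (assumed data representation, not the cvc4 implementation) shows the approximation from Examples 3 and 4: |replace(x, s, r)| over-approximates to |x| when |s| ≥ |r|, and a monomial with a negative coefficient is under-approximated by over-approximating its base term. The final |x| →U 0 step of Example 4 is omitted for brevity.

```python
# Toy monomials: ("len_replace", x, pat, rep) stands for |replace(x, pat, rep)|,
# ("len", v) stands for |v|. A polynomial is a list of (coefficient, monomial) pairs.

def over_approx(mono):
    """One ->O step: |replace(x, s, r)| ->O |x| when |s| >= |r| (literal s, r only)."""
    if mono[0] == "len_replace":
        _, x, s, r = mono
        if len(s) >= len(r):       # side condition |s| >= |r|, decidable for literals
            return ("len", x)
    return mono

def under_approx_poly(poly):
    """One ->U step on a polynomial: for each negative-coefficient monomial m*u,
    under-approximate it by over-approximating u; then combine like monomials."""
    stepped = [(c, over_approx(m) if c < 0 else m) for c, m in poly]
    acc = {}
    for coeff, mono in stepped:
        acc[mono] = acc.get(mono, 0) + coeff
    return [(c, m) for m, c in acc.items() if c != 0]

# Example 4: v = -1*|replace(x, "aa", "b")| + 2*|x|  ->U  -1*|x| + 2*|x| = |x|.
v = [(-1, ("len_replace", "x", "aa", "b")), (2, ("len", "x"))]
assert under_approx_poly(v) == [(1, ("len", "x"))]
```

The remaining step |x| →U 0 would close the derivation of ⊢ v ≥ 0.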
We use the inference system from the previous section for simplifications of string terms with arithmetic side conditions. Figure 4 summarizes those simplifications. The first rule rewrites a string equality to ⊥ if one of the two sides can be inferred to be strictly longer than the other. In the second rule, if one side of an equality, con(s, r, q), is such that the sum of the lengths of s and q alone can be shown to be greater than or equal to the length of the other side, then r must be empty. The third rule recognizes that string containment reduces to string equality when it can be inferred that string s is at least as long as the string t that must contain it. The next rule captures the fact that a substring simplifies to the empty string if it can be inferred that its position v is not within bounds, or its length w is not positive. In the figure, we write that rule with a disjunctive side condition; this is a shorthand to denote that we can pick any disjunct and show that it holds assuming the negation of the other disjuncts. We can use those assumptions to perform substitutions that simplify the derivation. Concretely, to show u1 ≥ u2 ∨ . . . ∨ u ≈ u′, it is sufficient to infer (u1 ≥ u2)[u ↦ u′]. We demonstrate this with an example.
Example 6. Consider the term substr(t, |t| + w, w). Our rules may simplify this term to ϵ by inferring that its start position (|t| + w) is not within the bounds of t if we assume that its size (w) is positive. In detail, assume that w > 0 (the negation of the last disjunct in the side condition of the fourth rule), which is equivalent to w ≈ |x| + 1, where x is a fresh string variable and |x| denotes an unknown non-negative quantity. It is sufficient to derive the formula obtained by replacing all occurrences of w by |x| + 1 in the disjunct |t| + w ≥ |t| to show that the start position of our term is out of bounds. After simplification, we obtain |x| + 1 ≥ 0, which is trivial to derive.
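The out-of-bounds rule can be sanity-checked against SMT-LIB-style substring semantics (an illustrative helper, not cvc4 code):

```python
def substr(t, v, w):
    """SMT-LIB-style substring: empty if v is out of bounds or w is not positive."""
    if v < 0 or v >= len(t) or w <= 0:
        return ""
    return t[v:v + w]

# The situation of Example 6: a start position of |t| + w with w > 0 is always
# out of bounds, so the term collapses to the empty string for every t.
for t in ["", "a", "abcab"]:
    for w in [1, 2, 10]:
        assert substr(t, len(t) + w, w) == ""

assert substr("abcab", 1, 3) == "bca"   # in-bounds case for contrast
```

Exhaustively checking small instances like this is a cheap way to gain confidence in a rewrite rule before proving it.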
The next two rules in Fig. 4 apply if we can infer respectively that the start
position of the substring comes strictly after a prefix t or that the end position
of the substring comes strictly before a suffix t of the first argument string. In
either case, t can be dropped.
The final rule for substr shows that a prefix of a substring can be pulled upwards
if the start position is zero and we can infer that the substring is guaranteed to
include at least a prefix string t. Finally, if we can infer that the last position of s
in t starting from position v is at or beyond the end of t, then the indexof term can
be rewritten as an if-then-else (ite) term that checks whether s is a suffix of t.
Rules for inferring judgments of these forms are given in Fig. 5. Like our rules for arithmetic, these rules are based solely on the syntactic structure of terms, so inferences in this system can be computed statically. Both the assumptions and conclusions of the rules assume associativity of string concatenation with identity element ϵ, that is, con(t, s) may refer to a term of the form con(con(t1, t2), s) = con(t1, t2, s) or alternatively to con(ϵ, s) = s. Most of the rules are straightforward. The inference system has special rules for substring terms substr(t, v, w), using arithmetic entailments from Sect. 3 to show prefix and suffix relationships with the base string t. For negative containment, the rules of the inference system together can show that a (possibly non-constant) string cannot occur in a constant string by reasoning that its characters cannot appear in order in that string. We write l1 \ l2 to denote the empty string if l1 does not contain l2, or the result of removing from l1 the smallest prefix of l1 that contains l2 otherwise.
Example 8. Let t be abcab and let s be con(b, x, a, y, c). String s is not contained in t for any value of x, y. We derive t ̸⊒ s using two applications of the rightmost rule for negative containment in Fig. 5, noting abcab \ b = cab, cab \ a = b, and b does not contain c. In other words, the containment does not hold since the characters b, a, and c cannot be found in order in the constant abcab.
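The \ operator and the in-order argument of Example 8 can be sketched directly (illustrative helpers, not cvc4 code):

```python
def strip_through(l1, l2):
    """l1 \\ l2: empty if l1 does not contain l2; otherwise l1 with its smallest
    prefix containing l2 removed, i.e. everything up to and including the first
    occurrence of l2."""
    i = l1.find(l2)
    return "" if i == -1 else l1[i + len(l2):]

def cannot_contain(t, literal_parts):
    """Show t cannot contain con(l1, x1, l2, x2, ...) for any variable values,
    by checking whether the constant parts can occur in order in t."""
    for part in literal_parts:
        if part not in t:
            return True            # this constant part never appears in what's left
        t = strip_through(t, part)
    return False                   # inconclusive: the parts do occur in order

# Example 8: s = con(b, x, a, y, c) is never contained in abcab.
assert strip_through("abcab", "b") == "cab"
assert strip_through("cab", "a") == "b"
assert cannot_contain("abcab", ["b", "a", "c"])
```

Greedily stripping through the *first* occurrence is what makes the check sound: if the parts cannot be placed even with the leftmost choices, no placement exists.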
the first argument, then the suffix r (prefix t) can be dropped, where the start
position and the return value of the result are modified accordingly. If we know
s is a prefix of the first argument at position v, then the result is v if indeed
v is in the bounds of t. Notice that the latter condition is necessary to handle
the case where s is the empty string. The three rules for replace are analogous.
First, the replace rewrites to the first argument if we know it does not contain
the second argument s. If we know s is definitely contained in a prefix of the
first argument, then we can pull the remainder of that string upwards. Finally,
if we know s is a prefix of the first argument, then we can replace that prefix
with r while concatenating the remainder. We use the term substr(t, |s|) to
denote the remainder after the replacement for the sake of brevity, although
this term typically does not involve extended functions after simplification, e.g.
replace(con(x, y), x, z) → con(z, y), noting that (substr(con(x, y), |x|))↓ = y, or
replace(ab, a, x) → con(x, b), noting that (substr(ab, |a|))↓ = b.
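For constant strings, the three rules must agree with the concrete semantics of replace, which can be sketched as follows (a toy model of ours, not the solver's code):

```python
def replace_first(t: str, s: str, r: str) -> str:
    """Concrete semantics of replace on constant strings: substitute
    the first occurrence of s in t by r; if t does not contain s,
    return t unchanged (rule 1)."""
    i = t.find(s)
    if i == -1:
        return t
    # rules 2-3: split t around the first occurrence of s and
    # concatenate the replacement with the untouched remainder
    return t[:i] + r + t[i + len(s):]
```

For instance, replace_first("ab", "a", "x") yields "xb", matching the rewrite replace(ab, a, x) → con(x, b) when x is instantiated by a constant.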
how much of a constant string prefix l1 (resp., suffix) can be safely removed from
a string without impacting whether it contains another string.
6 Implementation
We implemented the above simplification rules and others in the DPLL(T)-based
SMT solver cvc4, which implements a theory solver for a basic fragment of word
equations with length, several other theory solvers, and reduction techniques
for extended string functions as described in Sect. 2.1. Our simplification rules
are run in a preprocessing pass as well as an inprocessing pass during solving.
For the latter, we use a context-dependent simplification strategy that infers
when an extended string constraint, e.g., contains(t, s), simplifies to ⊥ based
on other assertions, e.g., s ≈ ϵ. Our simplification techniques do not affect the
core procedure for the theory of strings, nor the compatibility of the string solver
with other theories. In total, our implementation is about 3,500 lines of C++
code. We cache the results of the simplifications and the approximation-based
arithmetic entailments to amortize their costs.
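The result caching mentioned above can be pictured as memoization keyed on the term; a toy Python sketch (the actual implementation is C++ inside cvc4, and the rule shown is only an illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def simplify(term: str) -> str:
    """Toy simplifier keyed on a textual term representation; the rule
    below is only an illustration.  lru_cache plays the role of the
    result cache that amortizes repeated simplification queries."""
    if term.startswith("contains(") and term.endswith(', "")'):
        return "true"  # every string contains the empty string
    return term
```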
High-Level Abstractions for Simplifying Extended String Constraints 37
7 Evaluation
We evaluate the impact of each simplification technique as implemented in cvc4
on three benchmark sets that use extended string operators: CMU, a dataset
obtained from symbolic execution of Python code [15]; TermEq, a benchmark
set consisting of the verification of term equivalences over strings [14]; and Slog,
a benchmark set extracted from vulnerability testing of web applications [22].
The Slog set uses the replace function extensively but does not contain other
extended functions. We also evaluate the impact on Aplas, a set of handcrafted
benchmarks involving looping word equations [10] (string equalities whose left
and right sides have variables in common).
We compare cvc4 with z3 commit 9cb1a0f [8], a state-of-the-art string solver
(commit 9cb1a0f is newer than the current release 4.8.4 and includes several
fixes for critical issues). Additionally, we compare against Ostrich on the Slog
benchmarks but not the other sets because it does not support some functions
such as contains and
Table 1. Number of solved problems per benchmark set. Best results are in bold. Gray
cells indicate benchmark sets not supported by a solver. “R%” indicates the reduction
of extended string functions during preprocessing. All benchmarks ran with a timeout
of 600 s.
Fig. 9. Scatter plots showing the impact of disabling simplification techniques in cvc4
on both satisfiable and unsatisfiable benchmarks. All benchmarks ran with a timeout
of 600 s.
To focus on non-trivial benchmarks, we omit the benchmarks that are solved in
less than a second by all solvers.
The arithmetic-based simplification techniques have the most significant
performance impact on the symbolic execution benchmarks CMU. The number of
solved benchmarks is significantly lower when disabling those techniques. The
scatter plot shows that, for longer running satisfiable queries, a large portion
of the benchmarks are solved up to an order of magnitude faster with the
simplifications. These improvements in runtime on the CMU set are particularly
compelling because they come from a symbolic execution application, which
involves a large number of queries with a short timeout. The improvements are
more pronounced for unsatisfiable benchmarks, where our results show that the
simplifications often give the solver the ability to derive a refutation in a
matter of seconds, something that is infeasible with configurations without these
techniques. The Aplas set contains no extended string operators and hence our
arithmetic-based simplification techniques have little impact on this set.
In contrast, both containment- and multiset-based rewrites have a high impact
on the Aplas set, as -contain and -msets both solve 121 fewer benchmarks.
Additionally, -contain has a high impact on the TermEq set, where the
simplifications enable the best configuration to solve 61 out of 80 benchmarks.
Since these techniques apply most frequently to looping word equations, they
are less important for the CMU set, which does not have such equations. The
containment-based and multiset-based techniques primarily help on unsatisfiable
benchmarks, as shown in the scatter plots. On TermEq benchmarks, it tends
to be easier to find counterexamples, i.e., to solve the satisfiable ones, so there
is more to gain on unsatisfiable benchmarks.
On Slog, Ostrich solves two more instances than cvc4, but cvc4 is over 50
times faster on commonly solved instances while supporting a richer set of string
operators. On all benchmark sets, cvc4 solves at least as many benchmarks as
z3, and cvc4 has 12× fewer timeouts than z3. On the simplified benchmarks, z3
performs significantly better. On the CMU and the Aplas benchmarks, z3b
outperforms z3 by a large margin. Additionally simplifying the benchmarks with
8 Conclusion
References
1. Abdulla, P.A., et al.: TRAU: SMT solver for string constraints. In: Bjørner, N.,
Gurfinkel, A. (eds.) 2018 Formal Methods in Computer Aided Design, FMCAD
2018, Austin, TX, USA, 30 October–2 November 2018, pp. 1–5. IEEE (2018)
2. Alur, R., et al.: Syntax-guided synthesis. In: Irlbeck, M., Peled, D.A., Pretschner,
A. (eds.) Dependable Software Systems Engineering. NATO Science for Peace and
Security Series, D: Information and Communication Security, vol. 40, pp. 1–25.
IOS Press (2015)
3. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011.
LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_14
4. Berzish, M., Ganesh, V., Zheng, Y.: Z3str3: a string solver with theory-aware
heuristics. In: Stewart, D., Weissenbacher, G. (eds.) 2017 Formal Methods in
Computer Aided Design, FMCAD 2017, Vienna, Austria, 2–6 October 2017, pp. 55–59.
IEEE (2017)
5. Bjørner, N., Tillmann, N., Voronkov, A.: Path feasibility analysis for string-manipulating
programs. In: Kowalewski, S., Philippou, A. (eds.) TACAS 2009.
LNCS, vol. 5505, pp. 307–321. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00768-2_27
22. Wang, H.E., Tsai, T.L., Lin, C.H., Yu, F., Jiang, J.H.R.: String analysis via
automata manipulation with logic circuit representation. In: Chaudhuri and Farzan
[6], pp. 241–260
23. Yu, F., Alkhalaf, M., Bultan, T.: Stranger: an automata-based string analysis
tool for PHP. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol.
6015, pp. 154–157. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_13
24. Zheng, Y., et al.: Z3str2: an efficient solver for strings, regular expressions, and
length constraints. Form. Methods Syst. Des. 50(2–3), 249–288 (2017)
25. Zheng, Y., Zhang, X., Ganesh, V.: Z3-str: a z3-based string solver for web appli-
cation analysis. In: Meyer, B., Baresi, L., Mezini, M. (eds.) Joint Meeting of the
European Software Engineering Conference and the ACM SIGSOFT Symposium
on the Foundations of Software Engineering, ESEC/FSE 2013, Saint Petersburg,
Russian Federation, 18–26 August 2013, pp. 114–124. ACM (2013)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Alternating Automata Modulo
First Order Theories
1 Introduction
Many results in automata theory rely on the finite alphabet hypothesis, which
guarantees, in some cases, the existence of determinization, complementation
and inclusion checking methods. However, this hypothesis prevents the use of
automata as models of real-time systems or even simple programs, whose input
and output are data values ranging over very large domains, typically viewed as
infinite mathematical abstractions.
Traditional attempts to generalize classical Rabin-Scott automata to infinite
alphabets, such as timed automata [1] and finite-memory automata [16], face
the complement closure problem: there exist automata for which the complement
language cannot be recognized by an automaton in the same class. This
makes it impossible to encode a language inclusion problem L(A) ⊆ L(B) as
the emptiness of an automaton recognizing the language L(A) ∩ Lc(B), where
Lc(B) denotes the complement of L(B).
Even for finite alphabets, complementation of finite-state automata faces an
inherent exponential blowup, due to nondeterminism. However, if we allow
universal nondeterminism, in addition to the classical existential nondeterminism,
complementation is possible in linear time. Having both existential and universal
nondeterminism defines the alternating automata model [4]. A finite-alphabet
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 43–63, 2019.
https://doi.org/10.1007/978-3-030-25543-5_3
44 R. Iosif and X. Xu
alternating automaton is described by a set of transition rules q −a→ φ, where q
is a state, a is an input symbol and φ is a boolean formula whose propositional
variables denote successor states.
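A propositional alternating automaton is complemented by dualizing each transition formula (and complementing the final states); the dualization itself is a linear-time traversal. A minimal sketch, with a tuple-based formula representation of our own:

```python
# Transition formulas as nested tuples: ("var", q), ("and", f, g),
# ("or", f, g), ("true",), ("false",)  -- representation is ours.
def dual(f):
    """De Morgan dual of a positive boolean formula: swap and/or and
    true/false, keep state variables.  Complementing a propositional
    alternating automaton dualizes every transition formula (and
    complements the final states), a linear-time traversal."""
    op = f[0]
    if op == "var":
        return f
    if op == "and":
        return ("or", dual(f[1]), dual(f[2]))
    if op == "or":
        return ("and", dual(f[1]), dual(f[2]))
    if op == "true":
        return ("false",)
    if op == "false":
        return ("true",)
    raise ValueError(f"unknown connective: {op}")
```

Dualization is an involution: applying it twice gives back the original formula.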
Our Contribution. We extend alternating automata to infinite data alphabets,
by defining a model of computation in which all boolean operations, including
complementation, can be done in linear time. The control states are given by
k-ary predicate symbols q(y1, . . . , yk), the input consists of an event a from a finite
alphabet and a tuple of data variables x1, . . . , xn, ranging over an infinite domain,
and transitions are of the form q(y1, . . . , yk) −a(x1,...,xn)→ φ(x1, . . . , xn, y1, . . . , yk),
where φ is a formula in the first-order theory of the data domain. In this model,
the arguments of a predicate atom q(y1, . . . , yk) represent the values of the internal
variables associated with the state. Together with the input values x1, . . . , xn,
these values define the next configurations, but remain invisible in the input
sequence.
The tight coupling of internal values and control states, by means of uninterpreted
predicate symbols, allows for linear-time complementation, just as in
the case of classical propositional alternating automata. Complementation is,
moreover, possible when the transition formulae contain first-order quantifiers,
generating infinitely-branching execution trees. The price to be paid for this
expressivity is that emptiness of first-order alternating automata is undecidable,
even for the simplest data theory of equality [6].
The main contribution of this paper is an effective emptiness checking
semi-algorithm for first-order alternating automata, in the spirit of the IMPACT lazy
annotation procedure, originally developed for checking safety of nondeterministic
integer programs [20,21]. In a nutshell, a lazy annotation procedure unfolds
an automaton A trying to find an execution that recognizes a word from L(A).
If a path that reaches a final state does not correspond to a concrete run of
the automaton, the positions on the path are labeled with interpolants from the
proof of infeasibility, thus marking this path and all its continuations as infeasible
for future searches. Termination of lazy annotation procedures is not guaranteed,
but a suitable coverage relation between the nodes of the search tree may ensure
convergence on many real-life examples. However, applying lazy annotation to
first-order alternating automata faces two nontrivial problems:
1.1 Preliminaries
For two integers 0 ≤ i ≤ j, we define [i, j] = {i, . . . , j} and [i] = [0, i]. We consider
two disjoint sorts D and B, where D is an infinite domain and B = {⊤, ⊥} is the
set of boolean values true (⊤) and false (⊥), respectively. The D sort is equipped
with countably many function symbols f : D#(f) → D ∪ B, where #(f) ≥ 0
denotes the number of arguments (arity) of f. A predicate is a function symbol
p : D#(p) → B, that is, a #(p)-ary relation.
We consider the interpretation of all function symbols f : D#(f) → D to be
fixed by the interpretation of the D sort; for instance, if D is the set of integers
Z, these are zero, the successor function and the arithmetic operations of addition
and multiplication. We extend this convention to several predicates over D,
such as the inequality relation over Z, and write Pred for the set of remaining
uninterpreted predicates.
Let Var = {x, y, z, . . .} be a countably infinite set of variables, ranging
over D. Terms are either constants of sort D, variables or function applications
f (t1 , . . . , t#(f ) ), where t1 , . . . , t#(f ) are terms. The set of first-order formulae is
defined by the syntax below:
The execution tree is not accepting, since its frontier is not labeled with final
configurations everywhere. Incidentally, here we have L(A) = ∅, which is proved
by our tool in ∼ 0.5 s on an average machine.
In the rest of this paper, we are concerned with the following problems:
1. boolean closure: given automata Ai = ⟨Σ, X, Qi, ιi, Fi, Δi⟩, for i = 1, 2, do
there exist automata A∩, A∪ and Ā1 such that L(A∩) = L(A1) ∩ L(A2),
L(A∪) = L(A1) ∪ L(A2) and L(Ā1) = Σ[X]∗ \ L(A1)?
2. emptiness: given an automaton A, is L(A) = ∅?
For technical reasons, we address the following problem next: given an automaton
A and an input sequence α ∈ Σ∗, does there exist a word w ∈ L(A) such that
wΣ = α? By solving this problem first, we develop the machinery required to
prove that first-order alternating automata are closed under complement and,
further, lay the groundwork for developing a practical semi-algorithm for the
emptiness problem.
forests. For this reason, we first introduce path formulae Θ(α), which are
formulae defining the executions of an automaton over words that share a given
sequence α of input events. Second, we restrict a path formula Θ(α) to an
acceptance formula Υ(α), which defines only those executions that are accepting
among Θ(α). Consequently, the automaton accepts a word w such that wΣ = α
if and only if Υ(α) is satisfiable.
Let A = ⟨Σ, X, Q, ι, F, Δ⟩ be an automaton for the rest of this section. For
any i ∈ N, we denote by Q(i) = {q(i) | q ∈ Q} and X(i) = {x(i) | x ∈ X} the sets
of time-stamped predicate symbols and variables, respectively. We also define
Q(≤n) = {q(i) | q ∈ Q, i ∈ [n]} and X(≤n) = {x(i) | x ∈ X, i ∈ [n]}. For a formula
ψ and i ∈ N, we define ψ(i) = ψ[X(i)/X, Q(i)/Q], the formula in which all input
variables and state predicates (and only those symbols) are replaced by their
time-stamped counterparts. Moreover, we write q(y) for q(y1, . . . , y#(q)), when
no confusion arises.
Given a sequence of input events α = a1 . . . an ∈ Σ∗, the path formula of α is:

Θ(α) = ι(0) ∧ ⋀i=1..n ⋀{q(y) −ai(X)→ ψ ∈ Δ} ∀y1 . . . ∀y#(q) . q(i−1)(y) → ψ(i)   (1)
The automaton A, to which Θ(α) refers, will always be clear from the context.
To formalize the relation between the low-level configuration-based execution
semantics and path formulae, consider a word w = (a1 , ν1 ) . . . (an , νn ) ∈ Σ[X]∗ .
Any execution T of A over w has an associated interpretation IT of time-
stamped predicates Q(≤n) :
IT(q(i)) = {(d1, . . . , d#(q)) | (q, d1, . . . , d#(q)) labels a node on level i in T}, for all q ∈ Q and i ∈ [n]
The top-level universal quantifiers from a subformula ∀y1 . . . ∀y#(q) . q(i)(y) → ψ
of Υ(α) will be referred to as path quantifiers in the following. Notice that path
quantifiers are distinct from the transition quantifiers that occur within a formula
ψ of a transition rule q(y1, . . . , y#(q)) −a(X)→ ψ of A. The relation between the
words accepted by A and the acceptance formula above is formally captured by
the following lemma:
The result of eliminating the path quantifiers, in prenex normal form, is shown
below:
Notice that the transition quantifiers ∃z1 and ∀z2 from Υ (α) range now over
Υ(α).
Example 2 (Contd. from Example 1). The result of the elimination of predicate
atoms from the acceptance formula in Example 1 is shown below:
At this point, we prove the formal relation between the satisfiability of the
formulae Υ(α) and Υ (α). Since there are no occurrences of predicates in Υ (α),
for each valuation ν : X (≤n) → D, there exists an interpretation I such that
I, ν |= Υ (α) if and only if J, ν |= Υ (α), for every interpretation J. In this case
we omit I and simply write ν |= Υ (α).
Finally, we define the acceptance of a word with a given input event sequence
by means of a quantifier-free formula in which no predicate atom occurs.
sequence labeling the path from the root node to n. If Υ (α(n)) is satisfiable, the
sequence α(n) is feasible, in which case a model of Υ (α(n)) is obtained and a
word w ∈ L(A) is returned. Otherwise, α(n) is an infeasible input sequence and
the procedure enters the refinement phase (lines 9–19). The GLI for α(n) is used
to strengthen the labels of all the ancestors of n, by conjoining the formulae of
the interpolant, adapted according to Lemma 7, to the existing labels.
In this process, the nodes on the path between r and n, including n, might
become eligible for coverage, therefore we attempt to close each ancestor of n
that is impacted by the refinement (line 19). Observe that, in this case, the call
to Close must uncover each node which is covered by a successor of n (line 30
of the Close function). This is required because, due to the over-approximation
of the sets of reachable configurations, the covering relation is not transitive, as
explained in [20]. If Close adds a covering edge (ni, m) to the covering relation,
it does not have to be called for the successors of ni on this path, which is
handled via the boolean flag b. Finally, if n is still uncovered (it has not been
previously covered during the refinement phase), we expand n (lines 21–25) by
creating a new node for each successor s via the input event a ∈ Σ and inserting
it into the worklist.
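The overall procedure can be pictured as the following schematic worklist loop (the names and the shape of `check` are our own simplification of the paper's components; coverage is omitted for brevity):

```python
def lazy_annotation(root, successors, check):
    """Schematic worklist loop of the emptiness semi-algorithm.
    check(n) returns ("word", w) when the path to n accepts a word w,
    ("infeasible", interpolants) when its acceptance formula is
    unsatisfiable, and ("open", None) otherwise."""
    worklist = [root]
    labels = {root: "true"}
    while worklist:
        n = worklist.pop()
        verdict, data = check(n)
        if verdict == "word":
            return data                      # L(A) is non-empty
        if verdict == "infeasible":
            # refinement: strengthen ancestor labels with interpolants
            for ancestor, itp in data:
                labels[ancestor] = f"({labels.get(ancestor, 'true')}) & ({itp})"
            continue                         # path is blocked for good
        worklist.extend(successors(n))       # expand an open node
    return None                              # no accepting path found
```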
4 Interpolant Generation
3 E.g., the arithmetic operators of addition and multiplication, when D is the set of integers.
The following proposition states the existence of local GLI for the theories in
which Lyndon’s Interpolation Theorem holds.
Proposition 1. If there exists a Lyndon interpolant for any two formulae φ and
ψ, in the first-order theory of data with uninterpreted predicate symbols, such that
φ ∧ ψ is unsatisfiable, then any sequence of input events α = a1 . . . an ∈ Σ ∗ , such
that Υ (α) is unsatisfiable, has a local GLI (I0 , . . . , In ).
because (I) the formula Υ(α), obtained by repeated substitutions, loses track of
the steps of the execution, and (II) quantifiers that occur nested in Υ(α) make it
difficult to write Υ(α) as an unsatisfiable quantifier-free conjunction of formulae
from which interpolants are extracted (Definition 4).
The solution we adopt for the first issue (I) consists in partially recovering
the time-stamped structure of the acceptance formula Υ(α) using the formula
Υ(α), in which only transition quantifiers occur. The second issue (II) is solved
under the additional assumption that the theory of the data domain D has
witness-producing quantifier elimination. More precisely, we assume that, for each
formula ∃x . φ(x), there exists an effectively computable term τ, in which x does
not occur, such that ∃x . φ and φ[τ/x] are equisatisfiable. These terms, called
witness terms in the following, are actual definitions of the Skolem function
symbols from the following folklore theorem:
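Witness-term substitution can be pictured on a toy formula representation of our own:

```python
def substitute(phi, var, term):
    """phi[term/var]: syntactic substitution on formulas and terms
    represented as nested tuples and variable names (our encoding)."""
    if phi == var:
        return term
    if isinstance(phi, tuple):
        return tuple(substitute(p, var, term) for p in phi)
    return phi

def evaluate(phi, env):
    """Evaluate a formula or term with variables interpreted by env."""
    if isinstance(phi, str):
        return env[phi]
    if isinstance(phi, int):
        return phi
    op, *args = phi
    ops = {"and": lambda a, b: bool(a) and bool(b),
           "=":   lambda a, b: a == b,
           ">":   lambda a, b: a > b,
           "+":   lambda a, b: a + b}
    return ops[op](*[evaluate(a, env) for a in args])

# exists x . (x = y + 1 and x > 0): the witness term tau = y + 1
# yields an equisatisfiable quantifier-free instance phi[tau/x]
phi = ("and", ("=", "x", ("+", "y", 1)), (">", "x", 0))
instance = substitute(phi, "x", ("+", "y", 1))
```

Under y = 3 the instance evaluates to true, exhibiting the same model as the quantified formula; under y = -5 it is false.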
Example 3 (Contd. from Examples 1 and 2). The formula Υ(α) (Example 2) is
unsatisfiable; let τ2 = z1 be the witness term for the universally quantified
We formalize and prove the correctness of the above construction of non-local
GLI. A function ξ : N → N is monotonic iff for each n < m we have ξ(n) ≤ ξ(m),
and finite-range iff for each n ∈ N the set {m | ξ(m) = n} is finite. If ξ is
finite-range, we denote by ξ⁻¹max(n) ∈ N the maximal value m such that ξ(m) = n.
Lemma 6. Given a non-empty input event sequence α = a1 . . . an ∈ Σ∗, such
that Υ(α) is unsatisfiable, let Q1x1 . . . Qmxm . Φ be a prenex form of Υ(α) and
let ξ : [1, m] → [n] be a monotonic finite-range function mapping each transition
quantifier to the minimal index from the sequence Θ(α0), . . . , Θ(αn) where it
occurs. Then one can effectively build:
1. witness terms τi1, . . . , τiℓ, where {i1, . . . , iℓ} = {j ∈ [1, m] | Qj = ∀}
and V(τij) ⊆ X(≤ξ(ij)) ∪ {xk | k < ij, Qk = ∃}, for all j ∈ [1, ℓ], such that
Φ[τi1/xi1, . . . , τiℓ/xiℓ] is unsatisfiable, and
2. a GLI (I0, . . . , In) for α, such that V(Ik) ⊆ Q(k) ∪ X(≤k) ∪ {xj | j <
ξ⁻¹max(k), Qj = ∃}, for all k ∈ [n].
Consequently, under two assumptions about the first-order theory of the
data domain, namely (i) witness-producing quantifier elimination, and (ii) Lyndon
interpolation for the quantifier-free fragment with uninterpreted functions,
we developed a generic method that produces GLIs for infeasible input event
sequences. Moreover, each formula in the interpolant refers only to the current
predicate symbols, the current and past input variables, and the existentially
quantified transition variables introduced at the previous steps. The remaining
questions are how to use these GLIs to label the sequences in the unfolding of
an automaton (Definition 2) and how to compute coverage (Definition 3) between
nodes of the unfolding.
– U′(αk) = U(αk) ∧ Jk, for all k ∈ [n], where Jk is the formula obtained from
Ik by removing the time stamp of each predicate symbol q(k) and existentially
quantifying each free variable, and
– U′(β) = U(β) if β ∈ dom(U) and β is not a prefix of α.
Moreover, α is safe in U′.
Observe that, by Lemma 6(2), the set of free variables of a GLI formula Ik
consists of (i) variables X (≤k) keeping track of data values seen in the input
at some earlier moment in time, and (ii) variables that track past choices made
within the transition rules. Basically, it is not important when exactly in the past
a certain input has been read or when a choice has been made, because only
the relation between the values of these and the current variables determines
the future behavior of the automaton. Quantifying these variables existentially
does the job of ignoring when exactly in the past these values have been seen.
Moreover, the last point of Lemma 7 ensures that the refined path is safe in the
new unfolding and will stay safe in all future refinements of this unfolding.
The last ingredient of the lazy annotation semi-algorithm based on unfoldings
consists in the implementation of the coverage check, when the unfolding of
an automaton is labeled with conjunctions of existentially quantified formulae
with predicate symbols, obtained from interpolation. By Definition 3, checking
whether a given node α ∈ dom(U) is covered amounts to finding a prefix α′ of α
and a node β ∈ dom(U) such that U(α′) |= U(β), or equivalently, that the formula
U(α′) ∧ ¬U(β) is unsatisfiable. However, the latter formula, in prenex form, has
a quantifier prefix in the language ∃∗∀∗ and, as previously mentioned, the
satisfiability problem for such formulae becomes undecidable when the data theory
subsumes Presburger arithmetic [10].
Nevertheless, if we require just a yes/no answer (i.e., not an interpolant),
recently developed quantifier instantiation heuristics [25] perform rather well
in answering a large number of queries in this class. Observe, moreover, that
coverage does not need to rely on a complete decision procedure. If the prover
fails to answer the above satisfiability query, then the semi-algorithm assumes
that the node is not covered and continues exploring its successors. Failure to
compute complete coverage may lead to divergence (non-termination) and,
ultimately, to failure to prove emptiness, but does not affect the soundness of the
semi-algorithm (real counterexamples will still be found).
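This sound-but-incomplete coverage test can be pictured as follows; the function and its `entails` oracle are our own schematic stand-ins for the SMT query:

```python
def coverage_check(node_label, candidates, entails):
    """Incomplete coverage test.  entails(a, b) stands for the SMT
    query 'is a & not(b) unsatisfiable?' on the exists*forall* fragment
    and may answer True, False, or None when the prover gives up.
    A None answer is treated as 'not covered', so the node keeps being
    expanded: soundness is preserved at the price of possible
    divergence."""
    for name, label in candidates.items():
        if entails(node_label, label) is True:
            return name          # node is covered; stop exploring it
    return None                  # keep expanding successors
```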
5 Experimental Results
Example            |A| (bytes)  Predicates  Variables  Transitions  L(A) = ∅?  Nodes expanded  Nodes visited  Time (msec)
incdec.pa          499          3           1          12           No         21              17             779
localdec.pa        678          4           1          16           No         49              35             1814
ticket.pa          4250         13          1          73           No         229             91             9543
count_thread0.pa   9767         14          1          126          No         154             128            8553
queries and also for interpolant generation. Table 1 reports the size of the input
automaton in bytes, the numbers of Predicates, Variables and Transitions, the
result of the emptiness check, the numbers of Expanded and Visited Nodes during
the unfolding, and the Time in milliseconds. The experiments were carried out on
a MacOS x64 machine with a 1.3 GHz Intel Core i5 and 8 GB of 1867 MHz
LPDDR3 memory.
The test cases shown in Table 1 come from several sources, namely predicate
automata models (*.pa) [6,7] available online [23], timed automata
inclusion problems (abp.ada, train.ada, rr-crossing.foada), array logic
entailments (array_rotation.ada, array_simple.ada, array_shift.ada) and
hardware circuit verification (hw1.ada, hw2.ada), initially considered in [13],
with the restriction that local variables are made visible in the input. The
train-simpleN.foada and fischer-mutexN.foada examples are parametric
verification problems in which one checks inclusions of the form
⋃i=1..N L(Ai) ⊆ L(B), where Ai is the i-th copy of the template automaton.
The advantage of using FOADA over the INCLUDER [12] tool from [13] is the
possibility of having automata over infinite alphabets with local variables, whose
values are not visible in the input. In particular, this is essential for checking
inclusion of timed automata that use internal clocks to control the computation.
6 Conclusions
We present first-order alternating automata, a model of computation that
generalizes classical boolean alternating automata to first-order theories. Due to
their expressivity, first-order alternating automata are closed under union,
intersection and complement. However, the emptiness problem is undecidable even
in the simplest case, that of the quantifier-free theory of equality with uninterpreted
predicate symbols. We deal with the emptiness problem by developing a practical
semi-algorithm that always terminates when the automaton is not empty. In case
of emptiness, the semi-algorithm terminates on most practical test cases, as shown
by a number of experiments.
References
1. Alur, R., Dill, D.L.: A theory of timed automata. Theor. Comput. Sci. 126(2),
183–235 (1994)
2. Barringer, H., Rydeheard, D., Havelund, K.: Rule systems for run-time monitoring:
from Eagle to RuleR. In: Sokolsky, O., Taşıran, S. (eds.) RV 2007. LNCS, vol.
4839, pp. 111–125. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77395-5_10
3. Börger, E., Grädel, E., Gurevich, Y.: The Classical Decision Problem: Perspectives
in Mathematical Logic. Springer, Heidelberg (1997)
4. Chandra, A.K., Kozen, D.C., Stockmeyer, L.J.: Alternation. J. ACM 28(1), 114–
133 (1981)
5. D’Antoni, L., Kincaid, Z., Wang, F.: A symbolic decision procedure for sym-
bolic alternating finite automata. Electron. Notes Theor. Comput. Sci. 336, 79–99
(2018). The Thirty-third Conference on the Mathematical Foundations of Pro-
gramming Semantics (MFPS XXXIII)
6. Farzan, A., Kincaid, Z., Podelski, A.: Proof spaces for unbounded parallelism.
SIGPLAN Not. 50(1), 407–420 (2015)
7. Farzan, A., Kincaid, Z., Podelski, A.: Proving liveness of parameterized programs.
In: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer
Science, LICS 2016, pp. 185–196. ACM (2016)
8. First Order Alternating Data Automata (FOADA). https://github.com/cathiec/
FOADA
9. Grebenshchikov, S., Lopes, N.P., Popeea, C., Rybalchenko, A.: Synthesizing soft-
ware verifiers from proof rules. SIGPLAN Not. 47(6), 405–416 (2012)
10. Halpern, J.Y.: Presburger arithmetic with unary predicates is Π¹₁ complete. J.
Symb. Log. 56(2), 637–642 (1991)
11. Hojjat, H., Rümmer, P.: Deciding and interpolating algebraic data types by
reduction. Technical report. CoRR abs/1801.02367 (2018). http://arxiv.org/abs/1801.02367
12. Includer. http://www.fit.vutbr.cz/research/groups/verifit/tools/includer/
13. Iosif, R., Rogalewicz, A., Vojnar, T.: Abstraction refinement and antichains for
trace inclusion of infinite state systems. In: Chechik, M., Raskin, J.-F. (eds.)
TACAS 2016. LNCS, vol. 9636, pp. 71–89. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_5
14. Iosif, R., Xu, X.: Abstraction refinement for emptiness checking of alternating data
automata. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp.
93–111. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89963-3_6
15. JavaSMT. https://github.com/sosy-lab/java-smt
16. Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2),
329–363 (1994)
17. Kincaid, Z.: Parallel proofs for parallel programs. Ph.D. thesis, University of
Toronto (2016)
18. Kuncak, V., Mayer, M., Piskac, R., Suter, P.: Software synthesis procedures. Com-
mun. ACM 55(2), 103–111 (2012)
19. Lyndon, R.C.: An interpolation theorem in the predicate calculus. Pacific J. Math.
9(1), 129–142 (1959)
20. McMillan, K.L.: Lazy abstraction with interpolants. In: Ball, T., Jones, R.B. (eds.)
CAV 2006. LNCS, vol. 4144, pp. 123–136. Springer, Heidelberg (2006). https://doi.org/10.1007/11817963_14
21. McMillan, K.L.: Lazy annotation revisited. In: Biere, A., Bloem, R. (eds.) CAV
2014. LNCS, vol. 8559, pp. 243–259. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_16
22. Nelson, G., Oppen, D.C.: Fast decision procedures based on congruence closure. J.
ACM 27(2), 356–364 (1980)
23. Predicate Automata. https://github.com/zkincaid/duet/tree/ark2/regression/
predicateAutomata
24. Presburger, M.: Über die Vollständigkeit eines gewissen Systems der Arithmetik.
Comptes rendus du I Congrès des Pays Slaves, Warsaw (1929)
25. Reynolds, A., King, T., Kuncak, V.: Solving quantified linear arithmetic by
counterexample-guided instantiation. Form. Methods Syst. Des. 51(3), 500–532
(2017)
26. Rybalchenko, A., Sofronie-Stokkermans, V.: Constraint solving for interpolation.
J. Symb. Comput. 45(11), 1212–1233 (2010)
27. Z3 SMT Solver. https://rise4fun.com/z3
Q3B: An Efficient BDD-based SMT
Solver for Quantified Bit-Vectors
Abstract. We present the first stable release of our tool Q3B for deciding satisfiability of quantified bit-vector formulas. Unlike other state-of-the-art solvers for this problem, Q3B is based on translating a formula to a BDD that represents the models of the formula. The tool also employs advanced formula simplifications and approximations by effective bit-width reduction and by abstraction of bit-vector operations. The paper focuses on the architecture and implementation aspects of the tool and provides a brief experimental comparison with its competitors.
1 Introduction
Advances in solving formula satisfiability modulo theories (SMT) achieved during the last few decades have enabled significant progress and practical applications in the area of automated analysis, testing, and verification of various systems. In the case of software and hardware systems, the most relevant theory is the theory of fixed-size bit-vectors, as these systems work with inputs expressed as bit-vectors (i.e., sequences of bits) and perform bitwise and arithmetic operations on them. The quantifier-free fragment of this theory is supported by many general-purpose SMT solvers, such as CVC4 [1], MathSAT [7], Yices [10], or Z3 [9], and also by several dedicated solvers, such as Boolector [21] or STP [12]. However, there are use cases where quantifier-free formulas are not natural or expressive enough. For example, quantified formulas arise naturally when expressing loop invariants, ranking functions, or loop summaries, or when checking equivalence of two symbolically described sets of states [8,13,17,18,24]. In the following, we focus on SMT solvers for quantified bit-vector formulas. In particular, this paper describes the state-of-the-art SMT solver Q3B, including its implementation and inner workings.
Solving quantified bit-vector formulas was first supported by Z3 in 2013 [25] and, for a limited set of exists/forall formulas with only a single quantifier alternation, by Yices in 2015 [11]. Both of these solvers decide quantified formulas by quantifier instantiation, in which universally quantified variables in the Skolemized formula are repeatedly instantiated by ground terms until the resulting quantifier-free formula is unsatisfiable or a model of the original formula is found.
This work has been supported by Czech Science Foundation, grant GA18-02177S.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 64–73, 2019.
https://doi.org/10.1007/978-3-030-25543-5_4
the formula, and check whether it is a model of the original formula. If the answer is positive, the original formula is satisfiable. In the other case, we build a more precise overapproximating BDD. Underapproximating BDDs are utilized analogously. The only difference is that for an unsatisfiable underapproximating BDD, we check the validity of a countermodel, i.e., an assignment to the top-level universal variables that makes the formula unsatisfiable. The approach is depicted in Fig. 1.
Fig. 1. High-level overview of the SMT solving approach used by Q3B. The three shaded areas are executed in parallel and the first result is returned.
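The overapproximation branch of this refinement loop can be sketched as follows. This is a minimal Python sketch, not Q3B's actual C++ implementation; `build_overapprox_bdd` and `is_model_of` are hypothetical stand-ins for the BDD machinery.

```python
def solve_with_overapproximation(formula, initial_precision, max_precision,
                                 build_overapprox_bdd, is_model_of):
    """Refine an overapproximating BDD until a definite answer is reached.

    build_overapprox_bdd(formula, p) -> (sat, candidate_model) at precision p
    is_model_of(model, formula)      -> check against the *precise* formula
    """
    precision = initial_precision
    while precision <= max_precision:
        sat, candidate = build_overapprox_bdd(formula, precision)
        if not sat:
            return "unsat"   # overapproximation unsatisfiable => formula is too
        if is_model_of(candidate, formula):
            return "sat"     # candidate model validated on the precise formula
        precision *= 2       # spurious model: increase precision and retry
    return "unknown"
```

The underapproximating branch is symmetric: a satisfiable underapproximation yields sat directly, and an unsatisfiable one triggers a countermodel check before the precision is increased.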
Q3B currently supports two ways of computing the approximating BDDs from the input formula. The first are variable bit-width approximations, in which the effective bit-width of some variables is reduced. In other words, some of the variables are represented by fewer bits and the rest of the bits are set to zero bits, one bits, or the sign bit of the reduced variable. This approach was originally used by the SMT solvers UCLID [6] and Boolector [21]. Q3B extends this approach to quantified formulas: if bit-widths of only existentially quantified variables are reduced, the
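The bit-width reduction itself can be illustrated by a small sketch; Q3B performs this symbolically on BDD vectors, while the function below merely shows the effect on a concrete value.

```python
def reduce_bit_width(value, full_width, effective_width, fill):
    """Approximate a `full_width`-bit value by keeping only its
    `effective_width` low bits and filling the remaining high bits with
    'zero' bits, 'one' bits, or the 'sign' bit of the reduced part."""
    low = value & ((1 << effective_width) - 1)
    high_width = full_width - effective_width
    if fill == "zero":
        high = 0
    elif fill == "one":
        high = (1 << high_width) - 1
    else:  # "sign": replicate the top bit of the reduced value
        sign = (low >> (effective_width - 1)) & 1
        high = ((1 << high_width) - 1) * sign
    return (high << effective_width) | low
```

For example, reducing the 8-bit value 0b1011 to 2 effective bits yields 3 with zero fill and 255 with one fill or sign fill (the top kept bit is 1, so the sign is replicated).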
Fig. 2. Architecture of Q3B. Components in the shaded box are parts of Q3B; the other components are external.
2 Architecture
This section describes the internal architecture of Q3B. The overall structure, including internal and external components and the interactions between them, is depicted in Fig. 2. We explain the purpose of the internal components:
particular, it maintains the assertion stack and the options set by the user, calls the Solver when the check-sat command is issued, and queries the Solver if the user requires the model with the command get-model.
Formula Simplifier (implemented in FormulaSimplifier.cpp) provides an interface for all applied formula simplifications, in particular miniscoping, conversion to negation normal form, pure literal elimination, equality propagation, constructive equality resolution (CER) [14], destructive equality resolution (DER) [25], simple theory-related rewriting, and simplifications using unconstrained variables. Most of these simplifications are implemented directly in this component; only CER, DER, and the majority of the theory-related rewritings are performed by calling the Z3 API, and simplifications using unconstrained variables are implemented in a separate component of Q3B. The simplifier also converts top-level existential variables to uninterpreted constants, so their values are also included in a model. Some simplifications that could change the models of the formula are disabled if the user enables model generation, i.e., sets :produce-models to true.
Unconstrained Variable Simplifier (implemented in UnconstrainedVari-
ableSimplifier.cpp) provides simplifications of formulas that contain
unconstrained variables, i.e., variables that occur only once in the formula.
Besides previously published unconstrained variable simplifications [15],
which were present in the previous versions of Q3B, this component now
also provides new goal-directed simplifications of formulas with unconstrained
variables. In these simplifications, we aim to determine whether a subterm
containing an unconstrained variable should be minimized, maximized, sign
minimized, or sign maximized in order to satisfy the formula. If the subterm
should be minimized and contains an unconstrained variable, the term is
replaced by a simpler term that gives the minimal result that can be achieved
by any value of the unconstrained variable. Similarly for maximization, sign
minimization, and sign maximization.
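A minimal sketch of the goal-directed question being asked, using brute force over all values of the unconstrained variable in place of Q3B's symbolic rules (the function name is illustrative; operator names follow SMT-LIB):

```python
def extreme_value(op, other_value, goal, width):
    """Extreme value reachable by a term op(x, t), where x is an
    unconstrained `width`-bit variable and t evaluates to `other_value`,
    computed by brute force over all values of x (illustrative only)."""
    mask = (1 << width) - 1
    ops = {
        "bvadd": lambda x: (x + other_value) & mask,
        "bvand": lambda x: x & other_value,
        "bvor":  lambda x: x | other_value,
    }
    values = [ops[op](x) for x in range(1 << width)]
    return min(values) if goal == "min" else max(values)
```

For instance, an addition with an unconstrained argument can reach any value (minimum 0, maximum all ones) thanks to wrap-around, so a subterm `bvadd(x, t)` that should be minimized can simply be replaced by the constant 0.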
Solver (implemented in Solver.cpp) is the central component of our tool. It calls the formula simplifier and then creates three threads for the precise solver, the underapproximating solver, and the overapproximating solver. It also controls the approximation refinement loops of the approximating solvers. Finally, it returns the result of the fastest thread and stores the respective model if the result was sat.
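The three-way portfolio can be sketched as follows. This is a simplified model: unlike the real Solver component, the sketch does not cancel the losing threads, it merely ignores their results.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_portfolio(precise, underapprox, overapprox, formula):
    """Run the three (sub)solvers in parallel and return the first definite
    answer.  Indefinite answers ('unknown') are ignored and we keep
    waiting for the remaining threads."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        pending = {pool.submit(s, formula)
                   for s in (precise, underapprox, overapprox)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                result = future.result()
                if result in ("sat", "unsat"):
                    return result
    return "unknown"
```

Note that an approximating solver may legitimately answer "unknown" (e.g., a spurious model at maximal precision), which is why only definite answers terminate the portfolio.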
Formula to BDD Transformer (implemented in the file ExprToBDDTransformer.cpp) performs the actual conversion of a formula to a BDD. Each subterm of the input formula is converted to a vector of BDDs (if the subterm's sort is a bit-vector of width n, the constructed vector contains n BDDs, each representing one bit of the subterm). Further, each subformula of the input formula is converted to a BDD. These conversions proceed by a straightforward bottom-up recursion on the formula syntax tree. The transformer component calls an external library to compute the effect of logical and bit-vector operations on BDDs and vectors of BDDs, respectively. Besides the precise conversion, the transformer can also construct overapproximating
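The term-to-BDD-vector conversion can be sketched as follows, with a plain Python function standing in for a BDD (illustrative only; the real transformer builds CUDD BDDs):

```python
# A "BDD" is modelled as a function from a variable assignment to a bool;
# a bit-vector term of width n becomes a list of n such functions
# (index 0 = least significant bit).

def var_to_vector(name, width):
    """A bit-vector variable becomes a vector of bit-projection functions."""
    return [lambda env, i=i: ((env[name] >> i) & 1) == 1 for i in range(width)]

def const_to_vector(value, width):
    """A constant becomes a vector of constant bit functions."""
    return [lambda env, b=(((value >> i) & 1) == 1): b for i in range(width)]

def bvand_vectors(u, v):
    """Bitwise AND is computed bit by bit, like the BDD 'apply' operation."""
    return [lambda env, a=a, b=b: a(env) and b(env) for a, b in zip(u, v)]

def eval_vector(vec, env):
    """Read back the integer value of a vector under an assignment."""
    return sum((1 << i) for i, bit in enumerate(vec) if bit(env))
```

Arithmetic operations such as addition would thread a carry function through the bit positions in the same bottom-up fashion.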
3 Implementation
Q3B is implemented in C++17; it is open-source and available under the MIT license on GitHub: https://github.com/martinjonas/Q3B. The project development process includes continuous integration and automatic regression tests.
Q3B relies on several external libraries and tools. For representing and manipulating BDDs, Q3B uses the open-source library CUDD 3.0 [23]. Since CUDD does not support bit-vector operations, we use the library by Peter Navrátil [19] that implements bit-vector operations on top of CUDD. The algorithms in this library are inspired by the ones in the BDD library BuDDy¹ and they provide decent performance. Nevertheless, we have further improved its performance by several modifications. In particular, we added specific code for handling expensive operations like bit-vector multiplication and division when the arguments contain constant BDDs. This, for example, considerably speeds up multiplication whenever one argument contains many constant zero bits, which is a frequent case when we use the variable bit-width approximation fixing some bits to zero. Further, we have fixed a few incorrectly implemented bit-vector operations in the original library. Finally, we have extended the library with support for do-not-know bits in the inputs of the bit-vector operations, and we have implemented abstract versions of arithmetic operations that can produce do-not-know bits when the result exceeds a given number of BDD nodes.
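The do-not-know-bit idea can be illustrated on ripple-carry addition over three-valued bits. This is a sketch: the library's actual abstract operations act on BDD vectors and produce unknowns based on node-count limits, whereas here unknowns simply propagate from unknown inputs.

```python
def add3(u, v):
    """Ripple-carry addition on equal-length three-valued bit lists
    (LSB first).  Each bit is 0, 1, or None for 'do-not-know'.  A sum bit
    with any unknown input is unknown; the carry (a majority function)
    stays known when two inputs already decide it."""
    out, carry = [], 0
    for a, b in zip(u, v):
        trio = [a, b, carry]
        if None in trio:
            out.append(None)            # parity of an unknown bit is unknown
            ones, zeros = trio.count(1), trio.count(0)
            carry = 1 if ones >= 2 else 0 if zeros >= 2 else None
        else:
            s = a + b + carry
            out.append(s & 1)
            carry = s >> 1
    return out
```

For instance, adding ?011 and 0011 (LSB first: [1, 0, None, 0] and [1, 1, 0, 0]) gives known low bits 0, 0 and unknown high bits, which is sound for both possible values of the input.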
For parsing input formulas in the SMT-LIB format, Q3B uses an ANTLR parser generated from the grammar² for SMT-LIB 2.6 [2]. We have modified the grammar to correctly handle bit-vector numerals and to support push and pop commands without a numerical argument. The parser allows Q3B to support all bit-vector operations and almost all SMT-LIB commands except get-assertions, get-assignment, get-proof, get-unsat-assumptions, get-unsat-core, and all the commands that work with algebraic data-types. This is in sharp contrast with the previous experimental versions of Q3B, which only collected all the assertions from the input file and performed the satisfiability check regardless of the rest of the commands and of the presence of the check-sat command. The reason for this was that the older versions parsed the input file using the Z3 C++ API, which can provide only the list of assertions, not the rest of the SMT-LIB script. Thanks to the new parser, Q3B 1.0 can also provide the user
¹ https://sourceforge.net/projects/buddy/.
² https://github.com/julianthome/smtlibv2-grammar.
70 M. Jonáš and J. Strejček
4 Experimental Evaluation
We have evaluated the performance of Q3B 1.0 and compared it to the latest versions of the SMT solvers Boolector (v3.0), CVC4 (v1.6), and Z3 (v4.8.4). All tools were used with their default settings, except for CVC4, where we used the same settings as in the paper that introduces quantified bit-vector solving in CVC4 [20], since they give better results than the default CVC4 settings. As the benchmark set, we have used all 5751 quantified bit-vector formulas from the SMT-LIB repository. The benchmarks are divided into 8 distinct families of formulas. We have executed each solver on each benchmark with a CPU time limit of 20 min and a RAM limit of 8 GiB. All the experiments were performed in an Ubuntu 16.04 virtual machine on a computer equipped with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz and 32 GiB of RAM. For reliable benchmarking we employed BenchExec [4], a tool that allocates specified resources for a program execution and precisely measures their usage. All scripts used for running the benchmarks and processing their results, together with detailed descriptions and some additional results not presented in the paper, are available online³.
Table 1 shows the number of benchmarks in each benchmark family solved by the individual solvers. Q3B solves the most benchmarks in the benchmark families 2017-Preiner-scholl-smt08, 2017-Preiner-tptp, 2017-Preiner-UltimateAutomizer, 2018-Preiner-cav18, and wintersteiger, and it is competitive in the remaining families. In total, Q3B also solves more formulas than each of the other solvers: 116 more than Boolector, 83 more than CVC4, and 139 more than Z3. Although the numbers of solved formulas for the solvers seem fairly similar, the cross-comparison in Table 2 shows that the differences among the individual solvers are actually larger. For each other solver, there are at least
³ https://github.com/martinjonas/q3b-artifact.
Table 1. For each solver and benchmark family, the table shows the number of benchmarks from the given family solved by the given solver. The column Total shows the total number of benchmarks in the given family. The last line provides the total CPU times for the benchmarks solved by all four solvers.
Table 2. For all pairs of the solvers, the table shows the number of benchmarks
that were solved by the solver in the corresponding row, but not by the solver in the
corresponding column. The column Uniquely solved shows the number of benchmarks
that were solved only by the given solver.
143 benchmarks that can be solved by Q3B but not by the other solver. We think this shows the importance of developing an SMT solver based on BDDs and approximations alongside the solvers based on quantifier instantiation.
be used even if the user wants to get a model of the formula. We would also like
to implement production of unsatisfiable cores since they are also valuable for
software verification.
References
1. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011.
LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011). https://doi.org/10.
1007/978-3-642-22110-1 14
2. Barrett, C., Stump, A., Tinelli, C.: The SMT-LIB Standard: Version 2.6. Technical
report, Department of Computer Science, The University of Iowa (2017). www.
SMT-LIB.org
3. Barrett, C., Stump, A., Tinelli, C.: The SMT-LIB standard: version 2.0. In:
Gupta, A., Kroening, D. (eds.) Proceedings of the 8th International Workshop on
Satisfiability Modulo Theories, Edinburgh, UK (2010)
4. Beyer, D., Löwe, S., Wendler, P.: Benchmarking and resource measurement. In:
Fischer, B., Geldenhuys, J. (eds.) SPIN 2015. LNCS, vol. 9232, pp. 160–178.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23404-5 12
5. Bryant, R.E.: On the complexity of VLSI implementations and graph representa-
tions of boolean functions with application to integer multiplication. IEEE Trans.
Comput. 40(2), 205–213 (1991)
6. Bryant, R.E., Kroening, D., Ouaknine, J., Seshia, S.A., Strichman, O., Brady, B.A.:
An abstraction-based decision procedure for bit-vector arithmetic. STTT 11(2),
95–104 (2009)
7. Cimatti, A., Griggio, A., Schaafsma, B.J., Sebastiani, R.: The MathSAT5 SMT
solver. In: Piterman, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp.
93–107. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36742-7 7
8. Cook, B., Kroening, D., Rümmer, P., Wintersteiger, C.M.: Ranking function syn-
thesis for bit-vector relations. Form. Methods Syst. Des. 43(1), 93–120 (2013)
9. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3 24
10. Dutertre, B.: Yices 2.2. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp.
737–744. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9 49
11. Dutertre, B.: Solving exists/forall problems with Yices. In: Workshop on satisfia-
bility Modulo Theories (2015)
12. Ganesh, V., Dill, D.L.: A decision procedure for bit-vectors and arrays. In: Damm,
W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 519–531. Springer,
Heidelberg (2007). https://doi.org/10.1007/978-3-540-73368-3 52
13. Gulwani, S., Srivastava, S., Venkatesan, R.: Constraint-based invariant inference
over predicate abstraction. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009.
LNCS, vol. 5403, pp. 120–135. Springer, Heidelberg (2008). https://doi.org/10.
1007/978-3-540-93900-9 13
14. Jonáš, M., Strejček, J.: Solving quantified bit-vector formulas using binary decision
diagrams. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS, vol. 9710, pp.
267–283. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40970-2 17
15. Jonáš, M., Strejček, J.: On simplification of formulas with unconstrained variables
and quantifiers. In: Gaspers, S., Walsh, T. (eds.) SAT 2017. LNCS, vol. 10491, pp.
364–379. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66263-3 23
Q3B: An Efficient BDD-based SMT Solver for Quantified Bit-Vectors 73
16. Jonáš, M., Strejček, J.: Abstraction of bit-vector operations for BDD-based SMT
solvers. In: Fischer, B., Uustalu, T. (eds.) ICTAC 2018. LNCS, vol. 11187, pp.
273–291. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02508-3 15
17. Kroening, D., Lewis, M., Weissenbacher, G.: Under-approximating loops in C pro-
grams for fast counterexample detection. In: Sharygina, N., Veith, H. (eds.) CAV
2013. LNCS, vol. 8044, pp. 381–396. Springer, Heidelberg (2013). https://doi.org/
10.1007/978-3-642-39799-8 26
18. Mrázek, J., Bauch, P., Lauko, H., Barnat, J.: SymDIVINE: tool for control-explicit
data-symbolic state space exploration. In: Bošnački, D., Wijs, A. (eds.) SPIN 2016.
LNCS, vol. 9641, pp. 208–213. Springer, Cham (2016). https://doi.org/10.1007/
978-3-319-32582-8 14
19. Navrátil, P.: Adding support for bit-vectors to BDD libraries CUDD and Sylvan.
Bachelor’s thesis, Masaryk University, Faculty of Informatics, Brno (2018)
20. Niemetz, A., Preiner, M., Reynolds, A., Barrett, C., Tinelli, C.: Solving quantified
bit-vectors using invertibility conditions. In: Chockler, H., Weissenbacher, G. (eds.)
CAV 2018. LNCS, vol. 10982, pp. 236–255. Springer, Cham (2018). https://doi.
org/10.1007/978-3-319-96142-2 16
21. Niemetz, A., Preiner, M., Wolf, C., Biere, A.: Btor2, BtorMC and Boolector 3.0.
In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 587–
595. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3 32
22. Preiner, M., Niemetz, A., Biere, A.: Counterexample-guided model synthesis. In:
Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 264–280.
Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5 15
23. Somenzi, F.: CUDD: CU Decision Diagram Package Release 3.0.0. University of
Colorado at Boulder (2015)
24. Srivastava, S., Gulwani, S., Foster, J.S.: From program verification to program
synthesis. In: Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL 2010, Madrid, Spain, 17–23 January
2010, pp. 313–326 (2010)
25. Wintersteiger, C.M., Hamadi, Y., de Moura, L.M.: Efficiently solving quantified
bit-vector formulas. Form. Methods Syst. Des. 42(1), 3–23 (2013)
CVC4SY: Smart and Fast Term Enumeration for Syntax-Guided Synthesis
1 Introduction
Syntax-guided synthesis (SyGuS) [3] is a recent paradigm for program synthesis, suc-
cessfully used for applications in formal verification and programming languages. Most
SyGuS solvers perform counterexample-guided inductive synthesis (CEGIS) [16]: a
refinement loop in which a learner proposes solutions, and a verifier, generally a satisfi-
ability modulo theories (SMT) solver [8, 9], checks them and provides counterexamples
for failures. Generally, the learner enumerates some set of terms, while pruning spuri-
ous ones [17]. The simplicity and efficacy of enumerative SyGuS have made it the de
facto approach for SyGuS, although alternatives exist for restricted fragments [4, 14].
In previous work [14], we have shown how the SMT solver CVC4 [5] can itself act as
an efficient synthesizer. This tool paper focuses on recent advances in the enumerative
subsolver of CVC4, culminating in the current SyGuS solver CVC4SY. Figure 1 shows
its main components. The term enumerator is parameterized by an enumeration strategy
chosen before solving: CVC4SY S, whose constraint-based (smart) enumeration allows
for numerous optimizations (Sect. 2); CVC4SY F, based on a new approach for (fast)
enumerative synthesis (Sect. 3), which has significant advantages with respect to the
enumerative solver CVC4SY S and other state-of-the-art approaches; and CVC4SY H,
based on a hybrid approach combining smart and fast enumeration (Sect. 4). All strate-
gies are fully integrated in CVC4, meaning they support inputs in many background
theories, including arithmetic, bit-vectors, strings, and floating point. We evaluate these
approaches on a large set of benchmarks (Sect. 5).
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 74–83, 2019.
https://doi.org/10.1007/978-3-030-25543-5_5
As an example, suppose that the term enumerator previously generated x + y and that d's current value is the datatype term representing y + x, where, however, (x + y)↓ = (y + x)↓. We first generate a blocking constraint template R[z] of the form is_plus(z) ∧ is_y(sel₁(z)) ∧ is_x(sel₂(z)), where z is a fresh variable. This template is subsequently instantiated with z ↦ u for any shared selector chain u of type I that currently (or later) appears in F, starting with d itself. This has the effect of ruling out all candidate solutions that have y + x as a subterm, which is justified by the fact that each such term is equivalent to one in which all occurrences of y + x are replaced by x + y.
We employ a refinement of this technique, which we call theory rewriting with structural generalization, which searches for and then blocks only the minimal skeleton of the term under test that is sufficient for determining its rewritten form. For example, consider the if-then-else term t = ite(x ≈ 0 ∧ y ≥ 0, 0, x). This term is equivalent to x, regardless of the value of the predicate y ≥ 0. This can be confirmed by the rewriter by computing that ite(x ≈ 0 ∧ w, 0, x)↓ = x, where w is a fresh Boolean variable. Then, instead of generating a constraint that blocks only (the datatype value corresponding to) t, we generate a stronger constraint that does not depend on the subterm y ≥ 0. In other words, this blocking constraint rules out all candidate solutions that contain the subterm ite(x ≈ 0 ∧ w, 0, x), for any term w. We compute these generalizations using a recursive algorithm that iteratively replaces each subterm of the current candidate with a fresh variable and checks whether its rewritten form remains the same.
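The recursive generalization algorithm can be sketched as follows, with nested tuples as terms and a toy rewriter standing in for the SMT-level one (all names are illustrative, not CVC4's API):

```python
def generalize(term, rewrite):
    """Minimal-skeleton computation: tentatively replace each subterm of
    `term` with a fresh variable and keep the replacement whenever the
    rewritten form of the whole term is unchanged.  Terms are nested tuples
    like ("mul", ("add", "x", "y"), 0); `rewrite` is the rewriter."""
    target = rewrite(term)
    counter = [0]

    def subst(t, path, repl):
        # replace the subterm of t at `path` (a tuple of argument indices)
        if not path:
            return repl
        args = list(t[1:])
        args[path[0]] = subst(args[path[0]], path[1:], repl)
        return t[:1] + tuple(args)

    def go(whole, t, path):
        if not isinstance(t, tuple):
            return whole
        for i, arg in enumerate(t[1:]):
            counter[0] += 1
            candidate = subst(whole, path + (i,), "?w%d" % counter[0])
            if rewrite(candidate) == target:
                whole = candidate    # this subterm is irrelevant: abstract it
            else:
                whole = go(whole, arg, path + (i,))
        return whole

    return go(term, term, ())

def toy_rewrite(t):
    """Toy rewriter: t * 0 -> 0 and t + 0 -> t."""
    if not isinstance(t, tuple):
        return t
    head, *args = t
    args = [toy_rewrite(a) for a in args]
    if head == "mul" and 0 in args:
        return 0
    if head == "add" and 0 in args:
        return args[1] if args[0] == 0 else args[0]
    return (head, *args)
```

For example, `generalize(("mul", ("add", "x", "y"), 0), toy_rewrite)` abstracts the whole `add` subterm, since any term multiplied by 0 rewrites to 0, and keeps only the `mul`-by-zero skeleton.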
Blocking via CEGIS with Structural Generalization. Synthesis solvers based on CEGIS maintain a list of refinement points that witness the infeasibility of previous candidate solutions. That is, given a synthesis conjecture ∃f. ∀x̄. ϕ[f, x̄], the solver maintains a growing list p̄₁, …, p̄ₙ of values for x̄ that witness the infeasibility of previous candidates u₁, …, uₙ for f. Then, when a new candidate u is generated, we first check whether ϕ[u, p̄ᵢ] is false for some i ≤ n. When a candidate u fails to satisfy ϕ[u, p̄ᵢ], CVC4SY S further applies a form of generalization analogous to the structural generalization described above. We call this CEGIS with structural generalization, where the goal is to find the minimal skeleton of u that also fails to satisfy some refinement point.
For example, suppose f is the function to synthesize, ϕ includes the constraint f(x, y) ≤ x − 1, and p̄₁ = (3, 3) is a refinement point. Then, the candidate term u[x, y] = ite(x ≥ 0, x, y + 1) will be discarded, because ite(3 ≥ 0, 3, 4) ≰ 2. Notice, however, that any candidate u′ = ite(x ≥ 0, x, w) is falsified by p̄₁, regardless of what w is, since u′[3, 3] ≤ 2 is equivalent to 3 ≤ 2. This indicates that we can block all ite candidate terms with condition x ≥ 0 and true branch x. We can express this constraint in CVC4SY S by dropping the disjuncts that relate to the false branch of the ite term. This form of blocking is particularly useful when synthesizing multiple functions (f₁, …, fₙ), since it is often the case that a candidate for a single fᵢ is already sufficient to falsify the specification, regardless of what the candidates for the other functions are.
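The use of stored refinement points to discard candidates cheaply can be sketched as follows (a hypothetical helper mirroring the f(x, y) ≤ x − 1 example above):

```python
def cegis_filter(candidates, spec, points):
    """Discard candidates already falsified by a stored refinement point,
    before any (expensive) verification call is made.  `spec(f, p)` must
    hold for a correct f on every refinement point p."""
    for f in candidates:
        if all(spec(f, p) for p in points):
            yield f
```

With the specification `f(x, y) ≤ x − 1` and the refinement point (3, 3), the candidate ite(x ≥ 0, x, y + 1) evaluates to 3 at that point, violating 3 ≤ 2, and is filtered out without invoking the verifier.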
Evaluation Unfolding. This technique uses evaluation functions to encode the relationship between the datatype terms assigned to d and their analogs in the theory T. For example, the evaluation function for the datatype I defined in (3) is a function E_I : I × Int × Int → Int, defined axiomatically so that E_I(d, m, n) denotes the result of evaluating d by interpreting any occurrences of x and y in d respectively as m and n and
78 A. Reynolds et al.
indicating that the evaluation of d on the point (2, 1) indeed behaves like an ite term when d has top symbol ite. Our implementation adds these constraints for all terms t whose top symbols correspond to ite or Boolean connectives. For terms t whose top symbol is any of the other operators, we add constraints corresponding to the total evaluation of t when the value of t is fully determined, for example, t ≈ plus(x, y) ⇒ E_I(t, 2, 1) ≈ 3. Notice that this constraint with t = d, along with the refinement constraint E_I(d, 2, 1) ≤ 0, suffices to show that d cannot be plus(x, y).
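Assuming the grammar I from the running example, the function computed by E_I can be sketched as (illustrative; ite is omitted for brevity):

```python
def eval_term(t, m, n):
    """E_I(t, m, n): the value of the datatype-encoded term t at the point
    (x, y) = (m, n).  Terms are nested tuples over the grammar I."""
    if t == "x":
        return m
    if t == "y":
        return n
    if t in ("0", "1"):
        return int(t)
    head, a, b = t
    va, vb = eval_term(a, m, n), eval_term(b, m, n)
    return va + vb if head == "plus" else va - vb
```

For instance, eval_term(("plus", "x", "y"), 2, 1) yields 3, matching the unfolding constraint t ≈ plus(x, y) ⇒ E_I(t, 2, 1) ≈ 3 above.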
Algorithm. To generate terms up to a given size k, we maintain a set S_τ^k of terms of type τ and size k for each datatype τ corresponding to a non-terminal symbol of our input grammar R. First, we compute for each such τ the set C_τ of its constructor classes, an equivalence relation over the constructors of τ that groups them by their type. For example, the constructor classes for I are {x, y, 0, 1}, {plus, minus}, and {ite}. Then, we use the following procedure for generating all terms of size k for type τ:
FASTENUM(τ, k):
For all:
– constructor classes C ∈ C_τ whose elements have type τ₁ × … × τₙ → τ,
– tuples of naturals (k₁, …, kₙ) such that k₁ + … + kₙ + ite(n > 0, 1, 0) = k,
(a) run FASTENUM(τᵢ, kᵢ) for each i = 1, …, n,
(b) add C(t₁, …, tₙ) to S_τ^k for all tuples (t₁, …, tₙ) with tᵢ ∈ S_τᵢ^kᵢ and all constructors C ∈ C.
The recursive procedure FASTENUM(τ, k) populates the set S_τ^k of all terms of type τ with size k. These sets are cached globally. We incorporate an optimization that only adds terms C(t₁, …, tₙ) to S_τ^k whose corresponding terms in the theory T are unique up to rewriting. This mimics the effect of blocking via theory rewriting as described in Sect. 2. For example, plus(y, x) is not added to S_I^1 if that set already contains plus(x, y), noting that (x + y)↓ = (y + x)↓. By construction of S_τ^k for k ≥ 1, this has the cascading effect of excluding all terms having y + x as a subterm.
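The procedure can be sketched for the nullary and binary constructor classes of I (a sketch that treats nullary constructors as size 0, per the size equation above; minus and ite are omitted, and a toy commutativity rewriter stands in for the theory rewriter):

```python
from functools import lru_cache
from itertools import product

NULLARY = ("x", "y", "0", "1")   # constructor class {x, y, 0, 1}
BINARY = ("plus",)               # class {plus}; minus and ite omitted

def rewrite(t):
    """Toy rewriter: orders the arguments of plus, so plus(y, x) and
    plus(x, y) share one normal form."""
    if isinstance(t, tuple):
        a, b = sorted((rewrite(t[1]), rewrite(t[2])), key=repr)
        return ("plus", a, b)
    return t

@lru_cache(maxsize=None)
def enum(k):
    """All terms of size exactly k, cached globally, keeping one
    representative per rewrite-equivalence class."""
    if k == 0:
        return NULLARY
    seen, out = set(), []
    for k1 in range(k):                       # k1 + k2 + 1 == k
        k2 = k - 1 - k1
        for op in BINARY:
            for a, b in product(enum(k1), enum(k2)):
                key = rewrite((op, a, b))
                if key not in seen:
                    seen.add(key)
                    out.append((op, a, b))
    return tuple(out)
```

Because plus(y, x) is never stored at size 1, no larger term ever receives y + x as a subterm, illustrating the cascading exclusion described above.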
We observe that theory rewriting with structural generalization cannot be easily
incorporated into this scheme since it requires the use of a constraint solver, something
that the above algorithm seeks to avoid.
5 Evaluation
We evaluated the above techniques in CVC4SY on four benchmark sets: invariant syn-
thesis benchmarks from the verification of Lustre [11] models; a set from work on
synthesizing invertibility conditions for bit-vector operators [12] (IC-BV); a set of
bit-vector invariant synthesis problems [2] (CegisT); and the SyGuS-COMP 2018 [1]
benchmarks from five tracks: assorted problems (General), conditional linear arithmetic
Table 1. Summary of number of problems solved per benchmark set. Best results are in bold.
Fig. 2. Cactus plot on commonly supported benchmark sets. The first scatter plot is for the Lustre
set, the second for the Gen-Crci set, and the latter two for the 862 benchmarks from the PBE sets.
see the biggest gains of s with respect to s-eu; cg is more helpful for IC-BV, with a few
harder benchmarks only solved due to this technique.
The first scatter plot in Fig. 2 shows the advantage of h over s on Lustre, a benchmark set containing invariant synthesis problems with dozens of variables. We remark that this configuration excels at quickly finding small solutions for problems with many variables, although it solves fewer problems overall. The second scatter plot shows that while s takes significantly longer on easy problems, it outperforms f in the long run. The last two plots show that f significantly outperforms the state of the art on PBE benchmarks.
For all benchmark sets, the auto strategy a chooses the best enumerative strategy of CVC4SY with only a few exceptions, and hence it is the default configuration of CVC4SY. Due to specialized synthesis techniques [4,14], both a+si and EUS outperform the purely enumerative strategies of CVC4. This is reflected in the cactus plot on the commonly supported benchmark sets, where a and f solve more benchmarks than EUS at lower times, but EUS solves more benchmarks in the end. For a+si, the cactus plot shows that it outperforms EUS significantly. Nevertheless, we remark that a+si is able to solve only 393 (16%) of the overall benchmarks using only single-invocation techniques. Hence, we conclude that both smart and fast enumerative strategies are critical subcomponents in our approach to syntax-guided synthesis.
Acknowledgments. This work was partially supported by the National Science Foundation
under award 1656926 and by the Defense Advanced Research Projects Agency under award
FA8650-18-2-7854.
References
1. SyGuS-COMP 2018 (2018). http://sygus.seas.upenn.edu/SyGuS-COMP2018.html
2. Abate, A., David, C., Kesseli, P., Kroening, D., Polgreen, E.: Counterexample guided induc-
tive synthesis modulo theories. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS,
vol. 10981, pp. 270–288. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3 15
3. Alur, R., et al.: Syntax-guided synthesis. In: Irlbeck, M., Peled, D.A., Pretschner, A., (eds.)
Dependable Software Systems Engineering. NATO Science for Peace and Security Series,
D: Information and Communication Security, vol. 40, pp. 1–25. IOS Press (2015)
4. Alur, R., Radhakrishna, A., Udupa, A.: Scaling enumerative program synthesis via divide
and conquer. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 319–
336. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5 18
5. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_14
6. Barrett, C., Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Splitting on demand in SAT modulo
theories. In: Hermann, M., Voronkov, A. (eds.) LPAR 2006. LNCS (LNAI), vol. 4246, pp.
512–526. Springer, Heidelberg (2006). https://doi.org/10.1007/11916277_35
7. Barrett, C., Shikanian, I., Tinelli, C.: An abstract decision procedure for satisfiability in the
theory of recursive data types. Electr. Notes Theor. Comput. Sci. 174(8), 23–37 (2007)
8. Barrett, C., Tinelli, C.: Satisfiability Modulo Theories. Handbook of Model Checking, pp.
305–343. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-10575-8_11
9. Barrett, C.W., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. In:
Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.) Handbook of Satisfiability. Frontiers
in Artificial Intelligence and Applications, vol. 185, pp. 825–885. IOS Press (2009)
10. Gulwani, S.: Programming by examples: applications, algorithms, and ambiguity resolution.
In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 9–14. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-40229-1_2
11. Halbwachs, N., Caspi, P., Raymond, P., Pilaud, D.: The synchronous data flow programming
language LUSTRE. Proc. IEEE 79(9), 1305–1320 (1991)
12. Niemetz, A., Preiner, M., Reynolds, A., Barrett, C., Tinelli, C.: Solving quantified bit-vectors using invertibility conditions. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018, Part II. LNCS, vol. 10982, pp. 236–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96142-2_16
13. Reynolds, A., Blanchette, J.C.: A decision procedure for (co)datatypes in SMT solvers. In:
Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 197–213.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_13
14. Reynolds, A., Deters, M., Kuncak, V., Tinelli, C., Barrett, C.: Counterexample-guided quantifier instantiation for synthesis in SMT. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015, Part II. LNCS, vol. 9207, pp. 198–216. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21668-3_12
CVC4SY: Smart and Fast Term Enumeration for Syntax-Guided Synthesis 83
15. Reynolds, A., Viswanathan, A., Barbosa, H., Tinelli, C., Barrett, C.: Datatypes with shared selectors. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900, pp. 591–608. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94205-6_39
16. Solar-Lezama, A., Tancau, L., Bodı́k, R., Seshia, S.A., Saraswat, V.A.: Combinatorial sketch-
ing for finite programs, pp. 404–415. ACM (2006)
17. Udupa, A., Raghavan, A., Deshmukh, J.V., Mador-Haim, S., Martin, M.M.K., Alur, R.:
TRANSIT: specifying protocols with concolic snippets. In: ACM SIGPLAN Conference on
Programming Language Design and Implementation, PLDI 2013, Seattle, 16–19 June 2013,
pp. 287–296 (2013)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropri-
ate credit to the original author(s) and the source, provide a link to the Creative Commons license
and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder.
Incremental Determinization
for Quantifier Elimination and Functional
Synthesis
Markus N. Rabe
1 Introduction
Given a Boolean formula ∃Y. ϕ with free variables X, quantifier elimination
(also called projection) is the problem of finding a formula ψ ≡ ∃Y. ϕ that
contains only the variables X. Closely related, the functional synthesis problem is to find
a function fy : 2^X → B for each y ∈ Y , such that ϕ[Y → fy (X)] ≡ ∃Y. ϕ.
Quantifier elimination and functional synthesis are fundamental operations in
automated reasoning, computer-aided design, and verification. Hence, progress
in algorithms for these problems benefits a broad range of applications of for-
mal methods. For example, typical algorithms for reactive synthesis reduce to
computing the safe region of a safety game through repeated quantifier eliminations [1–3] or directly employ functional synthesis [4]. To this day, algorithms
for quantifier elimination often involve (reduced ordered) Binary Decision Diagrams (BDDs) [5]. However, BDDs often grow exponentially for applications in
verification, and extracting formulas (or strategies, etc.) from BDDs typically
results in huge expressions. The search for alternatives has produced CEGAR-style
algorithms [6–10].
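To make the two problem statements concrete, the following is a minimal brute-force sketch (our own illustration, not any of the cited algorithms): for every assignment to X it searches for a witness assignment to Y, yielding a lookup-table realization of the functions fy. The encoding (assignment dicts, 0/1 values, a Python predicate for ϕ) is an assumption of this sketch.

```python
from itertools import product

def synthesize(phi, xs, ys):
    """Brute-force functional synthesis for tiny Boolean formulas.

    phi: predicate taking an assignment dict -> bool.
    xs, ys: variable names. Returns a dict mapping each X-assignment
    (as a tuple) to a chosen Y-assignment, or None where no witness exists,
    so that phi[Y -> f(X)] holds exactly where (exists Y. phi) holds.
    """
    table = {}
    for xvals in product([0, 1], repeat=len(xs)):
        asn = dict(zip(xs, xvals))
        witness = None
        for yvals in product([0, 1], repeat=len(ys)):
            asn.update(zip(ys, yvals))
            if phi(asn):
                witness = yvals
                break
        table[xvals] = witness
    return table

# Example: phi = (y <-> x1 /\ x2); a witness exists for every X-assignment.
phi = lambda a: a["y"] == (a["x1"] and a["x2"])
f = synthesize(phi, ["x1", "x2"], ["y"])
# f[(1, 1)] == (1,) and f[(0, 1)] == (0,): the synthesized f is conjunction.
```

This exponential enumeration is exactly what the algorithms discussed below avoid.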
In this work, we take a look at the closely related field of QBF solving. There,
pure CEGAR solving [11–13] on the CNF representation is no longer competitive [14]; it has been augmented by preprocessing [15,16], circuit representations [17–21], and Incremental Determinization (ID) [22]. It may hence be
fruitful to leverage some of these recent developments in QBF.
The contribution of this work is a simple modification of ID to enable quanti-
fier elimination and functional synthesis. Incremental Determinization (ID) is an
algorithm for solving quantified Boolean formulas of the shape ∀X. ∃Y. ϕ, where
M. N. Rabe: Work partially done at the University of California, Berkeley.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 84–94, 2019.
https://doi.org/10.1007/978-3-030-25543-5_6
Incremental Determinization for Quantifier Elimination 85
2 Related Work
Functional Synthesis. Early works on functional synthesis tried to exploit Craig
interpolation, but did not scale well enough [24]. This was followed by first
attempts to use CEGAR [6], which failed, however, to surpass the performance
of BDDs [7]. More recent works revisited the use of BDDs, e.g. the tools SSyft [25]
and RSynth [26,27]. This motivated the search for alternatives to BDDs [8–10].
At their core, these new algorithms all rely on counter-example guided abstrac-
tion refinement (CEGAR) [28], but they apply it in clever, compositional ways.
However, they still inherit the well-known weaknesses of CEGAR (as discussed, for example, in the QBF literature): for the simple formula ϕ = ⋀i<n xi ↔ yi ,
where n = |X| = |Y | and xi ∈ X and yi ∈ Y , CEGAR needs to browse through
2^n satisfying assignments just to recover that the function we are looking for
is f (x) = x.
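The 2^n blow-up is easy to reproduce with a toy model of assignment-level CEGAR (a deliberately naive sketch of our own, not any cited solver): the candidate function is a lookup table refined one counterexample at a time, and an adversary feeds it every input.

```python
from itertools import product

def naive_cegar(n):
    """Toy CEGAR for phi = AND_i (x_i <-> y_i): the candidate function is a
    lookup table, refined with one counterexample assignment per iteration.
    Returns the number of refinement iterations needed."""
    table = {}                      # memorized x -> y points
    default = (0,) * n              # initial guess: the constant 0 vector
    iters = 0
    for x in product([0, 1], repeat=n):   # adversary enumerates all inputs
        y = table.get(x, default)
        if y != x:                  # phi fails: some x_i <-> y_i is violated
            table[x] = x            # refine with this single assignment
            iters += 1
    return iters

# The naive loop needs 2**n - 1 refinements to learn the identity f(x) = x.
counts = [naive_cegar(n) for n in range(1, 6)]  # [1, 3, 7, 15, 31]
```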
The Back-and-Forth algorithm explores stronger abstractions, using MaxSAT
solvers as a means to reduce the number of assignments that CEGAR needs
to explore [8]. ParSyn attempts to combat the problem with parallel compute
power and a compositional approach [9]. This compositional approach has later
been refined using a wDNNF decomposition [10].
QBF Certification. Some solvers and preprocessors for QBF have the ability to
not only provide a yes/no answer, but also produce a certificate (i.e. Skolem func-
tions) for their result [13,22,29,30]. While most QBF approaches suffer heavy
86 M. N. Rabe
3 Preliminaries
Boolean formulas over a finite set of variables x ∈ X with domain B = {0, 1}
are generated by the following grammar:
ϕ := 0 | 1 | x | ¬ϕ | (ϕ) | ϕ ∨ ϕ | ϕ ∧ ϕ
Other logical operations, such as implication, XOR, and equality, are considered
syntactic sugar with the usual definitions.
An assignment x to a set of variables X is a function x : X → B that maps
each variable x ∈ X to either 1 or 0. We denote the space of assignments to
some set of variables X with 2X .
Given formulas ϕ and ϕ′ , and a variable x, we denote the substitution of x
by ϕ′ in ϕ as ϕ[x → ϕ′ ]. We lift substitutions to sets of variables, writing ϕ[X → tx ],
where tx maps each x ∈ X to a formula.
A literal l is either a variable x ∈ X, or its negation ¬x. We use l̄ to denote
the literal that is the logical negation of l. A disjunction of literals (l1 ∨ . . . ∨ ln )
is called a clause and their conjunction (l1 ∧ . . . ∧ ln ) is called a cube. We denote
the variable of a literal by var (l) and lift the notion to clauses var (l1 ∨ · · · ∨ ln ) =
{var (l1 ), . . . , var (ln )}.
A formula is in conjunctive normal form (CNF), if it is a conjunction of
clauses. Throughout this exposition, we assume that the input formula is given
in CNF. (The output, however, can be a non-CNF formula.) It is straightforward to lift the
approach to general Boolean formulas: given a Boolean formula ϕ over variables
X, the Tseitin transformation provides a formula ψ with ϕ ≡ ∃Z. ψ, where Z
is a set of fresh variables [31]. Note that eliminating a group of variables X′ ⊆ X in ϕ
is then the same as eliminating X′ ∪ Z in ψ.
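A minimal sketch of the Tseitin transformation for formula trees (our own encoding, assumed for illustration: positive/negative integers as literals, tuples for gates, one fresh variable per gate):

```python
import itertools

def tseitin(node, clauses, fresh):
    """Tseitin-encode a formula tree into CNF clauses.

    node: int (variable) or ('not', f) / ('and', f, g) / ('or', f, g).
    Returns the literal naming the node; `clauses` collects the CNF and
    `fresh` yields unused variable indices. The input formula is
    equisatisfiable with the CNF plus a unit clause for the root literal.
    """
    if isinstance(node, int):
        return node
    if node[0] == 'not':
        return -tseitin(node[1], clauses, fresh)
    a = tseitin(node[1], clauses, fresh)
    b = tseitin(node[2], clauses, fresh)
    z = next(fresh)
    if node[0] == 'and':   # clauses for z <-> (a & b)
        clauses += [[-z, a], [-z, b], [z, -a, -b]]
    else:                  # clauses for z <-> (a | b)
        clauses += [[z, -a], [z, -b], [-z, a, b]]
    return z

# Encode (x1 & x2) | ~x3 over variables 1..3; gate variables start at 4.
clauses = []
fresh = itertools.count(4)
root = tseitin(('or', ('and', 1, 2), ('not', 3)), clauses, fresh)
clauses.append([root])   # assert the formula itself
```

Eliminating x1 from the original formula then corresponds to eliminating x1 together with the gate variables 4 and 5 from the CNF, as described above.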
Resolution is a well-known proof rule that allows us to merge two clauses
as follows. Given two clauses C1 ∨ v and C2 ∨ ¬v, we call C1 ⊗v C2 = C1 ∨ C2
their resolvent with pivot v. The resolution rule states that C1 ∨ v and C2 ∨ ¬v
imply their resolvent. Resolution is refutationally complete for Boolean formulas
in CNF, i.e. given a formula in CNF that is equivalent to false, we can derive
the empty clause using only resolution.
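The resolution rule itself is one line of code when clauses are modeled as sets of integer literals (an illustrative encoding, matching the CNF convention above):

```python
def resolve(c1, c2, v):
    """Resolvent of clauses c1 (containing v) and c2 (containing -v).

    Clauses are frozensets of nonzero ints: positive = variable,
    negative = negated variable. Implements: C1 \\/ v and C2 \\/ ~v
    imply C1 \\/ C2 (with pivot v removed)."""
    assert v in c1 and -v in c2
    return (c1 - {v}) | (c2 - {-v})

# Refuting the unsatisfiable CNF (x) /\ (~x \/ y) /\ (~y) by resolution:
c = resolve(frozenset({1}), frozenset({-1, 2}), 1)   # -> {2}, i.e. (y)
empty = resolve(c, frozenset({-2}), 2)               # -> the empty clause
```

Deriving the empty clause, as in this two-step refutation, is exactly the notion of refutational completeness used above.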
5 Experimental Evaluation
We implemented the modifications to ID in CADET (available at
https://github.com/MarkusRabe/cadet), a competitive 2QBF
solver [22]. In this section, we compare CADET experimentally with existing
Fig. 1. Log-scale cactus plot comparing the performance of CADET+, CADET, BFSS, and BaFSyn over all instances.
7 Conclusions
In this work, we extended ID with the ability to solve functional synthesis and
quantifier elimination problems. The extension is very simple—we only need
to add the clauses of the original formula to its conflict check. The resulting
algorithm significantly outperforms previous algorithms for functional synthesis.
Acknowledgements. The author wants to thank Shubham Goel, Shetal Shah, and
Lucas Tabajara for insightful discussions and for their assistance with running their
functional synthesis tools. In particular, I want to express my gratitude to Supratik
Chakraborty for inspiring me to work on the topic in a discussion in the summer of
2016.
References
1. Ehlers, R.: Symbolic bounded synthesis. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 365–379. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_33
2. Brenguier, R., Pérez, G.A., Raskin, J., Sankur, O.: AbsSynthe: abstract synthesis
from succinct safety specifications. In: Proceedings of SYNT, pp. 100–116 (2014)
3. Jacobs, S., et al.: The 4th reactive synthesis competition (syntcomp 2017): bench-
marks, participants & results. arXiv preprint arXiv:1711.11439 (2017)
4. Zhu, S., Tabajara, L.M., Li, J., Pu, G., Vardi, M.Y.: Symbolic LTLf synthesis. In:
Proceedings of IJCAI, IJCAI 2017, pp. 1362–1369. AAAI Press (2017)
5. Bryant, R.E.: Symbolic Boolean manipulation with ordered binary-decision dia-
grams. ACM Comput. Surv. 24(3), 293–318 (1992)
6. Goldberg, E., Manolios, P.: Quantifier elimination by dependency sequents. Formal Methods Syst. Des. 45(2), 111–143 (2014). https://doi.org/10.1007/s10703-014-0214-z
7. Goldberg, E., Manolios, P.: Quantifier elimination via clause redundancy. In: For-
mal Methods in Computer-Aided Design, pp. 85–92, October 2013
8. Chakraborty, S., Fried, D., Tabajara, L.M., Vardi, M.Y.: Functional synthesis via
input-output separation. In: Proceedings of FMCAD, pp. 1–9. IEEE (2018)
9. Akshay, S., Chakraborty, S., John, A.K., Shah, S.: Towards parallel Boolean functional synthesis. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 337–353. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_19
10. Akshay, S., Chakraborty, S., Goel, S., Kulal, S., Shah, S.: What's hard about Boolean functional synthesis? In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 251–269. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3_14
11. Janota, M., Klieber, W., Marques-Silva, J., Clarke, E.: Solving QBF with counterexample guided refinement. In: Cimatti, A., Sebastiani, R. (eds.) SAT 2012. LNCS, vol. 7317, pp. 114–128. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31612-8_10
12. Janota, M., Marques-Silva, J.: Solving QBF by clause selection. In: Proceedings of
IJCAI, pp. 325–331. AAAI Press (2015)
13. Rabe, M.N., Tentrup, L.: CAQE: a certifying QBF solver. In: Proceedings of
FMCAD, pp. 136–143 (2015)
14. QBFEVAL: QBF solver evaluation portal. http://www.qbflib.org/index_eval.php. Accessed Jan 2018
15. Biere, A., Lonsing, F., Seidl, M.: Blocked clause elimination for QBF. In: Bjørner, N., Sofronie-Stokkermans, V. (eds.) CADE 2011. LNCS (LNAI), vol. 6803, pp. 101–115. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22438-6_10
16. Wimmer, R., Reimer, S., Marin, P., Becker, B.: HQSpre – an effective preprocessor for QBF and DQBF. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 373–390. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_21
17. Klieber, W., Sapra, S., Gao, S., Clarke, E.: A non-prenex, non-clausal QBF solver with game-state learning. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS, vol. 6175, pp. 128–142. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14186-7_12
18. Jordan, C., Klieber, W., Seidl, M.: Non-CNF QBF solving with QCIR. In: AAAI
Workshop: Beyond NP (2016)
19. Balabanov, V., Jiang, J.-H.R., Scholl, C., Mishchenko, A., Brayton, R.K.: 2QBF: challenges and solutions. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS, vol. 9710, pp. 453–469. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40970-2_28
20. Tentrup, L.: Non-prenex QBF solving using abstraction. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS, vol. 9710, pp. 393–401. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40970-2_24
21. Janota, M.: Circuit-based search space pruning in QBF. In: Beyersdorff, O., Wintersteiger, C.M. (eds.) SAT 2018. LNCS, vol. 10929, pp. 187–198. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94144-8_12
22. Rabe, M.N., Seshia, S.A.: Incremental determinization. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS, vol. 9710, pp. 375–392. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40970-2_23
23. Rabe, M.N., Tentrup, L., Rasmussen, C., Seshia, S.A.: Understanding and extending incremental determinization for 2QBF. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10982, pp. 256–274. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96142-2_17
24. Jiang, J.-H.R.: Quantifier elimination via functional composition. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 383–397. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_30
25. Zhu, S., Tabajara, L.M., Li, J., Pu, G., Vardi, M.Y.: A symbolic approach to safety LTL synthesis. In: Strichman, O., Tzoref-Brill, R. (eds.) Hardware and Software: Verification and Testing. LNCS, vol. 10629, pp. 147–162. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70389-3_10
26. Tabajara, L.M., Vardi, M.Y.: Factored Boolean functional synthesis. In: Proceed-
ings of FMCAD, pp. 124–131. IEEE (2017)
27. Fried, D., Tabajara, L.M., Vardi, M.Y.: BDD-based Boolean functional synthesis.
In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 402–421.
Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41540-6_22
28. Clarke, E., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided abstraction refinement. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855, pp. 154–169. Springer, Heidelberg (2000). https://doi.org/10.1007/10722167_15
29. Lonsing, F., Biere, A.: DepQBF: a dependency-aware QBF solver. JSAT 7(2–3),
71–76 (2010)
30. Heule, M.J.H., Seidl, M., Biere, A.: A unified proof system for QBF preprocessing. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS (LNAI), vol. 8562, pp. 91–106. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08587-6_7
31. Tseitin, G.S.: On the complexity of derivation in propositional calculus. Stud.
Constructive Math. Math. Log. 2(115–125), 10–13 (1968)
32. Büning, H., Karpinski, M., Flögel, A.: Resolution for quantified Boolean formulas. Inf. Comput. 117(1), 12–18 (1995)
33. Solar-Lezama, A., Rabbah, R.M., Bodı́k, R., Ebcioglu, K.: Programming by sketch-
ing for bit-streaming programs. In: Proceedings of PLDI, pp. 281–294 (2005)
34. Cook, B., Kroening, D., Rümmer, P., Wintersteiger, C.M.: Ranking function synthesis for bit-vector relations. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 236–250. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_19
35. Wintersteiger, C.M., Hamadi, Y., de Moura, L.: Efficiently solving quantified bit-vector formulas. Formal Methods Syst. Des. 42(1), 3–23 (2013)
36. Jordan, C., Kaiser, Ł.: Experiments with reduction finding. In: Järvisalo, M., Van Gelder, A. (eds.) SAT 2013. LNCS, vol. 7962, pp. 192–207. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39071-5_15
37. Vazquez-Chanlatte, M.: mvcisback/py-aiger, August 2018. https://doi.org/10.5281/zenodo.1326224
38. Lederman, G., Rabe, M.N., Lee, E.A., Seshia, S.A.: Learning heuristics for
automated reasoning through deep reinforcement learning. arXiv preprint
arXiv:1807.08058 (2018)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Numerical Programs
Loop Summarization with Rational
Vector Addition Systems
1 Introduction
Modern software verification techniques employ a number of heuristics for rea-
soning about loops. While these heuristics are often effective, they are unpre-
dictable. For example, an abstract interpreter may fail to find the most precise
invariant expressible in the language of its abstract domain due to imprecise
widening, or a software-model checker might fail to terminate because it gen-
erates interpolants that are insufficiently general. This paper presents a loop
summarization technique that is capable of generating loop invariants in an
expressive and decidable language and provides theoretical guarantees about
invariant quality.
The key idea behind our technique is to leverage reachability results of vector
addition systems (VAS) for invariant generation. Vector addition systems are a
class of infinite-state transition systems with decidable reachability, classically
used as a model of parallel systems [12]. We consider a variation of VAS, rational
VAS with resets (Q-VASR), wherein there is a finite number of rational-typed
variables and a finite set of transitions that simultaneously update each variable
in the system by either adding a constant value or (re)setting the variable to
a constant value. Our interest in Q-VASRs stems from the fact that there is
a (polytime) procedure to compute a linear arithmetic formula that represents a
Q-VASR's reachability relation [8].
Since the reachability relation of a Q-VASR is computable, the dynamics
of Q-VASR can be analyzed without relying on heuristic techniques. However,
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 97–115, 2019.
https://doi.org/10.1007/978-3-030-25543-5_7
98 J. Silverman and Z. Kincaid
there is a gap between Q-VASR and the loops that we are interested in summarizing. The latter typically use a rich set of operations (memory manipulation,
conditionals, non-constant increments, non-linear arithmetic, etc.) and cannot be
analyzed precisely. We bridge the gap with a procedure that, for any loop, synthesizes a Q-VASR that simulates it. The reachability relation of the Q-VASR
can then be used to over-approximate the behavior of the loop. Moreover, we
prove that if a loop is expressed in linear rational arithmetic (LRA), then our
procedure synthesizes a best Q-VASR abstraction, in the sense that it simulates
any other Q-VASR that simulates the loop. That is, imprecision in the analysis
is due to inherent limitations of the Q-VASR model, rather than to heuristic
algorithmic choices.
One limitation of the model is that Q-VASRs over-approximate multi-path
loops by treating the choice between paths as non-deterministic. We show that
Q-VASRS, Q-VASR extended with control states, can be used to improve our
invariant generation scheme by encoding control flow information and inter-
path control dependencies that are lost in the Q-VASR abstraction. We give an
algorithm for synthesizing a Q-VASRS abstraction of a given loop, which (like
our Q-VASR abstraction algorithm) synthesizes best abstractions under certain
assumptions.
Finally, we note that our analysis techniques extend to complex control struc-
tures (such as nested loops) by employing summarization compositionally (i.e.,
“bottom-up”). For example, our analysis summarizes a nested loop by first summarizing its inner loops and then using the summaries to analyze the outer loop.
As a result of compositionality, our analysis can be applied to partial programs,
is easy to parallelize, and has the potential to scale to large code bases.
The main contributions of the paper are as follows:
1.1 Outline
Fig. 1. A persistent queue and integer model. back len and front len model the
lengths of the lists front and back; mem ops counts the number of memory operations
in the computation.
before and after executing a computation (respectively). For any given program
P , a transition formula TFP can be computed by recursion on syntax; for an
assignment, for example:

TFx:=e ≜ x′ = e ∧ ⋀y∈Var\{x} y′ = y
w′ = w + k ∧ x′ = x − k ∧ y′ = y + 3k ∧ z′ = z. (†)
To capture information about the pre-condition of the loop, we can project out the
primed variables to obtain back len > 0; similarly, for the post-condition, we can
project out the unprimed variables to obtain back len′ ≥ 0. Finally, combining (†)
Loop Summarization with Rational Vector Addition Systems 101
(translated back into the vocabulary of the program) and the pre/post-condition,
we form the following approximation of the dequeue loop’s behavior:
∃k. k ≥ 0 ∧ (front len′ = front len + k
          ∧ back len′ = back len − k
          ∧ mem ops′ = mem ops + 3k
          ∧ size′ = size)
        ∧ (k > 0 ⇒ (back len > 0 ∧ back len′ ≥ 0)).
Using this summary for the dequeue loop, we proceed to compute a transition
formula for the body of the harness loop (omitted for brevity). Just as with the
dequeue loop, we analyze the harness loop by synthesizing a Q-VASR that sim-
ulates it, Vhar (below), where the correspondence between the state space of the
harness loop and Vhar is given by the transformation Shar :
⎡v⎤   ⎡0 0 0 1 0⎤ ⎡front len⎤          ⎛ size = v                   ⎞
⎢w⎥   ⎢0 1 0 0 0⎥ ⎢back len ⎥          ⎜ ∧ back len = w             ⎟
⎢x⎥ = ⎢0 3 1 0 0⎥ ⎢mem ops  ⎥ ; i.e.,  ⎜ ∧ mem ops + 3 back len = x ⎟
⎢y⎥   ⎢1 1 0 0 0⎥ ⎢size     ⎥          ⎜ ∧ back len + front len = y ⎟
⎣z⎦   ⎣0 0 0 0 1⎦ ⎣nb ops   ⎦          ⎝ ∧ nb ops = z               ⎠
          Shar

       ⎧ ⎡v⎤   ⎡v + 1⎤   ⎡v⎤   ⎡v − 1⎤   ⎡v⎤   ⎡v − 1⎤ ⎫
       ⎪ ⎢w⎥   ⎢w + 1⎥   ⎢w⎥   ⎢  w  ⎥   ⎢w⎥   ⎢  0  ⎥ ⎪
Vhar = ⎨ ⎢x⎥ → ⎢x + 4⎥ , ⎢x⎥ → ⎢x + 2⎥ , ⎢x⎥ → ⎢x + 2⎥ ⎬
       ⎪ ⎢y⎥   ⎢y + 1⎥   ⎢y⎥   ⎢y − 1⎥   ⎢y⎥   ⎢y − 1⎥ ⎪
       ⎩ ⎣z⎦   ⎣z + 1⎦   ⎣z⎦   ⎣z + 1⎦   ⎣z⎦   ⎣z + 1⎦ ⎭
          enqueue          dequeue fast    dequeue slow
we can prove (supposing that we start in a state where all variables are zero)
that mem ops is at most 4 times nb ops (i.e., enqueue and dequeue use O(1)
amortized memory operations).
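The amortized bound can be sanity-checked directly on the abstraction: each Vhar transformer adds at most 4 to the mem ops dimension (x) while always adding 1 to the nb ops dimension (z), so x ≤ 4z is inductive for every transformer sequence from the zero state. The following harness (our own check, with the transformers read off Vhar above) confirms this exhaustively for short sequences:

```python
import itertools

# The three V_har transformers as (reset vector r, addition vector a) over
# the dimensions (v, w, x, y, z); a transformer maps u to r*u + a pointwise.
ENQUEUE      = ((1, 1, 1, 1, 1), ( 1, 1, 4,  1, 1))
DEQUEUE_FAST = ((1, 1, 1, 1, 1), (-1, 0, 2, -1, 1))
DEQUEUE_SLOW = ((1, 0, 1, 1, 1), (-1, 0, 2, -1, 1))  # resets w to 0

def step(u, transformer):
    r, a = transformer
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))

def check_amortized(depth=7):
    """Run all transformer sequences up to `depth` steps from the zero
    state and confirm mem_ops (x) never exceeds 4 * nb_ops (z)."""
    ts = [ENQUEUE, DEQUEUE_FAST, DEQUEUE_SLOW]
    for n in range(1, depth + 1):
        for seq in itertools.product(ts, repeat=n):
            u = (0, 0, 0, 0, 0)
            for t in seq:
                u = step(u, t)
                assert u[2] <= 4 * u[4], (seq, u)
    return True
```

Of course, the bounded check only illustrates the invariant; the paper's argument goes through the computed reachability relation of Vhar.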
2 Background
The syntax of ∃LIRA, the existential fragment of linear integer/rational arith-
metic, is given by the following grammar:
s, t ∈ Term ::= c | x | s + t | c · t
F, G ∈ Formula ::= s < t | s = t | F ∧ G | F ∨ G | ∃x ∈ Q.F | ∃x ∈ Z.F
where x is a (rational sorted) variable symbol and c is a rational constant.
Observe that (without loss of generality) formulas are free of negation. ∃LRA
(linear rational arithmetic) refers to the fragment of ∃LIRA that omits quantifi-
cation over the integer sort.
A transition system is a pair (S, →) where S is a (potentially infinite) set
of states and →⊆ S × S is a transition relation. For a transition relation →, we
use →∗ to denote its reflexive, transitive closure.
A transition formula is a formula F (x, x′) whose free variables range over
x = x1 , ..., xn and x′ = x′1 , ..., x′n (we refer to the number n as the dimension
of F ); these variables designate the state before and after a transition. In the
following, we assume that transition formulas are defined over ∃LIRA. For a
transition formula F (x, x′) and vectors of terms s and t, we use F (s, t) to denote
the formula F with each xi replaced by si and each x′i replaced by ti . A transition
formula F (x, x′) defines a transition system (SF , →F ), where the state space SF
is Q^n and which can transition u →F v iff F (u, v) is valid.
For two rational vectors a and b of the same dimension d, we use a · b to
denote the inner product a · b = Σi=1..d ai bi and a ∗ b to denote the pointwise (aka
Hadamard) product (a ∗ b)i = ai bi . For any natural number i, we use ei to denote
the standard basis vector in the ith direction (i.e., the vector consisting of all
zeros except the ith entry, which is 1), where the dimension of ei is understood
from context. We use In to denote the n × n identity matrix.
Definition 1. A rational vector addition system with resets (Q-VASR)
of dimension d is a finite set V ⊆ {0, 1}^d × Q^d of transformers. Each transformer
(r, a) ∈ V consists of a binary reset vector r, and a rational addition vector a,
both of dimension d. V defines a transition system (SV , →V ), where the state
space SV is Q^d and which can transition u →V v iff v = r ∗ u + a for some
(r, a) ∈ V .
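Definition 1 is directly executable; the following is a small sketch (our own encoding of transformers as (r, a) pairs over exact rationals):

```python
from fractions import Fraction

def vasr_step(u, transformer):
    """One transition of a Q-VASR (Definition 1): v = r * u + a, where
    r in {0,1}^d resets (r_i = 0) or keeps (r_i = 1) each dimension."""
    r, a = transformer
    return tuple(Fraction(ri) * ui + Fraction(ai)
                 for ri, ui, ai in zip(r, u, a))

# A 2-dimensional Q-VASR with one transformer that increments dimension 0
# by 1/2 and resets dimension 1 to 3.
V = [((1, 0), (Fraction(1, 2), 3))]
u = (Fraction(0), Fraction(7))
u = vasr_step(u, V[0])   # -> (1/2, 3): dim 0 incremented, dim 1 reset
```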
Definition 2. A rational vector addition system with resets and states
(Q-VASRS) of dimension d is a pair V = (Q, E), where Q is a finite set of
control states, and E ⊆ Q × {0, 1}^d × Q^d × Q is a finite set of edges labeled
by (d-dimensional) transformers. V defines a transition system (SV , →V ), where
the state space SV is Q × Q^d and which can transition (q1 , u) →V (q2 , v) iff there
is some edge (q1 , (r, a), q2 ) ∈ E such that v = r ∗ u + a.
Note that a Q-VASR can be realized as a Q-VASRS with a single control state,
so this theorem also applies to Q-VASR.
Algorithm 1. abstract-VASR(F)
input : Transition formula F of dimension n
output: Q-VASR abstraction of F ; best Q-VASR abstraction if F is in ∃LRA
1 Skolemize existentials of F ;
2 (S, V ) ← (In , ∅); // (In , ∅) is least in the abstraction order
3 Γ ← F ;
4 while Γ is satisfiable do
5   Let M be a model of Γ ;
6   C ← cube of the DNF of F with M |= C;
7   (S, V ) ← (S, V ) ⊔ α̂(C);
8   Γ ← Γ ∧ ¬γ(S, V )
9 return (S, V )
The proof of this theorem, as well as the proofs of all subsequent theorems,
lemmas, and propositions, are in the extended version of this paper [20].
This section shows how to compute a Q-VASR abstraction for a consistent conjunctive formula. When the input formula is in ∃LRA, the computed Q-VASR abstraction will be a best Q-VASR abstraction of the input formula. The intuition is that,
since ∃LRA is a convex theory, a best Q-VASR abstraction consists of a single transition. For ∃LIRA formulas, our procedure produces a Q-VASR abstraction that is not
guaranteed to be best, precisely because ∃LIRA is not convex.
Let C be a consistent, conjunctive transition formula. Observe that the set
ResC ≜ {⟨s, a⟩ : C |= s · x′ = a}, which represents linear combinations of variables that are reset across C, forms a vector space. Similarly, the set IncC ≜
{⟨s, a⟩ : C |= s · x′ = s · x + a}, which represents linear combinations of variables that are incremented across C, forms a vector space. We compute bases for
both ResC and IncC , say {⟨s1 , a1 ⟩, ..., ⟨sm , am ⟩} and {⟨sm+1 , am+1 ⟩, ..., ⟨sd , ad ⟩},
respectively. We define α̂(C) to be the Q-VASR abstraction α̂(C) ≜ (S, {(r, a)}),
where S is the matrix with rows s1 , ..., sd , r consists of m zeros (the reset
dimensions) followed by d − m ones (the incremented dimensions), and
a = (a1 , ..., ad ).
In particular, notice that since the term z − w is both incremented and reset, it
is represented by two different dimensions in α̂(C).
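The construction can be replayed on a concrete conjunctive formula (our own toy example, not from the paper): for C = (x′ = x + 2 ∧ y′ = 3), ResC has basis ⟨(0, 1), 3⟩ and IncC has basis ⟨(1, 0), 2⟩, yielding a one-transformer Q-VASR that demonstrably simulates C:

```python
# alpha-hat for C = (x' = x + 2  /\  y' = 3) over variables (x, y).
# Basis of Res_C (reset linear terms):       <(0, 1), 3>  i.e. y' = 3
# Basis of Inc_C (incremented linear terms): <(1, 0), 2>  i.e. x' = x + 2
res_basis = [((0, 1), 3)]
inc_basis = [((1, 0), 2)]

# Stack reset rows first, then increment rows (m = 1 reset, d = 2 total).
S = [s for s, _ in res_basis] + [s for s, _ in inc_basis]
r = (0,) * len(res_basis) + (1,) * len(inc_basis)   # 0 = reset, 1 = keep
a = tuple(c for _, c in res_basis) + tuple(c for _, c in inc_basis)

def mat_vec(m, v):
    return tuple(sum(mi * vi for mi, vi in zip(row, v)) for row in m)

def simulates(x, x2):
    """Check S x' = r * (S x) + a for a transition (x, x') of C."""
    sx, sx2 = mat_vec(S, x), mat_vec(S, x2)
    return sx2 == tuple(ri * si + ai for ri, si, ai in zip(r, sx, a))

# Any model of C, e.g. x: 5 -> 7, y: 100 -> 3, is simulated:
ok = simulates((5, 100), (7, 3))   # True
```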
This section shows how to compute least upper bounds w.r.t. the abstraction order.
By definition of the order, if (S, V ) is an upper bound of (S1 , V1 ) and
(S2 , V2 ), then there must exist matrices T1 and T2 such that T1 S1 = S = T2 S2 ,
Algorithm 2. (S1 , V1 ) ⊔ (S2 , V2 )
input : Normal Q-VASR abstractions (S1 , V1 ) and (S2 , V2 ) of equal concrete dimension
output: Least upper bound (w.r.t. the abstraction order) of (S1 , V1 ) and (S2 , V2 )
1 S, T1 , T2 ← empty matrices;
2 foreach coherence class C1 of V1 do
3   foreach coherence class C2 of V2 do
4     (U1 , U2 ) ← pushout(ΠC1 S1 , ΠC2 S2 );
5     S ← [S ; U1 ΠC1 S1 ]; T1 ← [T1 ; U1 ΠC1 ]; T2 ← [T2 ; U2 ΠC2 ];
6 V ← image(V1 , T1 ) ∪ image(V2 , T2 );
7 return (S, V )
We now consider how to compute a best such T1 and T2 . Observe that conditions
(1), (2), and (3) hold exactly when, for each row i, (T1i , T2i ) belongs to the set T defined below.
Since a row vector ti is coherent w.r.t. Vi iff its non-zero positions belong to the
same coherence class of Vi (equivalently, ti = uΠCi for some coherence class
Ci and vector u), we have T = ∪C1 ,C2 T (C1 , C2 ), where the union is over all
coherence classes C1 of V1 and C2 of V2 , with

T (C1 , C2 ) ≜ {(u1 ΠC1 , u2 ΠC2 ) : u1 ΠC1 S1 = u2 ΠC2 S2 }.
that, starting with the configuration x = 0 ∧ i = 1, the loop maintains the invari-
ant that 2x ≤ i. The (best) Q-VASR abstraction of the loop, pictured in Fig. 2b,
over-approximates the control flow of the loop by treating the conditional branch
in the loop as a non-deterministic branch. This over-approximation may violate
the invariant 2x ≤ i by repeatedly executing the path where both variables are
incremented. On the other hand, the Q-VASRS abstraction of the loop pictured
in Fig. 2c captures the understanding that the loop must oscillate between the
two paths. The loop summary obtained from the reachability relation of this
Q-VASRS is powerful enough to prove the invariant 2x ≤ i holds (under the
precondition x = 0 ∧ i = 1).
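The oscillation argument can be made concrete with a small simulation. The loop body below is our hypothetical reconstruction (Fig. 2 itself is not reproduced here), chosen only so that the two paths alternate as described; under that alternation the invariant 2x ≤ i is maintained, whereas repeating the both-increment path would break it:

```python
def run_loop(steps):
    """From x = 0, i = 1, alternate between a path that increments both
    variables and a path that increments only i. The forced alternation
    is what keeps 2x <= i; an abstraction that treats the branch as
    non-deterministic loses this."""
    x, i = 0, 1
    trace = []
    for _ in range(steps):
        if i % 2 == 1:        # path 1: increment both variables
            x, i = x + 1, i + 1
        else:                 # path 2: increment i only
            i += 1
        trace.append((x, i))
    return trace
```

Running it for any number of steps, every visited state satisfies 2x ≤ i under the precondition x = 0 ∧ i = 1.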
Algorithm 3. abstract-VASRS(F, P)
input : Transition formula F(x, x′), set of pairwise-disjoint predicates P over
x such that for all u, v with u →F v, there exist p, q ∈ P with p(u)
and q(v) both valid
output: Best Q-VASRS abstraction of F with control states P
1 For all p, q ∈ P, let (Sp,q, Vp,q) ← abstract-VASR(p(x) ∧ F(x, x′) ∧ q(x′));
2 (S, V) ← least upper bound of all (Sp,q, Vp,q);
3 For all p, q ∈ P, let Tp,q ← the simulation matrix from (Sp,q, Vp,q) to (S, V);
4 E = {(p, r, a, q) : p, q ∈ P, (r, a) ∈ image(Vp,q, Tp,q)};
5 return (S, (P, E))
Q-VASRS that share the same set of control states, then best abstractions do
exist and can be computed using Algorithm 3.
Algorithm 3 works as follows: first, for each pair of formulas p, q ∈ P, compute
a best Q-VASR abstraction of the formula p(x) ∧ F(x, x′) ∧ q(x′) and call it
(Sp,q, Vp,q). (Sp,q, Vp,q) over-approximates the transitions of F that begin in a
program state satisfying p and end in a program state satisfying q. Second, we
compute the least upper bound of all Q-VASR abstractions (Sp,q, Vp,q) to get
a Q-VASR abstraction (S, V) for F. As a side-effect of the least upper bound
computation, we obtain a linear simulation Tp,q from (Sp,q, Vp,q) to (S, V) for
each p, q. A best Q-VASRS abstraction of F(x, x′) with control states P has S
as its simulation matrix and has the image of Vp,q under Tp,q as the edges from
p to q.
Algorithm 4. iter-VASRS(F)
input : Transition formula F(x, x′)
output: Over-approximation of the transitive closure of F
1 P ← topological closure of DNF of ∃x′. F (see [17]);
2 /* Compute connected regions */
3 while ∃p1, p2 ∈ P with p1 ∧ p2 satisfiable do
4 P ← (P \ {p1, p2}) ∪ {p1 ∨ p2}
5 (S, V) ← abstract-VASRS(F, P);
6 return reach(V)(Sx, Sx′) ◦ (x = x′ ∨ F)
5 Evaluation
The goal of our evaluation is to answer the following questions:
– Are Q-VASR sufficiently expressive to generate accurate loop summaries?
– Does the Q-VASRS technique improve upon the precision of Q-VASR?
– Are the Q-VASR/Q-VASRS loop summarization algorithms performant?
We implemented our loop summarization procedure and the compositional
whole-program summarization technique described in Sect. 1.1. We ran them on a suite
of 165 benchmarks, drawn from the C4B [2] and HOLA [4] suites, as well as the
safe, integer-only benchmarks in the loops category of SV-Comp 2019 [22]. We
ran each benchmark with a time-out of 5 min, and recorded how many bench-
marks were proved safe by our Q-VASR-based technique and our Q-VASRS-
based technique. For context, we also compare with CRA [14] (a related loop
112 J. Silverman and Z. Kincaid
6 Related Work
Compositional Analysis. Our analysis follows the same high-level structure as
compositional recurrence analysis (CRA) [5,14]. Our analysis differs from CRA
in the way that it summarizes loops: we compute loop summaries by over-
approximating loops with vector addition systems and computing reachability
relations, whereas CRA computes loop summaries by extracting recurrence rela-
tions and computing closed forms. The advantage of our approach is that
we can use Q-VASR to accurately model multi-path loops and can make theoretical
guarantees about the precision of our analysis; the advantage of CRA is
its ability to generate non-linear invariants.
Vector Addition Systems. Our invariant generation method draws upon Haase
and Halfon’s polynomial-time procedure for computing the reachability relation of
integer vector addition systems with states and resets [8]. Generalization from the
integer case to the rational case is straightforward. Continuous Petri nets [3] are
a related generalization of vector addition systems, where time is taken to be
continuous (Q-VASR, in contrast, have rational state spaces but discrete time).
Reachability for continuous Petri nets is computable in polynomial time [6] and definable
in ∃LRA [1].
Loop Summarization with Rational Vector Addition Systems 113
Sinn et al. present a technique for resource bound analysis that is based on
modeling programs by lossy vector addition systems with states [21]. Sinn et al.
model programs using vector addition systems with states over the natural num-
bers, which enables them to use termination bounds for VASS to compute upper
bounds on resource usage. In contrast, we use VASS with resets over the rationals,
which (in contrast to VASS over N) have a ∃LIRA-definable reachability relation,
enabling us to summarize loops. Moreover, Sinn et al.’s method for extracting
VASS models of programs is heuristic, whereas our method gives precision guar-
antees.
References
1. Blondin, M., Finkel, A., Haase, C., Haddad, S.: Approaching the coverability prob-
lem continuously. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol.
9636, pp. 480–496. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-
662-49674-9 28
2. Carbonneaux, Q., Hoffmann, J., Shao, Z.: Compositional certified resource bounds.
In: PLDI (2015)
3. David, R., Alla, H.: Continuous Petri nets. In: Proceedings of 8th European Work-
shop on Applications and Theory Petri Nets, pp. 275–294 (1987)
4. Dillig, I., Dillig, T., Li, B., McMillan, K.: Inductive invariant generation via abduc-
tive inference. In: OOPSLA (2013)
5. Farzan, A., Kincaid, Z.: Compositional recurrence analysis. In: FMCAD (2015)
6. Fraca, E., Haddad, S.: Complexity analysis of continuous Petri nets. Fundam. Inf.
137(1), 1–28 (2015)
7. Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification
framework. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206,
pp. 343–361. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-
4 20
8. Haase, C., Halfon, S.: Integer vector addition systems with states. In: Ouaknine,
J., Potapov, I., Worrell, J. (eds.) RP 2014. LNCS, vol. 8762, pp. 112–124. Springer,
Cham (2014). https://doi.org/10.1007/978-3-319-11439-2 9
9. Heizmann, M., et al.: Ultimate automizer and the search for perfect interpolants.
In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 447–451.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89963-3 30
10. Hrushovski, E., Ouaknine, J., Pouly, A., Worrell, J.: Polynomial invariants for
affine programs. In: Logic in Computer Science, pp. 530–539 (2018)
11. Humenberger, A., Jaroschek, M., Kovács, L.: Invariant generation for multi-path
loops with polynomial assignments. In: Verification, Model Checking, and
Abstract Interpretation. LNCS, vol. 10747, pp. 226–246. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-73721-8 11
12. Karp, R.M., Miller, R.E.: Parallel program schemata. J. Comput. Syst. Sci. 3(2),
147–195 (1969)
13. Kincaid, Z., Breck, J., Forouhi Boroujeni, A., Reps, T.: Compositional recurrence
analysis revisited. In: PLDI (2017)
14. Kincaid, Z., Cyphert, J., Breck, J., Reps, T.: Non-linear reasoning for invariant
synthesis. PACMPL 2(POPL), 1–33 (2018)
15. Kovács, L.: Reasoning algebraically about P-solvable loops. In: Ramakrishnan,
C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 249–264. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3 18
16. Li, Y., Albarghouthi, A., Kincaid, Z., Gurfinkel, A., Chechik, M.: Symbolic opti-
mization with SMT solvers. In: POPL, pp. 607–618 (2014)
17. Monniaux, D.: A quantifier elimination algorithm for linear real arithmetic. In:
Cervesato, I., Veith, H., Voronkov, A. (eds.) LPAR 2008. LNCS (LNAI), vol.
5330, pp. 243–257. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-
540-89439-1 18
18. Reps, T., Sagiv, M., Yorsh, G.: Symbolic implementation of the best transformer.
In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 252–266.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24622-0 21
19. Rodrı́guez-Carbonell, E., Kapur, D.: Automatic generation of polynomial loop
invariants: algebraic foundations. In: ISSAC, pp. 266–273 (2004)
20. Silverman, J., Kincaid, Z.: Loop summarization with rational vector addition sys-
tems (extended version). arXiv e-prints. arXiv:1905.06495, May 2019
21. Sinn, M., Zuleger, F., Veith, H.: A simple and scalable static analysis for bound
analysis and amortized complexity analysis. In: Biere, A., Bloem, R. (eds.) CAV
2014. LNCS, vol. 8559, pp. 745–761. Springer, Cham (2014). https://doi.org/10.
1007/978-3-319-08867-9 50
22. 8th International Competition on Software Verification (SV-COMP 2019) (2019).
https://sv-comp.sosy-lab.org/2019/
23. Tarjan, R.E.: A unified approach to path problems. J. ACM 28(3), 577–593 (1981)
24. Thakur, A., Reps, T.: A method for symbolic computation of abstract operations.
In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 174–192.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7 17
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Invertibility Conditions for Floating-Point
Formulas
1 Introduction
Satisfiability Modulo Theories (SMT) formulas including either the theory of
floating-point numbers [12] or universal quantifiers [24,32] are widely regarded
as some of the hardest to solve. Problems that combine universal quantification
over floating-points are rare—experience to date has suggested they are hard for
solvers and would-be users should either give up or develop their own incomplete
techniques. However, progress in theory solvers for floating-point [11] and the
use of expression synthesis for handling universal quantifiers [27,29] suggest that
these problems may not be entirely out of reach after all, which could potentially
impact a number of interesting applications.
This paper makes substantial progress towards a scalable approach for solv-
ing quantified floating-point constraints directly in an SMT solver. Developing
procedures for quantified floating-points requires considerable effort, both foun-
dationally and in practice. We focus primarily on establishing a foundation for
lifting to quantified floating-point formulas a procedure for solving quantified
bit-vector formulas by Niemetz et al. [26]. That procedure relies on so-called
This work was supported in part by DARPA (award no. FA8650-18-2-7861), ONR
(award no. N68335-17-C-0558) and NSF (award no. 1656926).
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 116–136, 2019.
https://doi.org/10.1007/978-3-030-25543-5_8
2 Preliminaries
We assume the usual notions and terminology of many-sorted first-order logic with
equality (denoted by ≈). Let Σ be a signature consisting of a set Σ s of sort symbols
and a set Σ f of interpreted (and sorted) function symbols. Each function symbol f
has a sort τ1 × ... × τn → τ , with arity n ≥ 0 and τ1 , ..., τn , τ ∈ Σ s . We assume that
Σ includes a Boolean sort Bool and the Boolean constants ⊤ (true) and ⊥ (false).
118 M. Brain et al.
We further assume the usual definition of well-sorted terms, literals, and (quanti-
fied) formulas with variables and symbols from Σ, and refer to them as Σ-terms,
Σ-atoms, and so on. For a Σ-term or Σ-formula e, we denote the free variables
of e (defined as usual) as F V(e) and use e[x] to denote that the variable x occurs
free in e. We write e[t] for the term or formula obtained from e by replacing each
occurrence of x in e by t.
A theory T is a pair (Σ, I), where Σ is a signature and I is a non-empty class
of Σ-interpretations (the models of T ) that is closed under variable reassignment,
i.e., every Σ-interpretation that only differs from an I ∈ I in how it interprets
variables is also in I. A Σ-formula ϕ is T -satisfiable (resp. T -unsatisfiable) if it
is satisfied by some (resp. no) interpretation in I; it is T -valid if it is satisfied by
all interpretations in I. We will sometimes omit T when the theory is understood
from context.
We briefly recap the terminology and notation of Brain et al. [12] which
defines an SMT-LIB theory TFP of floating-point numbers based on the IEEE-
754 2008 standard [3]. The signature of TFP includes a parametric family of
sorts Fε,σ where ε and σ are integers greater than or equal to 2 giving the
number of bits used to store the exponent e and significand s, respectively.
Each of these sorts contains five kinds of constants: normal numbers of the form
1.s ∗ 2^e, subnormal numbers of the form 0.s ∗ 2^(−2^(σ−1)−1), two zeros (+0 and −0),
two infinities (+∞ and −∞) and a single not-a-number (NaN). We assume a
map vε,σ for each sort, which maps these constants to their value in the set
R∗ = R ∪ {+∞, −∞, NaN}. The theory also provides a rounding-mode sort RM,
which contains five elements {RNE, RNA, RTP, RTN, RTZ}.
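The five rounding modes have direct counterparts among the rounding constants of Python's decimal module (base 10 rather than base 2, so this only mirrors the rounding behavior, not TFP itself); a small sketch with our own helper name:

```python
from decimal import (Context, Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN)

# RNE: nearest, ties to even; RNA: nearest, ties away from zero;
# RTP: toward +infinity; RTN: toward -infinity; RTZ: toward zero.
MODES = {
    "RNE": ROUND_HALF_EVEN,
    "RNA": ROUND_HALF_UP,
    "RTP": ROUND_CEILING,
    "RTN": ROUND_FLOOR,
    "RTZ": ROUND_DOWN,
}

def rounded_add(a, b, mode, precision=3):
    """Add two decimals at a fixed precision under the given rounding mode."""
    ctx = Context(prec=precision, rounding=MODES[mode])
    return ctx.add(Decimal(a), Decimal(b))
```

For example, 1.00 + 0.005 at three significant digits yields 1.00 under RNE and RTZ but 1.01 under RNA and RTP, illustrating how the chosen mode decides which neighboring representable value is returned.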
Table 1 lists all considered operators and predicate symbols of theory TFP .
The theory contains a full set of arithmetic operations {|...|, +, −, ·, ÷, √, max,
min} as well as rem (remainder), rti (round to integral) and fma (combined
multiply and add with just one rounding). The precise semantics of these operators
is given in [12] and follows the same general pattern: vε,σ is used to project the
arguments to R∗ , the normal arithmetic is performed in R∗ , then the rounding
mode and the result are used to select one of the adjoints of vε,σ to convert
the result back to Fε,σ . Note that the full theory in [12] includes several addi-
tional operators which we omit from discussion here, such as floating-point min-
imum/maximum, equality with floating-point semantics (fp.eq), and conversions
between sorts.
Theory TFP further defines a set of ordering predicates {<, >, ≤, ≥} and a
set of classification predicates {isNorm, isSub, isInf, isZero, isNaN, isNeg, isPos}. In
the following, we denote the rounding mode of an operation above the operator
symbol, e.g., a +^RTZ b adds a and b and rounds the result towards zero. We use the
infix operator style for isInf (. . . ≈ ±∞), isZero (. . . ≈ ±0), and isNaN (. . . ≈
NaN) for conciseness. We further use minn /maxn and mins /maxs for floating-
point constants representing the minimum/maximum normal and subnormal
numbers, respectively. We will omit rounding mode and floating-point sorts if
they are clear from the context.
Table 2 lists the invertibility conditions for equality with the operators
{+, −, ·, ÷, rem, √, |...|, −, rti}, parameterized over a rounding mode R (one of
RNE, RNA, RTP, RTN, or RTZ). Note that operators {+, ·} and the multiplica-
tive step of fma are commutative, and thus the invertibility conditions for both
variants are identical.
Each of the first six invertibility conditions in this table follows a pattern. The
first two disjuncts are instances of the literal to solve for, where a term involving
rounding modes RTP and RTN is substituted for x. These disjuncts are then
followed by disjuncts for handling special cases for infinity and zero. From the
structure of these conditions, e.g., for +, we can derive the insight that if there
is a solution for x in the equation x +^R s ≈ t and we are not in a corner case where
s = t, then either t −^RTP s or t −^RTN s must be a solution. Based on extensive runs of our
syntax-guided synthesis procedure, we believe this condition is close to having
minimal term size. From this, we conclude that an efficient yet complete method
for solving x +^R s ≈ t checks whether t − s, rounded towards positive or negative,
is a solution in the non-trivial case when s and t are disequal, and otherwise
concludes that no solution exists. A similar insight can be derived for the other
invertibility conditions of this form.
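The candidate-checking idea can be mimicked for Python's binary64 floats, which only round to nearest: the directed subtractions t −^RTP s and t −^RTN s always land on t − s as computed or one of its immediate float neighbours, so it suffices to test those three candidates. This is a rough sketch of the strategy under that assumption, not the paper's procedure:

```python
import math

def solve_fp_add(s, t):
    """Search for x with x + s == t (binary64, round-to-nearest) among
    t - s and its two float neighbours; return None if none works."""
    base = t - s
    candidates = (base,
                  math.nextafter(base, -math.inf),
                  math.nextafter(base, math.inf))
    for x in candidates:
        if x + s == t:
            return x
    return None
```

Special values (infinities, NaN) and the corner case singled out above are deliberately left out of the sketch.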
¹ Available at https://cvc4.cs.stanford.edu/papers/CAV2019-FP.
We found that t is a conditional inverse for the case of rti^R(x) ≈ t and
x rem s ≈ t, that is, substituting t for x is an invertibility condition. For the
latter, we discovered an alternative invertibility condition:
|t +^RTP t| ≤ |s| ∨ |t +^RTN t| ≤ |s| ∨ ite(t ≈ ±0, s ≈ ±0, t ≈ ±∞) (1)
In contrast to the condition from Table 2, this version does not involve rem.
It follows that certain applications of floating-point remainder, including those
whose first argument is an unconstrained variable, can be eliminated based on
this equivalence. Interestingly, for s rem x ≈ t, we did not succeed in finding an
invertibility condition. This case appears to not admit a concise solution; we
discuss further details below.
Table 3 gives the invertibility conditions for ≥. Since these constraints admit
more solutions, they typically have simpler invertibility conditions. In particular,
with the exception of rem, all conditions only involve floating-point classifiers.
When considering literals with predicates, the invertibility conditions for
cases involving x + s and s − x are identical for every predicate and rounding
mode. This is due to the fact that s − x is equivalent to s + (−x), indepen-
dent from the rounding mode. Thus, the negation of the inverse value of x for
an equation involving x + s is the inverse value of x for an equation involving
s − x. Similarly, the invertibility conditions for x · s and s ÷ x over predicates
{<, ≤, >, ≥, isInf, isNaN, isNeg, isZero} are identical for all rounding modes.
For all predicates except {≈, isNorm, isSub}, the invertibility conditions for
operators {+, −, ÷, ·} contain floating-point classifiers only. All of these condi-
tions are also independent from the rounding mode. Similarly, for operator fma
over predicates {isInf, isNaN, isNeg, isPos}, the invertibility conditions contain
only floating-point classifiers. All of these conditions except for isNeg(fma(x, s, t))
and isPos(fma(x, s, t)) are also independent from the rounding mode.
For all floating-point operators with predicate isNaN, the invertibility
condition is ⊤, i.e., an inverse value for x always exists. This is due to the fact that
every floating-point operator returns NaN if one of its operands is NaN, hence
NaN can be picked as an inverse value of x. Conversely, we identified four cases
for which the invertibility condition is ⊥, i.e., an inverse value for x never exists.
These four cases are isNeg(|x|), isInf(x rem s), isInf(s rem x), and isSub(rti(x)). For
the first three cases, it is obvious why no inverse value exists. The intuition for
isSub(rti(x)) is that integers are not subnormal, and as a result if x is rounded to
an integer it can never be a subnormal number. All of these cases can be easily
implemented as rewrite rules in an SMT solver.
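For illustration, the constant cases just described can be tabulated directly; the encoding below is our own rendering of the text's case analysis, not solver code:

```python
# Invertibility conditions that are constant, keyed by (predicate, pattern).
# False: no inverse value for x ever exists (the four cases from the text).
CONSTANT_ICS = {
    ("isNeg", "|x|"): False,
    ("isInf", "x rem s"): False,
    ("isInf", "s rem x"): False,
    ("isSub", "rti(x)"): False,
}

def trivial_ic(predicate, pattern):
    """Return True/False when 'exists x. predicate(pattern)' rewrites to a
    constant, or None when a non-trivial invertibility condition is needed.
    isNaN is always True: every operator propagates NaN, so x = NaN works."""
    if predicate == "isNaN":
        return True
    return CONSTANT_ICS.get((predicate, pattern))
```

An SMT solver would apply such rewrites eagerly, short-circuiting the quantified query whenever the condition is constant.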
For operator fma, the invertibility conditions over predicates {isInf, isNaN,
isNeg, isPos} contain floating-point classifiers only. For predicate isZero, the
invertibility conditions are more involved. Equations (2) and (3) show the invert-
ibility conditions for isZero(fma(x, s, t)) and isZero(fma(s, t, x)) for all rounding
modes R.
fma^R(−(t ÷^RTP s), s, t) ≈ ±0 ∨ fma^R(−(t ÷^RTN s), s, t) ≈ ±0 ∨ (s ≈ ±0 ∧ t ≈ ±0) (2)
fma^R(s, t, −(s ·^RTP t)) ≈ ±0 ∨ fma^R(s, t, −(s ·^RTN t)) ≈ ±0 (3)
These two invertibility conditions contain case splits similar to those in Table 2 and
indicate that, e.g., −t ÷^RTP s is an inverse value for x when fma^R(−(t ÷^RTP s), s, t) ≈ ±0
holds.
As we will describe in Sect. 4, an important aspect of synthesizing these
invertibility conditions was considering their visualizations. This helped us deter-
mine which invertibility conditions were relatively simple and which exhibited
complex behavior.
s s s s
Fig. 1. Invertibility conditions for {+, ·, ÷} over ≈ for F3,5 and rounding mode RNE.
(a) x rem s > t (b) x rem s ≥ t (c) s rem x > t (d) s rem x ≥ t
(a) fma(x, s, t) ≈ ±0 (b) fma(s, t, x) ≈ ±0 (c) isSub(fma(x, s, t)) (d) isSub(fma(s, t, x))
Fig. 4. Invertibility conditions for fma over {isZero, isSub} for F3,5 and rnd. mode RNE.
for some rounding mode R. In other words, this conjecture states that IC(s, t)
holds exactly when there exists an x that, when rounding the result of adding x
to s according to mode R, yields t. Furthermore, we are interested in finding a
solution for IC that holds independently of the format of x, s, t. Note that SMT
solvers are not capable of reasoning about constraints that are parametric in the
floating-point format. To address this challenge, following the methodology from
previous work [26], our strategy for establishing (general) invertibility conditions
first solves the synthesis conjecture for a fixed format Fε,σ , and subsequently
checks whether that solution also holds for other formats. The choice of the
number of exponent bits ε and significand bits σ in Fε,σ balances two criteria:
1. ε, σ should be large enough to exercise many (or all) of the behaviors of the
operators and relations in our synthesis conjecture,
2. ε, σ should be small enough for the synthesis problem to be tractable.
In our experience, the best choices for (ε, σ) depended on the particular invert-
ibility condition we were solving. The most common choices for (ε, σ) were (3, 5),
(4, 5) and (4, 6). For most two-dimensional invertibility conditions (those that
involve two variables s and t), we used (3, 5), since the required synthesis pro-
cedures mentioned below were roughly eight times faster than for (4, 5). For
one-dimensional invertibility conditions, we often used higher precision formats.
Since floating-point operators like addition take as additional argument a round-
ing mode R, we assumed a fixed rounding mode when solving, and then cross-
checked our solution for multiple rounding modes.
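To see why a small format can still exercise most behaviors (criterion 1) while staying tractable (criterion 2), it helps to count: F3,5 has only 2^8 = 256 bit patterns. The stand-alone decoder below follows the IEEE 754 layout and is ours, written only for this illustration:

```python
import math

def enumerate_values(eb, sb):
    """Decode every bit pattern of F(eb, sb): 1 sign bit, eb exponent bits,
    sb - 1 stored significand bits. NaNs are collapsed to the string 'NaN',
    matching the theory's single NaN value."""
    bias = 2 ** (eb - 1) - 1
    fbits = sb - 1
    out = []
    for bits in range(2 ** (1 + eb + fbits)):
        sign = -1.0 if (bits >> (eb + fbits)) & 1 else 1.0
        e = (bits >> fbits) & (2 ** eb - 1)
        f = bits & (2 ** fbits - 1)
        if e == 2 ** eb - 1:                       # infinities and NaNs
            out.append("NaN" if f else sign * math.inf)
        elif e == 0:                               # zeros and subnormals
            out.append(sign * f * 2.0 ** (1 - bias - fbits))
        else:                                      # normal numbers
            out.append(sign * (2 ** fbits + f) * 2.0 ** (e - bias - fbits))
    return out
```

For F3,5 this yields 256 patterns, 30 of which are NaN; the largest finite value is 15.5 and the smallest positive subnormal is 2^-6. By contrast, F8,24 (binary32) already has 2^32 patterns, which is why synthesis is run on the small format and only cross-checked on larger ones.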
398.5 for F4,6 ). Despite the heavy cost of this step, it was crucial for accelerating
our framework for synthesizing invertibility conditions, described next.
Fig. 5. Architecture for synthesizing invertibility conditions for floating point formulas
(components: SyGuS grammar, side condition, verifier, sample filter, and counterexample-guided sampling).
For instance, for some cases it was very easy to find invertibility conditions that
held when both s and t were normal (resp., subnormal), but very difficult when
s was normal and t was subnormal or vice versa.
We also implemented a fully automated mode for the synthesis loop in Fig. 5.
However, in practice, it was more effective to tweak the generated solutions
manually. The amount of user interaction was not prohibitively high in our
experience.
Finally, we found that it was often helpful to visualize the input/output
behavior of candidate solutions. In many cases, the difference between a candi-
date solution and the desired behavior of the invertibility condition would reveal
a required modification to the grammar or would suggest which parts of the
domain of the conjecture to focus on.
Fig. 6. Recursive procedure QEFP for computing quantifier elimination for x in the unit
linear formula ∃x. P (t1 , . . . , tj [x], . . . , tn ). The free variables in this formula and the
fresh variable y are implicitly universally quantified. Placeholder denotes a floating-
point operator from Table 1.
3786 problems (116 ∗ 5 + 51 for each floating-point format) and checked them
using CVC4 [5] (master 546bf686) and Z3 [16] (version 4.8.4).
We consider an invertibility condition to be verified for a floating-point format
and rounding mode if at least one solver reports unsatisfiable. Given a CPU time
limit of one hour and a memory limit of 8 GB for each solver/benchmark pair, we
were able to verify 3577 (94.5%) invertibility conditions overall, with 99.2% of
F3,5 , 99.7% of F4,5 , 100% of F4,6 , 93.8% of F5,11 , 90.2% of F8,24 , and 84% of F11,53 .
This verification with CVC4 and Z3 required a total of 32 days of CPU time.
All verification jobs were run on cluster nodes with Intel Xeon E5-2637 3.5 GHz
and 32 GB memory.
where y1 and y2 are fresh variables. The third conjunct is trivially equivalent
to ⊤. This formula is quantifier-free and has the properties specified by the
following theorem.
Theorem 1. Let ∃x. P be a unit linear formula and let I be a model of TFP .
Then, I satisfies ¬∃x. P if and only if there exists a model J of TFP (constructible
from I) that satisfies ¬QEFP (∃x. P ).
³ 116 invertibility conditions from rounding mode dependent operators and 51 invertibility conditions where the operator is rounding mode independent (e.g., rem).
Niemetz et al. [26] present a similar algorithm for solving unit linear bit-vector
literals. In that work, a counterexample-guided loop was devised that made
use of Hilbert-choice expressions for representing quantifier instantiations. In
contrast to that work, we provide here only a quantifier elimination procedure.
Extending our techniques to a general quantifier instantiation strategy is the
subject of ongoing work. We discuss our preliminary work in this direction in
the next section.
7 Conclusion
References
1. Alur, R., et al.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided
Design, FMCAD 2013, Portland, 20–23 October 2013, pp. 1–8. IEEE (2013).
http://ieeexplore.ieee.org/document/6679385/
2. Alur, R., Radhakrishna, A., Udupa, A.: Scaling enumerative program synthesis via
divide and conquer. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol.
10205, pp. 319–336. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-
662-54577-5 18
3. IEEE Standards Association 754-2008 - IEEE standard for floating-point arith-
metic (2008). https://ieeexplore.ieee.org/servlet/opac?punumber=4610933
4. Barr, E.T., Vo, T., Le, V., Su, Z.: Automatic detection of floating-point exceptions.
SIGPLAN Not. 48(1), 549–560 (2013)
5. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011.
LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011). https://doi.org/10.
1007/978-3-642-22110-1 14
6. Barrett, C., Stump, A., Tinelli, C.: The satisfiability modulo theories library (SMT-
LIB) (2010). www.SMT-LIB.org
7. Ben Khadra, M.A., Stoffel, D., Kunz, W.: goSAT: floating-point satisfiability as
global optimization. In: FMCAD, pp. 11–14. IEEE (2017)
8. Bjørner, N., Janota, M.: Playing with quantified satisfaction. In: 20th International
Conferences on Logic for Programming, Artificial Intelligence and Reasoning -
Short Presentations, LPAR 2015, Suva, 24–28 November 2015, pp. 15–27 (2015)
9. Blum, L., Blum, M., Shub, M.: A simple unpredictable pseudo-random number
generator. SIAM J. Comput. 15(2), 364–383 (1986)
10. Brain, M., Dsilva, V., Griggio, A., Haller, L., Kroening, D.: Deciding floating-
point logic with abstract conflict driven clause learning. Formal Methods Syst.
Des. 45(2), 213–245 (2014)
11. Brain, M., Schanda, F., Sun, Y.: Building better bit-blasting for floating-point
problems. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019, Part I. LNCS, vol. 11427,
pp. 79–98. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17462-0 5
12. Brain, M., Tinelli, C., Rümmer, P., Wahl, T.: An automatable formal semantics
for IEEE-754 floating-point arithmetic. In: 22nd IEEE Symposium on Computer
Arithmetic, ARITH 2015, Lyon, 22–24 June 2015, pp. 160–167. IEEE (2015)
13. Brillout, A., Kroening, D., Wahl, T.: Mixed abstractions for floating-point arith-
metic. In: FMCAD, pp. 69–76. IEEE (2009)
14. Conchon, S., Iguernlala, M., Ji, K., Melquiond, G., Fumex, C.: A three-tier strategy
for reasoning about floating-point numbers in SMT. In: Majumdar, R., Kunčak, V.
(eds.) CAV 2017. LNCS, vol. 10427, pp. 419–435. Springer, Cham (2017). https://
doi.org/10.1007/978-3-319-63390-9 22
15. Daumas, M., Melquiond, G.: Certification of bounds on expressions involving
rounded operators. ACM Trans. Math. Softw. 37(1), 1–20 (2010)
16. De Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3 24
17. Dutertre, B.: Solving exists/forall problems in Yices. In: Workshop on Satisfiability
Modulo Theories (2015)
18. Fu, Z., Su, Z.: XSat: a fast floating-point satisfiability solver. In: Chaudhuri, S.,
Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 187–209. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-41540-6 11
19. Heizmann, M., et al.: Ultimate automizer with an on-demand construction of
Floyd-Hoare automata. In: Legay, A., Margaria, T. (eds.) TACAS 2017, Part II.
LNCS, vol. 10206, pp. 394–398. Springer, Heidelberg (2017). https://doi.org/10.
1007/978-3-662-54580-5 30
20. Lapschies, F.: SONOLAR, the solver for non-linear arithmetic (2014). http://www.
informatik.uni-bremen.de/agbs/florian/sonolar
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Numerically-Robust Inductive Proof
Rules for Continuous Dynamical Systems
1 Introduction
Infinite-time stability and safety properties of continuous dynamical systems are
typically established via inductive arguments over continuous time. For instance,
proving stability of a dynamical system is similar to proving termination of a
program. A system is stable at the origin in the sense of Lyapunov, if one can
find a Lyapunov function (essentially a ranking function) that is everywhere pos-
itive except for reaching exactly zero at the origin, and never increases over time
along the direction of the system dynamics [11]. Likewise, proving unbounded
safety of a dynamical system requires one to find a barrier function (or differ-
ential invariant [19]) that separates the system’s initial state from the unsafe
regions, and whenever the system states reach the barrier, the system dynam-
ics always points towards the safe side of the barrier [21]. In both cases, once
a candidate certificate (Lyapunov or barrier functions) is proposed, the verifi-
cation problem is reduced to checking the validity of a universally-quantified
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 137–154, 2019.
https://doi.org/10.1007/978-3-030-25543-5_9
138 S. Gao et al.
first-order formula over real-valued variables. The standard approaches for the
validation step use symbolic quantifier elimination [4] or Sum-of-Squares techniques [17,18,24]. However, these algorithms are either extremely expensive or numerically brittle. Most importantly, they cannot handle systems with non-polynomial nonlinearity, and thus fall short of providing a general framework for verifying practical systems of significant complexity.
The standard approach of checking invariance conditions in program anal-
ysis is to use Satisfiability Modulo Theories (SMT) solvers [16]. However, to
check the inductive conditions for nonlinear dynamical systems, one has to solve
nonlinear SMT problems over real numbers, which are highly intractable or
undecidable [23]. Recent work on numerically-driven decision procedures pro-
vides a promising direction to bypass this difficulty [5,6]. They have been used
for many bounded-time verification and synthesis problems for highly nonlinear
systems [12]. However, the fundamental challenge with using numerically-driven
methods in inductive proofs is that numerical errors make it impossible to verify
the induction steps in the standard sense. Take the Lyapunov analysis of stability
properties as an example. A dynamical system is stable if there exists a function
that vanishes exactly at the origin and whose value strictly decreases over time
along system trajectories. Since any numerical error blurs the difference between
strict and non-strict inequality, one might conclude that numerically-driven methods
are not suitable for verifying these strict constraints. However, proving that a
system is stable within an arbitrarily tiny neighborhood around the origin is all
we really need in practice. Thus, there is a discrepancy between what the standard
theory requires and what is needed in practice, or what can be achieved
computationally. To bridge this gap, we need to rethink the fundamental definitions.
In this paper, we formulate new inductive proof rules for continuous dynam-
ical systems for establishing robust notions of stability and safety. These proof
rules are practically useful and computationally certifiable in a very general
sense. For instance, for stability, we define the notion of ε-stability that requires
the system to be stable within an ε-bounded distance from the origin, instead of
exactly at the origin. When ε is small enough, ε-stable systems are practically
indistinguishable from stable systems. We then define the notion of ε-Lyapunov
functions that are sufficient for establishing ε-stability. We then rigorously prove
that the ε-Lyapunov conditions are numerically stable and can be correctly determined
by δ-complete decision procedures for nonlinear real arithmetic [7]. In this
way, we can rely on various numerically-driven SMT solvers to establish a sound
and relative-complete proof system for unbounded stability and safety properties
of highly nonlinear dynamical systems. We believe these new definitions
have eliminated the core difficulty for reasoning about infinite-time properties of
nonlinear systems, and will pave the way for adapting a wide range of automated
methods from program analysis to continuous and hybrid systems. In short, the
paper makes the following contributions:
2 Background
2.1 Dynamical Systems
Throughout the paper, we use the following definition of an n-dimensional
autonomous dynamical system:
dx(t)/dt = f(x(t)),   x(0) ∈ init and ∀t ∈ R≥0, x(t) ∈ D,   (1)
where an open set D ⊆ Rn is the state space, init ⊆ D is a set of initial states, and
f : D → Rn is a vector field specified by Lipschitz-continuous functions on each
dimension. For notational simplicity, all variable and function symbols can rep-
resent vectors. When vectors are used in logic formulas, they represent conjunc-
tions of the formulas for each dimension. For instance, when x = (x1 , . . . , xn ),
we write x = 0 to denote the formula x1 = 0 ∧ · · · ∧ xn = 0. For any system
defined by (1), we write its solution function as
F : D × R≥0 → Rⁿ,   F(x(0), t) = x(0) + ∫₀ᵗ f(x(s)) ds.   (2)
Note that F usually does not have an analytic form. However, since f is Lipschitz-
continuous, F exists and is unique. We will often use Lie derivatives to measure
the change of a scalar function along the flow defined by another vector field:
We will make extensive use of first-order formulas over real numbers with Type 2
computable functions [25] to express and infer properties of nonlinear dynamical
systems. Definition 2 introduces the syntax of these formulas.
– true : ϕ is true.
– δ-false : ϕ+δ is false.
When the two cases overlap, either decision can be returned.
It follows that if ϕ is δ-robust, then a δ-complete decision procedure can
correctly determine the truth value of ϕ.
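The two-valued answer scheme can be illustrated with a small sketch. The Python function below (our own illustrative grid-based procedure, not the interval-based algorithm of [7]) δ-decides a formula ϕ := ∀x ∈ [lo, hi]. g(x) ≥ 0, taking ϕ⁺ᵟ to be the δ-strengthening ∀x. g(x) ≥ δ. The grid spacing is derived from a Lipschitz constant L, so that the answer true guarantees ϕ is true and the answer δ-false guarantees ϕ⁺ᵟ is false; in the overlap region either answer may be returned.

```python
import math

def delta_decide_forall(g, lo, hi, L, delta):
    """Decide phi := forall x in [lo, hi]: g(x) >= 0, up to delta.
    Returns "true"        -> phi is guaranteed true;
            "delta-false" -> the strengthening forall x: g(x) >= delta is false.
    L must be a Lipschitz constant for g on [lo, hi]."""
    h = delta / L                        # grid spacing, so L * h / 2 <= delta / 2
    n = max(1, math.ceil((hi - lo) / h))
    midpoints = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]
    if all(g(x) >= delta / 2 for x in midpoints):
        # every x lies within h/2 of a midpoint, so g(x) >= delta/2 - L*h/2 >= 0
        return "true"
    return "delta-false"                 # some midpoint has g < delta/2 <= delta
```

For g(x) = x² + 0.5 on [−1, 1] the sketch answers true; for g(x) = x it answers delta-false, since the strengthened formula fails near x = −1.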
3.2 Epsilon-Stability
The standard definition of stability requires a system to stabilize within
arbitrarily small neighborhoods around the origin. However, very small
neighborhoods are practically indistinguishable from the origin. Thus, it is practically
sufficient to prove that a system is stable within some sufficiently small neigh-
borhood. We capture this intuition by making a minor change to the standard
definition, by simply putting a lower bound ε on the τ parameter in Definition 5.
As a result, the system is required to exhibit the same behavior as standard sta-
ble systems outside the ε-ball, but can behave arbitrarily within the ε-ball (for
instance, oscillate around the origin). The formal definition is as follows:
In words, for any τ ≥ ε, there exists δ such that all trajectories that start within
the δ-ball will stay within a τ -ball around the origin.
Note that the only difference with the standard definition is that τ is bounded
from below by a positive ε instead of 0. The definition is depicted in Fig. 1c, which
shows the difference with the standard notion in Fig. 1a. Since the only difference
with the standard definition is the lower bound on the universally quantified τ ,
it is clear that ε-stability is strictly weaker than standard stability.
Proposition 2. For any ε ∈ R+ , Stable(f ) → Stableε (f ).
Thus, any system that is stable in the standard definition is also ε-stable for
any ε ∈ R+ . On the other hand, one can always choose small enough ε such
that an ε-stable system is practically indistinguishable from stable systems in
the standard definition.
1. Outside the ε-ball, there is some positive lower bound on the value of V .
Namely, there exists α ∈ R+ such that for any x ∈ D \ Bε , V (x) ≥ α.
2. Inside the ε-ball, there is a strictly smaller ε′-ball in which the value of V
is bounded from above, to create a gap with its values outside the ε-ball.
Formally, there exist ε′ ∈ (0, ε) and β ∈ (0, α) such that for all x ∈ Bε′,
V(x) ≤ β.
3. The Lie derivative of V is strictly negative outside of Bε . Formally, there
exists γ ∈ R+ such that for all x ∈ D \ Bε , the Lie derivative of V along f
satisfies ∇f V (x) ≤ −γ.
In sum, the three conditions can be expressed with the following LRF -formula:
Remark 1. The logical structure of LFε (f, V ) is seemingly more complex than
the standard Lyapunov conditions in Definition 6 because of the extra existen-
tial quantification. In Theorem 3, we show that it does not add computational
complexity in checking the conditions.
The key result is that the conditions for an ε-Lyapunov function are sufficient
for establishing ε-stability.
We know V(x(t1)) ≤ β < α ≤ V(x(t2)), and hence V(x(t1)) < V(x(t2)); however,
this contradicts the mean value theorem and the fact that Bε ⊂ D and
∇f V(x) ≤ −γ for all x ∈ D \ Bε.
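The three ε-Lyapunov conditions lend themselves to direct numerical probing. The sketch below samples them on a grid for the time-reversed Van der Pol system of Example 1, with a hypothetical quadratic candidate V(x) = xᵀP x; the matrix P is our own illustrative choice (obtained from the linearized Lyapunov equation), not a matrix from this paper. Sampling only suggests that the conditions hold; proving them is the job of the δ-complete procedure.

```python
import math

# Hypothetical P for the time-reversed Van der Pol system (our own choice).
P = [[1.5, -0.5], [-0.5, 1.0]]

def V(x):
    x1, x2 = x
    return P[0][0]*x1*x1 + 2*P[0][1]*x1*x2 + P[1][1]*x2*x2

def lie_V(x):
    # Lie derivative: gradient of V dotted with the vector field f
    x1, x2 = x
    gx1 = 2*P[0][0]*x1 + 2*P[0][1]*x2
    gx2 = 2*P[0][1]*x1 + 2*P[1][1]*x2
    f1, f2 = -x2, (x1*x1 - 1.0)*x2 + x1
    return gx1*f1 + gx2*f2

def check_eps_lyapunov(eps, eps_prime, radius=0.5, n=201):
    """Sample the three epsilon-Lyapunov conditions on a grid over
    D = ball of the given radius (a numerical probe, not a proof)."""
    grid = [-radius + 2*radius*i/(n - 1) for i in range(n)]
    outside = [(a, b) for a in grid for b in grid
               if eps <= math.hypot(a, b) <= radius]
    inside = [(a, b) for a in grid for b in grid if math.hypot(a, b) <= eps_prime]
    alpha = min(V(p) for p in outside)      # condition 1: V >= alpha > 0 outside B_eps
    beta = max(V(p) for p in inside)        # condition 2: V <= beta < alpha inside B_eps'
    gamma = max(lie_V(p) for p in outside)  # condition 3: Lie derivative < 0 outside B_eps
    return alpha > 0 and beta < alpha and gamma < 0
```

On this grid, check_eps_lyapunov(0.1, 0.05) finds all three conditions satisfied for ε = 0.1 and ε′ = 0.05.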
Remark 2. The proof of Theorem 1 shows that once the state of the system enters Bε′,
it never leaves Bε. However, it is still possible for the state to leave Bε′.
On the other hand, since the closure of Bε \ Bε′ is bounded, and for every x in this
area, V is continuous at x and ∇f V(x) ≤ −γ, no trajectory can be trapped in
the closure of Bε \ Bε′. Therefore, even though the state of the system might leave
Bε′, it will visit the inside of this ball infinitely often.
Example 1. Consider the time-reversed Van der Pol system given by the follow-
ing dynamics. Figure 3 shows the vector field of this system around the origin.
ẋ1 = −x2 ,    ẋ2 = (x1² − 1) x2 + x1
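The stability claim can be probed by simulation. The following sketch (our own RK4 integrator, with step size and horizon chosen by us) integrates the time-reversed dynamics from a state inside the limit cycle and observes convergence toward the origin.

```python
def vdp_reversed(x):
    # time-reversed Van der Pol vector field from Example 1
    x1, x2 = x
    return (-x2, (x1*x1 - 1.0)*x2 + x1)

def rk4_step(f, x, dt):
    def ax(y, k, s):
        return tuple(yi + s*ki for yi, ki in zip(y, k))
    k1 = f(x)
    k2 = f(ax(x, k1, dt/2))
    k3 = f(ax(x, k2, dt/2))
    k4 = f(ax(x, k3, dt))
    return tuple(xi + dt/6*(a + 2*b + 2*c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))

def simulate(f, x0, t_end, dt=0.01):
    x = x0
    for _ in range(int(round(t_end / dt))):
        x = rk4_step(f, x, dt)
    return x
```

Starting from (0.5, 0.5), well inside the limit cycle, the state after t = 30 lies within 10⁻³ of the origin; trajectories started outside the limit cycle diverge instead.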
We now prove that unlike the conventional conditions, the new inductive proof
rules are numerically robust. It follows that δ-decision procedures provide a
sound and relative-complete proof system for establishing the conditions in the
following sense:
Lemma 1. For any ε ∈ R+ , there exists δ ∈ Q+ such that LFε (f, V ) is δ-robust.
Note that if a formula φ is δ-robust, then for every δ′ ∈ (0, δ), φ is δ′-robust
as well. The soundness and relative-completeness then follow naturally.
Proof. Fix an arbitrary ε ∈ R+ for which LFε (f, V ) is true. Let φ := LFε (f, V ),
and using Lemma 1, let δ ∈ Q+ be such that φ is δ-robust. Since φ is true, we
conclude φ+δ is true as well. Using Definition 4, no δ-complete decision procedure
can return δ-false for φ.
LFε(f, V) ↔ ∃ε′ ∈ (0, ε). ( sup_{x ∈ Bε′} V(x) < inf_{x ∈ D\Bε} V(x) ∧ sup_{x ∈ D\Bε} ∇f V(x) < 0 )
Note that in this form the universal quantification is implicit in the sup and inf
operators. In this way, the formula is existentially quantified only over ε′, which
can then be handled by binary search. This is an efficient way of checking the
conditions in practice. We also remark that without this method, the original
formulation with multiple parameters can be directly solved as an ∃∀-formula
using more expensive algorithms [13].
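The search over ε′ can be sketched in a few lines. Since shrinking ε′ only shrinks sup over Bε′ of V, feasibility is monotone as ε′ decreases, so repeatedly halving a candidate finds a witness whenever one exists above a tolerance. The predicate `holds` below is a caller-supplied check of the sup/inf condition (in practice discharged by a δ-decision procedure); the simple halving loop is our own simplification of the binary search mentioned in the text.

```python
def search_eps_prime(eps, holds, tol=1e-6):
    """Find a witness eps' in (0, eps) with holds(eps') true, by halving.
    `holds` abstracts the check  sup_{B_eps'} V < inf_{D \\ B_eps} V, which is
    monotone: if it holds for some eps', it holds for all smaller ones."""
    candidate = eps / 2
    while candidate > tol:
        if holds(candidate):
            return candidate
        candidate /= 2
    return None            # no witness found above the tolerance
```

With a toy instance V(x) = x² and ε = 0.5 (so the inf outside the ε-ball is 0.25), the check holds(e) = e² < 0.25 succeeds on the first candidate, and the search returns 0.25.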
In this section, we define two types of ε-barrier functions that are robust to
numerical perturbations.
Proving unbounded safety requires the use of barrier functions. The idea is
that if one can find a barrier function that separates initial conditions from the
set of unsafe states, such that no trajectories can cross the barrier from the safe
to the unsafe side, then the system is safe. Here we use a formulation similar
to that of Prajna [21]. The standard conditions on barrier functions include
constraints on the vector field of the system at the exact boundary of the barrier
set, which introduces robustness problems. We show that it is possible to avoid
these problems using two different formulations, which we call Type 1 and Type 2
ε-barrier functions. Type 1 ε-barrier functions strengthen the original definition
and require strict contraction of the barrier. Instead of only asking the system to
be contractive exactly on the barrier’s border, we force it to be contractive when
reaching any state within a small distance from the border. Type 2 ε-barrier
functions allow the system to escape the barrier for a controllable distance and
a limited period of time. It should then return to the interior of the safe region.
Type 1 ε-barriers can be seen as a subclass of Type 2 ε-barriers. The benefit
for allowing bounded escape is that the shape of the barrier no longer needs
to be an invariant set, which can be particularly helpful when the shape of the
system invariants cannot be determined or expressed symbolically. The down-
side to Type 2 ε-barriers is that checking the corresponding conditions requires
integration of the dynamics, which can be expensive but can still be handled
by δ-complete decision procedures. The intuition behind the two definitions is
shown in Fig. 2 and will be explained in detail in this section.
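The Type 1 strict-contraction idea can be probed numerically: sample the band −ε ≤ B(x) ≤ 0 and check that the Lie derivative of B stays at or below −γ there. The sketch below uses an illustrative system of our own choosing (ẋ = −x with B(x) = x1² + x2² − 1), not an example from the paper; the sampling is a probe, not a proof.

```python
def check_type1_band(B, lieB, eps, gamma, radius=1.5, n=301):
    """Sample points with -eps <= B(x) <= 0 in a box and check that
    lieB(x) <= -gamma everywhere on the band (a numerical probe only)."""
    grid = [-radius + 2*radius*i/(n - 1) for i in range(n)]
    band = [(a, b) for a in grid for b in grid if -eps <= B((a, b)) <= 0.0]
    return bool(band) and all(lieB(p) <= -gamma for p in band)

# illustrative system: dx/dt = -x, with B(x) = x1^2 + x2^2 - 1
B = lambda x: x[0]*x[0] + x[1]*x[1] - 1.0
lieB = lambda x: -2.0*(x[0]*x[0] + x[1]*x[1])   # gradient of B dotted with -x
```

On the band with ε = 0.1, the Lie derivative is at most −2(1 − ε) = −1.8, so the check succeeds for γ = 1.5.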
Before formally introducing robust safety and ε-barrier functions, we first define
standard safety and barrier functions. It is easy to see that the robustness problem
with the barrier functions is similar to that of Lyapunov functions: if the bound-
ary is exactly separating the safe and unsafe regions then the inductive conditions
are not robust, since deviations in the variables by even a small amount from
the barrier will make it impossible to complete the proof.
It should be intuitively clear from the definition that the existence of ε-barrier
functions is sufficient for establishing invariants and safety properties. The new
requirement is that the system stays robustly within the barrier, with a margin
given by the area defined by −ε ≤ B(x) ≤ 0.
Theorem 4. For any ε ∈ R+ , Barrierε (f, init, B) → Safe(f, init, B).
Proof. Assume Barrierε(f, init, B) is true. It is easy to see that Barrier(f, init, B + ε), as
specified in Definition 10, is also true. Therefore, using Proposition 3, we know
Safe(f, init, B + ε) and hence Safe(f, init, B) are both true.
It is clear that there is room for numerically perturbing the size of the area
and still obtaining a robust proof. The proof is similar to the one for Lemma 1
as shown in [8].
Theorem 5. For any ε ∈ R+ , there exists δ ∈ Q+ such that Barrierε (f, init, B)
is a δ-robust formula.
Example 2 (Type 1 ε-Barrier for the time-reversed Van der Pol). Consider the
time-reversed Van der Pol system introduced in Example 1. We use the same
example to demonstrate the effect of numerical errors in proving barrier
certificates. The level sets of the Lyapunov functions in the stable region are
barrier certificates; however, for barriers that are very close to the limit
cycle, numerical sensitivity becomes a problem. In experiments, when ε = 10⁻⁵
and δ = 10⁻⁴, we can verify that the level set z^T P z = 90 is a Type 1 ε-barrier.
Table 2 lists the parameters used in this proof. Figure 3 (Left) shows the
direction field for the time-reversed Van der Pol dynamics, the border of the
set z^T P z ≤ 90, which we prove is a Type 1 ε-barrier, and the boundary of the set
z^T P z ≤ 110, which is clearly not a barrier, since it is outside of the limit cycle.
Fig. 3. (Left) The time-reversed Van der Pol example: the vector field, the limit cycle, and the level sets z^T P z = 90 (a Type 1 ε-barrier) and z^T P z = 110 (not a barrier). (Right) The Type 2 barrier example: the init set, the unsafe set, the B(x) = −0.1, B(x) = 0.0, and B(x) = 1.0 level sets, and the forward images of the B(x) = 0 level set at t = 0.14, 0.28, and 0.42.
The conditions for ε-Lyapunov and ε-barrier functions look very similar, but
there is an important difference. In the case of Lyapunov functions, we do not
constrain the Lie derivative on the boundaries of the balls; thus, the balls
themselves do not define barrier sets. On the other hand, the level sets of
Lyapunov functions always define barriers.
Remark 3. The ε-barrier functions can also be used as a sufficient condition for
ε-stability, if a barrier can be found within the ε-ball required in ε-stability.
– (Bounded Escape) Before reaching back to the invariant set, we allow the
system to step outside the invariant, but only up to a bounded distance from
the boundary.
Proof. For the purpose of contradiction, suppose that, starting from x0 ∈ init, the
system is unsafe. Using continuity of the barrier B and the solution function F,
let t ∈ R≥0 be a time at which B(x(t)) = 0, where x(t) is by definition F(x0, t).
By the 1st property in Definition 12, we know B(x0) ≤ −ε < 0. Using continuity
of B and F, let t′ ∈ [0, t) be the supremum of all times at which B(x(t′)) = −ε.
By the 3rd property in Definition 12, we know t − t′ > T, and by the 2nd property
in Definition 12, we know B(x(t′ + T)) ≤ −ε′ < −ε. Using continuity of B and
F, we know there is a time t′′ ∈ (t′ + T, t) at which B(x(t′′)) = −ε. However,
this contradicts t′ being the supremum.
Theorem 7. For any ε ∈ R+ , there exists δ ∈ Q+ such that BarrierT,ε (f, init, B)
is a δ-robust formula.
Example 3. We use this example to show how Type 2 ε-barriers can be used to
establish safety. Consider the following system.
ẋ1 = −0.1 x1 − 10 x2 ,    ẋ2 = 4 x1 − 2 x2
Let init be the set {x | −0.1 ≤ x1 ≤ 0.1, −0.1 ≤ x2 ≤ 0.1}, and let U , the unsafe
set, be the set {x | −2.0 ≤ x1 ≤ −1.1, −2.0 ≤ x2 ≤ −1.1}. The system is stable
and safe with respect to the designated unsafe set. However, the safety cannot
be shown using any invariant of the form B(x) := x1² + x2² − c ≤ 0, where c ∈ Q+
is a constant, in the standard definition. This is because the vector field on the
boundary of such sets does not satisfy the inductive conditions. Nevertheless, we
can show that for c = 1, B(x) is a Type 2 ε-barrier. The dReal query verifies the
conditions with ε = 0.1. Since U(x) → B(x) > 1.0 and init(x) → B(x) < −ε, we
know that the system cannot reach any unsafe states. Figure 3 (Right) illustrates
the example. The green set at the center represents init, and the red set represents
the unsafe set U. The B(x) = 0 level set is not invariant, as evidenced in the figure
by the forward images at t = 0.14 and t = 0.28 leaving the set; however, as
the dReal query proves, the reachable set over 0 ≤ t ≤ 10 does not leave the
B(x) = 1.0 level set and is completely contained in the B(x) = −0.1 level set by
t = 0.4. Since U (x) → B(x) > 1.0 and init(x) → B(x) < −0.1, then the system
cannot reach any state in U .
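The claims in this example can be replayed numerically. The sketch below (our own integrator and parameter choices) integrates the linear dynamics from the corners of init and tracks B along the trajectory, confirming on samples that B never exceeds 1.0 and stays below −0.1 from t = 0.4 on.

```python
def f(x):
    # linear dynamics of Example 3
    x1, x2 = x
    return (-0.1*x1 - 10.0*x2, 4.0*x1 - 2.0*x2)

def B(x):
    return x[0]*x[0] + x[1]*x[1] - 1.0

def rk4_step(x, dt):
    def ax(y, k, s):
        return (y[0] + s*k[0], y[1] + s*k[1])
    k1 = f(x); k2 = f(ax(x, k1, dt/2)); k3 = f(ax(x, k2, dt/2)); k4 = f(ax(x, k3, dt))
    return (x[0] + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            x[1] + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def replay(x0, t_end=10.0, dt=0.001):
    """Track the maximum of B along the trajectory and whether B stays
    below -0.1 for all sampled times t >= 0.4."""
    x, t = x0, 0.0
    max_B, settled = B(x0), True
    while t < t_end:
        x = rk4_step(x, dt)
        t += dt
        max_B = max(max_B, B(x))
        if t >= 0.4 and B(x) > -0.1:
            settled = False
    return max_B, settled
```

Running replay on each corner (±0.1, ±0.1) of init keeps max_B below 1.0 and reports settled, consistent with the dReal result, though of course only on sampled trajectories.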
5 Experiments
Table 1. Results for the ε-Lyapunov functions. Each Lyapunov function is of the
form z^T P z, where z is a vector of monomials over the state variables. We report the
constant values satisfying the ε-Lyapunov conditions, and the time (in seconds) that
verification of each example takes.
Table 2. Results for the ε-barrier functions. Each barrier function B(x) is of the form
z^T P z − ℓ, where z is a vector of monomials over x. We indicate the highest degree of
the monomials used in z, the size of P, the level ℓ used for each barrier function,
and the values of ε and γ used to check ∇f B(x) < −γ.
matrix obtained using simulation-guided techniques from [10]. All the P matrices
are given in [8].
Time-Reversed Van der Pol. The time-reversed Van der Pol system has been
used as an example in the previous sections. Figure 3 (Left) shows the direction
field of this system around the origin. Using dReal with δ := 10⁻²⁵, we are able
to establish a 10⁻¹²-Lyapunov function and a 10⁻⁵-barrier function.
Normalized Pendulum. A standard pendulum system has continuous dynam-
ics containing a transcendental function, which causes difficulty for many tech-
niques. Here, we consider a normalized pendulum system with the follow-
ing dynamics, in which x1 and x2 represent angular position and velocity,
respectively. In our experiment, using δ = 10⁻⁵⁰, we can prove that the function
V := x^T P x is ε-Lyapunov, where ε := 10⁻¹².
ẋ1 = x2 ,    ẋ2 = −sin(x1) − x2    (3)
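The transcendental sin poses no problem for a numerical probe. With a hypothetical P (our own choice for illustration, not the matrix from [8]), one can sample the Lie derivative of V = x^T P x outside a small ball and observe that it is strictly negative there.

```python
import math

def V_lie(x):
    # V(x) = x1^2 + x1*x2 + x2^2, i.e. P = [[1, 0.5], [0.5, 1]] (hypothetical)
    x1, x2 = x
    g = (2*x1 + x2, x1 + 2*x2)          # gradient of V
    f = (x2, -math.sin(x1) - x2)        # normalized pendulum dynamics (3)
    return g[0]*f[0] + g[1]*f[1]

def max_lie_outside(eps, radius=0.5, n=201):
    """Maximum of the Lie derivative over grid samples with eps <= |x| <= radius."""
    grid = [-radius + 2*radius*i/(n - 1) for i in range(n)]
    pts = [(a, b) for a in grid for b in grid
           if eps <= math.hypot(a, b) <= radius]
    return max(V_lie(p) for p in pts)
```

On this grid, max_lie_outside(0.01) is negative, consistent with V being a Lyapunov candidate on a neighborhood of the origin.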
Using δ := 0.01, we are able to prove that for any level value ℓ ∈ [0.1, 10], the function
B(x) := x^T P x − ℓ, with x being the system state and P a constant matrix given
in [8], is a Type 1 0.01-barrier function.
Moore-Greitzer Jet Engine. Next, we consider a simplified version of the
Moore-Greitzer model for a jet engine. The system has the following dynamics,
in which x1 and x2 are states related to mass flow and pressure rise.
ẋ1 = −x2 − (3/2) x1² − (1/2) x1³ ,    ẋ2 = 3 x1 − x2    (4)
i̇ = c15 (r − c16 )
which follows the detailed description of the model and the constant parameter
values in [10]. We verified that there exists a function of the form B(x) = z^T P z −
0.01 (z consists of 14 monomials with a maximum degree of 2) such that ∇f B(x) <
−γ when B(x) = −ε.
6 Conclusion
We formulated new inductive proof rules for stability and safety for dynamical
systems. The rules are numerically robust, making them amenable to verification
using automated reasoning tools such as those based on δ-decision procedures.
We presented several examples demonstrating the value of the new approach,
including safety verification tasks for highly nonlinear systems. The examples
show that the framework can be used to prove stability and safety for examples
that were out of reach for existing tools. The new framework relies on the ability
to generate reasonable candidate Lyapunov functions, which are analogous to
ranking functions from program analysis. Future work will include improved
techniques for efficiently generating the ε-Lyapunov and ε-barrier functions and
related theoretical questions.
Acknowledgement. Our work is supported by the United States Air Force and
DARPA under Contract No. FA8750-18-C-0092, AFOSR No. FA9550-19-1-0041, and
the National Science Foundation under NSF CNS No. 1830399. Any opinions, find-
ings and conclusions or recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the United States Air Force and
DARPA.
References
1. Bak, S.: t-Barrier certificates: a continuous analogy to k-induction. In: IFAC Con-
ference on Analysis and Design of Hybrid Systems (2018)
2. Bernfeld, S.R., Lakshmikantham, V.: Practical stability and Lyapunov functions.
Tohoku Math. J. (2) 32(4), 607–613 (1980)
3. Bobiti, R., Lazar, M.: A delta-sampling verification theorem for discrete-time, pos-
sibly discontinuous systems. In: HSCC (2015)
4. Collins, G.E.: Quantifier elimination for real closed fields by cylindrical algebraic
decomposition. In: Brakhage, H. (ed.) GI-Fachtagung 1975. LNCS, vol. 33, pp.
134–183. Springer, Heidelberg (1975). https://doi.org/10.1007/3-540-07407-4 17
5. Fränzle, M., Herde, C., Teige, T., Ratschan, S., Schubert, T.: Efficient solving of
large non-linear arithmetic constraint systems with complex boolean structure.
JSAT 1(3–4), 209–236 (2007)
6. Gao, S., Avigad, J., Clarke, E.: Delta-complete decision procedures for satisfiability
over the reals. In: Proceedings of the Automated Reasoning - 6th International
Joint Conference, IJCAR 2012, Manchester, UK, 26–29 June 2012, pp. 286–300
(2012)
7. Gao, S., Avigad, J., Clarke, E.M.: Delta-decidability over the reals. In: LICS, pp.
305–314. IEEE Computer Society (2012)
8. Gao, S., et al.: Numerically-robust inductive proof rules for continuous dynamical
systems (extended version) (2019). https://dreal.github.io/CAV19/
9. Gao, S., Kong, S., Clarke, E.M.: dReal: an SMT solver for nonlinear theories over
the reals. In: Bonacina, M.P. (ed.) CADE 2013. LNCS (LNAI), vol. 7898, pp. 208–
214. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2 14
10. Kapinski, J., Deshmukh, J.V., Sankaranarayanan, S., Aréchiga, N.: Simulation-
guided Lyapunov analysis for hybrid dynamical systems. In: Hybrid Systems: Com-
putation and Control (2014)
11. Khalil, H.K.: Nonlinear Systems. Prentice Hall, Upper Saddle River (1996)
12. Kong, S., Gao, S., Chen, W., Clarke, E.: dReach: δ-reachability analysis for hybrid
systems. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 200–
205. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0 15
13. Kong, S., Solar-Lezama, A., Gao, S.: Delta-decision procedures for exists-forall
problems over the reals. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018.
LNCS, vol. 10982, pp. 219–235. Springer, Cham (2018). https://doi.org/10.1007/
978-3-319-96142-2 15
14. LaSalle, J.P., Lefschetz, S.: Stability by Liapunov’s Direct Method: With Applica-
tions. Mathematics in Science and Engineering. Academic Press, New York (1961)
15. Liberzon, D., Ying, C., Zharnitsky, V.: On almost Lyapunov functions. In: 2014
IEEE 53rd Annual Conference on Decision and Control (CDC), pp. 3083–3088,
December 2014
16. Monniaux, D.: A survey of satisfiability modulo theory. In: Gerdt, V.P., Koepf, W.,
Seiler, W.M., Vorozhtsov, E.V. (eds.) CASC 2016. LNCS, vol. 9890, pp. 401–425.
Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45641-6 26
17. Papachristodoulou, A., Prajna, S.: Analysis of non-polynomial systems using the
sum of squares decomposition. In: Henrion, D., Garulli, A. (eds.) Positive Polyno-
mials in Control. LNCIS, vol. 312, pp. 23–43. Springer, Heidelberg (2005). https://
doi.org/10.1007/10997703 2
18. Parrilo, P.: Structured semidefinite programs and semialgebraic geometry methods
in robustness and optimization. Ph.D. thesis, August 2000
19. Platzer, A., Clarke, E.M.: Computing differential invariants of hybrid systems as
fixedpoints. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 176–
189. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70545-1 17
20. Podelski, A., Wagner, S.: Model checking of hybrid systems: from reachabil-
ity towards stability. In: Hespanha, J.P., Tiwari, A. (eds.) HSCC 2006. LNCS,
vol. 3927, pp. 507–521. Springer, Heidelberg (2006). https://doi.org/10.1007/
11730637 38
21. Prajna, S.: Optimization-based methods for nonlinear and hybrid systems verifica-
tion. Ph.D. thesis, California Institute of Technology, Pasadena, CA, USA (2005).
AAI3185641
22. Roohi, N., Prabhakar, P., Viswanathan, M.: Relating syntactic and semantic per-
turbations of hybrid automata. In: CONCUR, pp. 26:1–26:16 (2018)
23. Tarski, A.: A Decision Method for Elementary Algebra and Geometry, 2nd edn.
University of California Press, Berkeley (1951)
24. Topcu, U., Packard, A., Seiler, P.: Local stability analysis using simulations and
sum-of-squares programming. Automatica 44, 2669–2675 (2008)
25. Weihrauch, K.: Computable Analysis: An Introduction, 1st edn. Springer, Heidel-
berg (2013)
26. Weiss, L., Infante, E.F.: On the stability of systems defined over a finite time
interval. Proc. Nat. Acad. Sci. U.S.A. 54(1), 44 (1965)
27. Weiss, L., Infante, E.F.: Finite time stability under perturbing forces and on prod-
uct spaces. IEEE Trans. Autom. Control 12(1), 54–59 (1967)
28. Xu, X., Tabuada, P., Grizzle, J.W., Ames, A.D.: Robustness of control barrier
functions for safety critical control. IFAC-PapersOnLine 48(27), 54–61 (2015)
29. Zhai, G., Michel, A.N.: On practical stability of switched systems. In: Proceed-
ings of the 41st IEEE Conference on Decision and Control, vol. 3, pp. 3488–3493,
December 2002
30. Zhai, G., Michel, A.N.: Generalized practical stability analysis of discontinuous
dynamical systems. In: Proceedings of the 42nd IEEE Conference on Decision and
Control, vol. 2, pp. 1663–1668. IEEE (2003)
Icing: Supporting Fast-Math Style
Optimizations in a Verified Compiler
1 Introduction
Verified compilers formally guarantee that compiled machine code behaves
according to the specification given by the source program’s semantics. This
stringent requirement makes verifying “end-to-end” compilers for mainstream
languages challenging, especially when proving sophisticated optimizations that
developers rely on. Recent verified compilers like CakeML [38] for ML and
Z. Tatlock—This work was supported in part by the Applications Driving Architectures
(ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 155–173, 2019.
https://doi.org/10.1007/978-3-030-25543-5_10
156 H. Becker et al.
CompCert [24] for C have been steadily verifying more of these important opti-
mizations [39–41]. While the gap between verified compilers and mainstream
alternatives like GCC and LLVM has been shrinking, so-called “fast-math”
floating-point optimizations remain absent in verified compilers.
Fast-math optimizations allow a compiler to perform rewrites that are often
intuitive when interpreted as real-valued identities, but which may not preserve
strict IEEE 754 floating-point behavior. Developers selectively enable fast-math
optimizations when implementing heuristics, computations over noisy inputs,
or error-robust applications like neural networks—typically at the granularity
of individual source files. The IEEE 754-unsound rewrites used in fast-math
optimizations allow compilers to perform strength reductions, reorder code to
enable other optimizations, and remove some error checking [1,2]. Together these
optimizations can provide significant savings and are widely used in performance-critical
applications [12].
Unfortunately, strict IEEE 754 source semantics prevents proving fast-math
optimizations correct in verified compilers like CakeML and CompCert. Simple
strength-reducing rewrites like fusing the expression x ∗ y + z into a faster and
locally-more-accurate fused multiply-add (fma) instruction cannot be included
in such verified compilers today. This is because fma avoids an intermediate
rounding and thus may not produce exactly the same bit-for-bit result as the
unoptimized code. More sophisticated optimizations like vectorization and loop
invariant code motion depend on reordering operations to make expressions avail-
able, but these cannot be verified since floating-point arithmetic is not associa-
tive. Even simple reductions like rewriting x − x to 0 cannot be verified since
the result can actually be NaN (“not a number”) if x is NaN. Each of these cases
represents a rewrite that developers would often, in principle, be willing to apply
manually to improve performance, but which can be more conveniently handled
by the compiler. Verified compilers’ strict IEEE 754 source semantics similarly
hinders composing their guarantees with recent tools designed to improve accu-
racy of a source program [14,16,32], as these tools change program behavior to
reduce rounding error. In short, developers today are forced to choose between
verified compilers and useful tools based on floating-point rewrites.
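The cited obstacles are easy to reproduce in any IEEE 754 environment; for instance, in Python, which uses IEEE 754 double precision:

```python
import math

# the rewrite  x - x  -->  0  is unsound when x is NaN
x = float("nan")
print(x - x)              # nan, not 0.0

# reassociation is unsound: floating-point addition is not associative
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)        # 1.0
print(a + (b + c))        # 0.0  (1.0 is absorbed by -1e16 before cancellation)
```

An fma performs a single rounding of x * y + z, so it can likewise differ bit-for-bit from the separately rounded multiply and add.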
The crux of the mismatch between verified compilers and fast-math lies in the
source semantics: verified compilers implement strict IEEE 754 semantics while
developers are intuitively programming against a looser specification of floating-
point closer to the reals. Developers currently indicate this perspective by pass-
ing compiler flags like --ffast-math for the parts of their code written against
this looser semantics, enabling mainstream compilers to aggressively optimize
those components. Ideally, verified compilers will eventually support such loos-
ened semantics by providing an “approximate real” data type and let the devel-
oper specify error bounds under which the compiler could freely apply any opti-
mization that stays within bounds. A good interface to tools for analyzing finite-
precision computations [11,16] could even allow independently-established formal
accuracy guarantees to be composed with compiler correctness.
2.1 Syntax
Icing’s syntax is shown in Fig. 1. In addition to arithmetic, let-bindings and
conditionals, Icing supports fma operators, lists ([e1 . . .]), projections (e1 [n]), and
Map and Fold as primitives. Conditional guards consist of boolean constants (b),
binary comparisons (e1 e2 ), and an isNaN predicate. isNaN e1 checks whether e1
is a so-called Not-a-Number (NaN) special value. Under the IEEE 754 standard,
undefined operations (e.g., square root of a negative number) produce NaN results,
and most operations propagate NaN results when passed a NaN argument. It is
thus common to add checks for NaNs at the source or compiler level.
We use the Map and Fold primitives to show that Icing can be used to express
programs beyond arithmetic, while keeping the language simple. Language fea-
tures like function definitions or general loops do not affect floating-point com-
putations with respect to fast-math optimizations and are thus orthogonal.
The opt: scoping annotation implements one of the key features of Icing:
floating-point semantics are relaxed only for expressions under an opt: scope. In
this way, opt: provides fine-grained control both for expressions and conditional
guards.
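As a rough sketch (all constructor names below are our own, not Icing's HOL4 datatype), the core of this syntax, including the opt: annotation as an explicit node, can be modeled as:

```python
from dataclasses import dataclass
from typing import Union

# Illustrative AST for a fragment of Icing's syntax (Fig. 1).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Binop:
    op: str            # '+', '-', '*', '/'
    lhs: 'Expr'
    rhs: 'Expr'

@dataclass(frozen=True)
class Fma:             # fused multiply-add: a * b + c with a single rounding
    a: 'Expr'
    b: 'Expr'
    c: 'Expr'

@dataclass(frozen=True)
class IsNaN:
    arg: 'Expr'

@dataclass(frozen=True)
class Opt:             # opt: scope -- rewrites may fire only inside
    body: 'Expr'

Expr = Union[Var, Const, Binop, Fma, IsNaN, Opt]

# opt:(x * y + z): only this subterm is eligible for, e.g., fma introduction
example = Opt(Binop('+', Binop('*', Var('x'), Var('y')), Var('z')))
```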
A key feature of Icing’s design is that each rewrite can be guarded by a rewrite
precondition. We distinguish compiler rewrite preconditions as those that must
be true for the rewrite to be correct with respect to Icing semantics. Removing
a NaN check, for example, can change the runtime behavior of a floating-point
program: a previously crashing program may terminate or vice-versa. Thus a
NaN-check can only be removed if the value can never be a NaN.
In contrast, an application rewrite precondition guards a rewrite that can
always be proven correct against the Icing semantics, but where a user may still
want finer-grained control. By restricting the context where Icing may fire these
rewrites, a user can establish end-to-end properties of their application, e.g.,
worst-case roundoff error. The crucial difference is that the compiler precondi-
tions must be discharged before the rewrite can be proven correct against the
Icing semantics, whereas the application precondition is an additional restriction
limiting where the rewrite is applied for a specific application.
A key benefit of this design is that rewrite preconditions can serve as an inter-
face to external tools to determine where optimizations may be conditionally
applied. This feature enables Icing to address limitations that have prevented
previous work from proving fast-math optimizations in verified compilers [5]
since “The only way to exploit these [floating-point] simplifications while pre-
serving semantics would be to apply them conditionally, based on the results
of a static analysis (such as FP interval analysis) that can exclude the prob-
lematic cases.” [5] In our setting, a static analysis tool can be used to establish
an application rewrite precondition, while compiler rewrite preconditions can be
discharged during (or potentially after) compilation via static analysis or manual
proof.
This design choice essentially decouples the floating-point static analyzer
from the general-purpose compiler. One motivation is that the compiler may per-
form hardware-specific rewrites, which source-code-based static analyzers would
generally not be aware of. Furthermore, integrating end-to-end verification of
these rewrites into a compiler would require it to always run a global static
analysis. For this reason, we propose an interface which communicates only the
necessary information.
Rewrites which duplicate matched subexpressions, e.g., distributing multi-
plication over addition, required careful design in Icing. Such rewrites can lead
to unexpected results if different copies of the duplicated expression are opti-
mized differently; this also complicates the Icing correctness proof. We show
how preconditions additionally enabled us to address this challenge in Sect. 4.
160 H. Becker et al.
For rewrite s→t at the head of rws, rewrite (rws, e) checks if s matches e,
applies the rewrite if so, and recurses. Function rewrite is used in our optimizers
in a bottom-up traversal of the AST. Icing users can specify which rewrites may
be applied under each distinct opt: scope in their code or use a default set
(shown in Table 1).
Table 1. Rewrites currently supported in Icing (◦ ∈ {+, ∗})
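The traversal described above can be sketched in Python; matches and instantiate are illustrative helpers of our own, not Icing's HOL4 definitions:

```python
# Terms are tuples ('op', arg1, ...); bare strings are pattern variables.
def matches(pattern, expr):
    """Return a substitution if pattern matches expr, else None."""
    if isinstance(pattern, str):
        return {pattern: expr}
    if (isinstance(expr, tuple) and isinstance(pattern, tuple)
            and pattern[0] == expr[0] and len(pattern) == len(expr)):
        subst = {}
        for p, x in zip(pattern[1:], expr[1:]):
            sub = matches(p, x)
            if sub is None:
                return None
            for k, v in sub.items():
                if k in subst and subst[k] != v:   # nonlinear pattern mismatch
                    return None
                subst[k] = v
        return subst
    return None

def instantiate(template, subst):
    if isinstance(template, str):
        return subst[template]
    if isinstance(template, tuple):
        return (template[0],) + tuple(instantiate(t, subst) for t in template[1:])
    return template

def rewrite(rws, e):
    """For each rule s -> t in rws, head first: if s matches e, apply it."""
    for s, t in rws:
        subst = matches(s, e)
        if subst is not None:
            e = instantiate(t, subst)
    return e

# Commutativity a + b -> b + a applied to (1.0 + 2.0)
commuted = rewrite([(('+', 'a', 'b'), ('+', 'b', 'a'))], ('+', 1.0, 2.0))
```

As in the text, the optimizers would call this function in a bottom-up traversal of the AST.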
Constants are again defined as floating-point words and form the leaves of value
trees (variables obtain a constant value from the execution environment E). On
top of constants, value trees can represent the result of evaluating any floating-
point operation Icing supports.
The second key idea of our semantics is that it nondeterministically applies
rewrites from the configuration cfg while evaluating expression e instead of just
returning its value tree. In the semantics, we model the nondeterministic choice of
an optimization result for a particular value tree v with the relation rewritesTo,
where (cfg, v) rewritesTo r if either the configuration cfg allows for optimizations
to be applied, and value tree v can be rewritten into value tree r using rewrites
from the configuration cfg; or the configuration does not allow for rewrites to
be applied, and v = r. Rewriting on value trees reuses several definitions from
Sect. 2.2. We add the nondeterminism on top of the existing functions by making
the relation rewritesTo pick a subset of the rewrites from the configuration cfg
which are applied to value tree v.
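The relation can be pictured as the set of results reachable by firing any subset of the configured rewrites; the encoding below is our own simplification:

```python
from itertools import combinations

def rewrites_to(cfg, v):
    """All r with (cfg, v) rewritesTo r: if optimization is allowed,
    any subset of cfg's rewrites may fire on v; otherwise r = v."""
    if not cfg['opt_ok']:
        return {v}
    results = set()
    rws = cfg['rewrites']
    for k in range(len(rws) + 1):
        for subset in combinations(rws, k):
            r = v
            for apply_rw in subset:   # fire the chosen rewrites in order
                r = apply_rw(r)
            results.add(r)
    return results

# One rewrite: commute the top-level binary operation
commute = lambda t: (t[0], t[2], t[1]) if isinstance(t, tuple) else t

cfg = {'opt_ok': True, 'rewrites': [commute]}
```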
Icing’s semantics allows optimizations to be applied for arithmetic and com-
parison operations. The rules Unary, Binary, fma, isNaN, and Compare first
evaluate argument expressions into value trees. The final result is then nonde-
terministically chosen from the rewritesTo relation for the obtained value tree
and the current configuration. Evaluation of Map, Fold, and let-bindings follows
standard textbook evaluation semantics and does not apply optimizations.
Rule Scope models the fine-grained control over where optimizations are
applied in the semantics. We store in the current configuration cfg that opti-
mizations are allowed in the (sub-)expression e (cfg with OptOk := true).
Evaluation of a conditional (if c then eT else eF ) first evaluates the condi-
tional guard c to a value tree cv. Based on value tree cv the semantics picks a
branch to continue evaluation in. This eager evaluation for conditionals (in con-
trast to delaying by leaving them in a value tree) is crucial to enable the later
simulation proof to connect Icing to CakeML which also eagerly evaluates condi-
tionals. As the value tree cv represents a delayed evaluation of a boolean value,
we have to turn it into a boolean constant when selecting the branch to con-
tinue evaluation in. This is done using the functions cTree2IEEE and tree2IEEE.
cTree2IEEE (v) computes the boolean value, and tree2IEEE (v) computes the
floating-point word represented by the value tree v by applying IEEE 754 arith-
metic operations and structural recursion.
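A toy version of these two functions, with our own tuple encoding and a reduced operator set, illustrates the structural recursion:

```python
import math

def tree2IEEE(v):
    """Collapse a value tree to a floating-point word by applying the
    delayed IEEE 754 operation at each node (structural recursion)."""
    if isinstance(v, float):                  # constant leaf
        return v
    op, *args = v
    xs = [tree2IEEE(a) for a in args]
    if op == '+':    return xs[0] + xs[1]     # each node rounds once
    if op == '*':    return xs[0] * xs[1]
    if op == 'sqrt': return math.sqrt(xs[0])
    raise ValueError(op)

def cTree2IEEE(cv):
    """Collapse a comparison value tree to the boolean used to pick a branch."""
    op, lhs, rhs = cv
    l, r = tree2IEEE(lhs), tree2IEEE(rhs)
    return l < r if op == '<' else l == r

# Guard of: if (1.0 * 2.0 + 3.0 < 6.0) ...
guard = ('<', ('+', ('*', 1.0, 2.0), 3.0), 6.0)
branch_taken = cTree2IEEE(guard)
```

Once the guard has been collapsed to a boolean, rewrites that matched the tree structure of cv can no longer fire, which is exactly the information loss discussed for eager conditionals.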
Example. We illustrate Icing semantics and how optimizations are applied both
in syntax and semantics with the example in Fig. 3. The example first translates
the input list by 3.0 using a Map, and then computes the norm of the translated
list with Fold and sqrt.
4 A Conditional Optimizer
We have implemented an IEEE 754 optimizer which has the same behavior as
CompCert and CakeML, and a greedy optimizer with the (observed) behavior
of GCC and Clang. The fine-grained control of where optimizations are applied
is essential for the usability of the greedy optimizer. However, in this section
we explain that the control provided by the opt annotation is often not enough.
We show how preconditions can be used to provide additional constraints on
where rewrites can be applied, and sketch how preconditions serve as an interface
between the compiler and external tools, which can and should discharge them.
We observe that in many cases, whether an optimization is acceptable or
not can be captured with a precondition on the optimization itself, and not on
every arithmetic operation separately. One example for such an optimization is
removal of NaN checks as a check for a NaN should only be removed if the check
never succeeds.
We argue that both application and compiler rewrite preconditions should
be discharged by external tools. Many interesting preconditions for a rewrite
depend on a global analysis. Running a global analysis as part of a compiler
is infeasible, as maintaining separate analyses for each rewrite is not likely to
scale. We thus propose to expose an interface to external tools in the form of
preconditions.
We implement this idea in the conditional optimizer optimizeCond that sup-
ports three different applications of fast-math optimizations: applying optimiza-
tions rws unconstrained (uncond rws), applying optimizations if precondition P
is true (cond P rws), and applying optimizations under the assumptions generated
by function A, which should be discharged externally (assume A rws). When
applying cond, optimizeCond checks whether precondition P is true before opti-
mizing, whereas for assume the propositions returned by A are assumed, and
should then be discharged separately by a static analysis or a manual proof.
The main issue with duplicative rewrites is that they add new occurrences of
a matched subexpression. Applying (x ∗ (y + z) → x ∗ y + x ∗ z) to e1 * (2 + x)
returns e1 * 2 + e1 * x. The values for the two occurrences of e1 may differ
because of further optimizations applied to only one of its occurrences.
Any correctness proof for such a duplicative rewrite must match up
the two (potentially different) executions of e1 in the optimized expres-
sion (e1 * 2 + e1 * x) with the execution of e1 in the initial expression
(e1 * (2 + x)). This can only be achieved by finding a common intermedi-
ate optimization (resp. evaluation) result shared by both subexpressions of
e1 * 2 + e1 * x.
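The divergence is concrete: in the sketch below (IEEE 754 double precision, as in Python floats; the particular numbers are our own), one copy of e1 is additionally reassociated, and the distributed expression no longer equals the original:

```python
x = 10.0
e1     = (0.1 + 0.2) + 0.3    # original occurrence: 0.6000000000000001
e1_opt = 0.1 + (0.2 + 0.3)    # duplicated copy, further reassociated: 0.6

original    = e1 * (2 + x)             # before distributing
distributed = e1 * 2 + e1_opt * x      # after, with inconsistently optimized copies

assert e1 != e1_opt
assert original != distributed         # 7.200000000000001 vs. 7.2
```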
In general, the existence of such an intermediate result can only be proven
for expressions that do not depend on “eager” evaluation, i.e., expressions
consisting only of let-bindings and arithmetic. We illustrate the problem using a conditional
(if c then e1 else e2). In Icing semantics, the guard c is first evaluated to a
value tree cv. Next, the semantics evaluates cv to a boolean value b using function
cTree2IEEE. Computing b from cv loses the structural information of value tree
cv by computing the results of previously delayed arithmetic operations. This
loss of information means that rewrites that previously matched the structure
of cv may no longer apply to b.
This is not a bug in the Icing semantics. On the contrary, our semantics makes
this issue explicit, while in other compilers it can lead to unexpected behavior
(e.g., in GCC’s support for distributivity under fast-math). CakeML, for exam-
ple, also eagerly evaluates conditionals and similarly loses structural information
about optimizations that otherwise may have been applied. Having lazy condi-
tionals in general would only “postpone” the issue until eager evaluation of the
conditional expression for a loop is necessary.
An intuitive compiler precondition that enables proving duplicative rewrites
is to forbid any control dependencies on the expression being optimized. How-
ever, this approach may be unsatisfactory as it disallows branching on the results
of optimized expressions and requires a verified dependency analysis that must
be rerun or incrementally updated after every rewrite, and thus could become
a bottleneck for fast-math optimizers. Instead, in Icing we restrict duplicative
rewrites to only fire when pattern variables are matched against program vari-
ables, e.g., pattern variables a, b, c only match against program variables x, y, z.
This restriction to only matching let-bound variables is more scalable, as it can
easily be checked syntactically, and allows us to loosen the restriction on control-
flow dependence by simply let-binding subexpressions as needed.
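Under the tuple-and-string encoding used in our earlier sketches (program variables as bare strings; the helper names are ours), this syntactic side condition is a one-line check on the match:

```python
def is_program_var(e):
    """In our encoding, program variables are bare strings."""
    return isinstance(e, str)

def duplicative_rewrite_ok(binding):
    """Allow a duplicative rewrite such as x*(y+z) -> x*y + x*z only if
    every pattern variable was matched against a program variable."""
    return all(is_program_var(e) for e in binding.values())

# Matching against let-bound variables is fine; a compound subterm is not.
ok  = duplicative_rewrite_ok({'x': 'a', 'y': 'b', 'z': 'c'})
bad = duplicative_rewrite_ok({'x': ('*', 'a', 'b'), 'y': 'b', 'z': 'c'})
```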
5 Connecting to CakeML
We have shown how to apply optimizations in Icing and how to use it to preserve
IEEE 754 semantics. Next, we describe how we connected Icing to an existing
verified compiler by implementing a translation from Icing source to CakeML
6 Related Work
Verified Compilation of Floating-Point Programs. CompCert [25] uses a con-
structive formalization of IEEE 754 arithmetic [6] based on Flocq [7] which
allows for verified constant propagation and strength reduction optimizations
for divisions by powers of 2 and replacing x × 2 by x + x. The situation is similar
for CakeML [38] whose floating-point semantics is based on HOL’s [19,20]. With
Icing, we propose a semantics which allows important floating-point rewrites in
a verified compiler by allowing users to specify a larger set of possible behaviors
for their source programs. The precondition mechanism serves as an interface
to external tools. While Icing is implemented in HOL, our techniques are not
specific to higher-order logic or the details of CakeML and we believe that an
analog of our “verified fast-math” approach could easily be ported to CompCert.
The Alive framework [27] has been extended to verify floating-point peep-
hole optimizations [29,31]. While these tools relax some exceptional (NaN) cases,
Footnote 2: We also extended the CakeML source semantics with an fma operation, as
CakeML’s compilation currently does not support mapping fma’s to hardware instructions.
most optimizations still need to preserve “bit-for-bit” IEEE 754 behavior, which
precludes valuable rewrites like the fma introductions Icing supports.
7 Conclusion
We have proposed a novel semantics for IEEE 754-unsound floating-point com-
piler optimizations which allows them to be applied in a verified compiler setting
and which captures the intuitive semantics developers often use today when rea-
soning about their floating-point code. Our semantics is nondeterministic in order
to provide the compiler the freedom to apply optimizations where they are useful
for a particular application and platform—but within clearly defined bounds. The
semantics is flexible from the developer’s perspective, as it provides fine-grained
control over which optimizations are available and where in a program they can
be applied. We have presented a formalization in HOL4, implemented three pro-
totype optimizers, and connected them to the CakeML verified compiler frontend.
For our most general optimizer, we have explained how it can be used to obtain
meta-theorems for its results by exposing a well-defined interface in the form of
preconditions. We believe that our semantics can be integrated fully with different
verified compilers in the future, and bridge the gap between compiler optimiza-
tions and floating-point verification techniques.
References
1. LLVM language reference manual - fast-math flags (2019). https://llvm.org/docs/
LangRef.html#fast-math-flags
2. Semantics of floating point math in GCC (2019). https://gcc.gnu.org/wiki/
FloatingPointMath
3. Becker, H., Zyuzin, N., Monat, R., Darulova, E., Myreen, M.O., Fox, A.: A verified
certificate checker for finite-precision error bounds in Coq and HOL4. In: 2018
Formal Methods in Computer Aided Design (FMCAD), pp. 1–10. IEEE (2018)
4. Blanchet, B., et al.: A static analyzer for large safety-critical software. In: PLDI
(2003)
5. Boldo, S., Jourdan, J.H., Leroy, X., Melquiond, G.: A formally-verified C compiler
supporting floating-point arithmetic. In: 2013 21st IEEE Symposium on Computer
Arithmetic (ARITH), pp. 107–115. IEEE (2013)
6. Boldo, S., Jourdan, J.H., Leroy, X., Melquiond, G.: Verified compilation of floating-
point computations. J. Autom. Reasoning 54(2), 135–163 (2015)
7. Boldo, S., Melquiond, G.: Flocq: a unified library for proving floating-point algo-
rithms in Coq. In: 19th IEEE International Symposium on Computer Arithmetic,
ARITH, pp. 243–252 (2011). https://doi.org/10.1109/ARITH.2011.40
8. Brain, M., Tinelli, C., Ruemmer, P., Wahl, T.: An automatable formal semantics
for IEEE-754 floating-point arithmetic. Technical report (2015). http://smt-lib.
org/papers/BTRW15.pdf
9. Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation of
high-coverage tests for complex systems programs. In: OSDI (2008)
10. Chen, L., Miné, A., Cousot, P.: A sound floating-point polyhedra abstract domain.
In: Ramalingam, G. (ed.) APLAS 2008. LNCS, vol. 5356, pp. 3–18. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-89330-1_2
11. Chiang, W.F., Baranowski, M., Briggs, I., Solovyev, A., Gopalakrishnan, G.,
Rakamarić, Z.: Rigorous floating-point mixed-precision tuning. In: Symposium on
Principles of Programming Languages (POPL), pp. 300–315. ACM (2017)
12. Corden, M., Kreitzer, D.: Consistency of floating-point results using the Intel com-
piler. Technical report, Intel Corporation (2010)
13. Damouche, N., Martel, M.: Mixed precision tuning with salsa. In: PECCS, pp.
185–194. SciTePress (2018)
14. Damouche, N., Martel, M., Chapoutot, A.: Intra-procedural optimization of the
numerical accuracy of programs. In: Núñez, M., Güdemann, M. (eds.) FMICS 2015.
LNCS, vol. 9128, pp. 31–46. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19458-5_3
15. Darulova, E., Izycheva, A., Nasir, F., Ritter, F., Becker, H., Bastian, R.: Daisy
- framework for analysis and optimization of numerical programs (tool paper).
In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10805, pp. 270–287.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89960-2_15
16. Darulova, E., Sharma, S., Horn, E.: Sound mixed-precision optimization with
rewriting. In: ICCPS (2018)
17. De Dinechin, F., Lauter, C.Q., Melquiond, G.: Assisted verification of elementary
functions using Gappa. In: ACM Symposium on Applied Computing, pp. 1318–
1322. ACM (2006)
18. Goubault, E., Putot, S.: Static analysis of finite precision computations. In: Jhala,
R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 232–247. Springer,
Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4_17
19. Harrison, J.: Floating point verification in HOL. In: Thomas Schubert, E., Windley,
P.J., Alves-Foss, J. (eds.) TPHOLs 1995. LNCS, vol. 971, pp. 186–199. Springer,
Heidelberg (1995). https://doi.org/10.1007/3-540-60275-5_65
20. Harrison, J.: Floating-point verification. In: Fitzgerald, J., Hayes, I.J., Tarlecki,
A. (eds.) FM 2005. LNCS, vol. 3582, pp. 529–532. Springer, Heidelberg (2005).
https://doi.org/10.1007/11526841_35
21. Jeannet, B., Miné, A.: Apron: a library of numerical abstract domains for static
analysis. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 661–
667. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_52
22. Jourdan, J.H.: Verasco: a formally verified C static analyzer. Ph.D. thesis, Univer-
sité Paris Diderot (Paris 7), May 2016
23. Lee, W., Sharma, R., Aiken, A.: On automatically proving the correctness of
math.h implementations. In: POPL (2018)
24. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler
with a proof assistant. In: 33rd ACM Symposium on Principles of Programming
Languages, pp. 42–54. ACM Press (2006)
25. Leroy, X.: A formally verified compiler back-end. J. Autom. Reasoning 43(4), 363–
446 (2009). http://xavierleroy.org/publi/compcert-backend.pdf
26. Liew, D., Schemmel, D., Cadar, C., Donaldson, A.F., Zähl, R., Wehrle, K.:
Floating-point symbolic execution: a case study in n-version programming. In: Pro-
ceedings of the 32nd IEEE/ACM International Conference on Automated Software
Engineering. IEEE Press (2017)
27. Lopes, N.P., Menendez, D., Nagarakatte, S., Regehr, J.: Provably correct peephole
optimizations with alive. In: PLDI (2015)
28. Magron, V., Constantinides, G., Donaldson, A.: Certified roundoff error bounds
using semidefinite programming. ACM Trans. Math. Softw. 43(4), 1–34 (2017)
29. Menendez, D., Nagarakatte, S., Gupta, A.: Alive-FP: automated verification of
floating point based peephole optimizations in LLVM. In: Rival, X. (ed.) SAS
2016. LNCS, vol. 9837, pp. 317–337. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_16
30. Moscato, M., Titolo, L., Dutle, A., Muñoz, C.A.: Automatic estimation of verified
floating-point round-off errors via static analysis. In: Tonetta, S., Schoitsch, E.,
Bitsch, F. (eds.) SAFECOMP 2017. LNCS, vol. 10488, pp. 213–229. Springer,
Cham (2017). https://doi.org/10.1007/978-3-319-66266-4_14
31. Nötzli, A., Brown, F.: LifeJacket: verifying precise floating-point optimizations in
LLVM. In: Proceedings of the 5th ACM SIGPLAN International Workshop on
State of the Art in Program Analysis, pp. 24–29. ACM (2016)
32. Panchekha, P., Sanchez-Stern, A., Wilcox, J.R., Tatlock, Z.: Automatically improv-
ing accuracy for floating point expressions. In: Conference on Programming Lan-
guage Design and Implementation (PLDI) (2015)
33. Püschel, M., et al.: SPIRAL - a generator for platform-adapted libraries of signal
processing algorithms. IJHPCA 18(1), 21–45 (2004)
34. Ramananandro, T., Mountcastle, P., Meister, B., Lethin, R.: A unified Coq frame-
work for verifying C programs with floating-point computations. In: Certified Pro-
grams and Proofs (CPP) (2016)
35. Rubio-González, C., et al.: Precimonious: tuning assistant for floating-point preci-
sion. In: SC (2013)
36. Schkufza, E., Sharma, R., Aiken, A.: Stochastic optimization of floating-point pro-
grams with tunable precision. In: PLDI (2014)
37. Solovyev, A., Jacobsen, C., Rakamarić, Z., Gopalakrishnan, G.: Rigorous esti-
mation of floating-point round-off errors with Symbolic Taylor Expansions. In:
Bjørner, N., de Boer, F. (eds.) FM 2015. LNCS, vol. 9109, pp. 532–550. Springer,
Cham (2015). https://doi.org/10.1007/978-3-319-19249-9_33
38. Tan, Y.K., Myreen, M.O., Kumar, R., Fox, A., Owens, S., Norrish, M.: The verified
CakeML compiler backend. J. Funct. Program. 29 (2019)
39. Tristan, J.B., Leroy, X.: Formal verification of translation validators: a case study
on instruction scheduling optimizations. In: Proceedings of the 35th ACM Sym-
posium on Principles of Programming Languages (POPL 2008), pp. 17–27. ACM
Press, January 2008
40. Tristan, J.B., Leroy, X.: Verified validation of lazy code motion. In: Proceedings
of the 2009 ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI 2009), pp. 316–326 (2009)
41. Tristan, J.B., Leroy, X.: A simple, verified validator for software pipelining. In:
Proceedings of the 37th ACM Symposium on Principles of Programming Languages
(POPL 2010), pp. 83–92. ACM Press (2010)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Sound Approximation of Programs
with Elementary Functions
1 Introduction
Numerical programs face an inherent tradeoff between accuracy and efficiency.
Choosing a larger finite precision provides higher accuracy, but is generally more
costly in terms of memory and running time. Not all applications, however, need
a very high accuracy to work correctly. We would thus like to compute the results
with only as much accuracy as is needed, in order to save resources.
Navigating this tradeoff between accuracy and efficiency is challenging. First,
estimating the accuracy, i.e. bounding roundoff and approximation errors, is non-
trivial due to the complex nature of finite-precision arithmetic which inevitably
occurs in numerical programs. Second, the space of possible implementations is
usually prohibitively large and thus cannot be explored manually.
Today, users can choose between different automated tools for analyzing
accuracy of floating-point programs [7,8,11,14,18,20,26] as well as for choosing
between different precisions [5,6,10]. The latter tools perform mixed-precision
tuning, i.e. they assign different floating-point precisions to different operations,
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 174–183, 2019.
https://doi.org/10.1007/978-3-030-25543-5_11
and can thus improve the performance w.r.t. a uniform precision implementation.
The success of such an optimization is, however, limited to cases where uniform
precision only barely fails to satisfy a given accuracy specification.
Elementary functions (e.g. sin, exp) are another possible target for performance
optimizations. Users by default choose single- or double-precision libm
library function implementations, which are fully specified in the C language
standard (ISO/IEC 9899:2011) and provide high accuracy. Such implementa-
tions are, however, expensive. When high accuracy is not needed, we can save
significant resources by replacing libm calls by coarser approximations, opening
up a larger, and different tradeoff space than mixed-precision tuning. Unfortu-
nately, existing automated approaches [1,25] do not provide accuracy guarantees.
On the other hand, tools like Metalibm [3] approximate individual elementary
functions by polynomials with rigorous accuracy guarantees given by the user.
They, however, do not consider entire programs and leave the selection of their
parameters to the user, limiting their usability mostly to experts.
We present an approach and a tool which leverages the existing whole-program
error analysis of Daisy [8] and Metalibm’s elementary function approximation to
provide both sound whole-program guarantees as well as efficient C implementa-
tions for floating-point programs with elementary function calls. Given a target
error specification, our tool automatically distributes the error budget among uni-
form single or double precision arithmetic operations and elementary functions,
and selects a suitable polynomial degree for their approximation.
We have implemented our approach inside the tool Daisy and compare the
performance of generated programs against programs using libm on examples
from the literature. The benchmarks spend on average 38% and up to 50% of their
running time evaluating elementary functions. Our tool improves overall performance
by 14% on average and up to 25% when approximating each elementary function
call individually, and by 17% on average and up to 31% when approximating
compound function calls. These improvements were achieved solely by optimizing
the approximations of elementary functions and illustrate the pertinence of our
approach. The performance improvements incur overall whole-program errors that
are only 2–3 orders of magnitude larger than those of double-precision
implementations using libm functions, and are well below the errors of
single-precision implementations. Our tool thus makes it possible to effectively
trade performance for larger, but guaranteed, error bounds.
Related Work. Several static analysis tools bound roundoff errors of floating-
point computations [7,18,20,26], assuming libm implementations, or verify the
correctness of several functions in Intel’s libm library [17]. Muller [21] provides
176 E. Darulova and A. Volkova
2 Our Approach
We explain our approach using the following example [28], which computes a forward
kinematics equation and is written in Daisy’s real-valued specification language.
Unlike previous work, our tool guarantees that the user-specified error is satis-
fied. It soundly distributes the overall error budget among arithmetic operations
and libm calls using Daisy’s static analysis. Metalibm uses the state-of-the-art
minimax polynomial approximation algorithm [2], together with Sollya [4] and
Gappa [12], to bound the errors of its implementations. Given a function, a target relative
error bound and implementation parameters, Metalibm generates C code. Our
tool does not guarantee to find the most efficient implementation; the search
space of implementation and approximation choices is highly complex and dis-
crete, and it is thus infeasible to find the optimal parameters.
The input to our tool is a straight-line program2 with standard arithmetic
operators (+, −, ∗, /) as well as the most commonly used elementary functions
(sin, cos, tan, log, exp, √). The user further specifies the domains of all inputs,
together with a target overall absolute error which must be satisfied. The output
is C code with arithmetic operations in uniform single or double precision, and
libm approximations in double precision (Metalibm’s only supported precision).
Algorithm. We will use ‘program’ for the entire expression, and ‘function’ for
individual elementary functions. Our approach works in the following steps.
Step 1 We re-use Daisy’s frontend which parses the input specification. We
add a pre-processing step, which decomposes the abstract syntax tree (AST) of
the program we want to approximate such that each elementary function call is
assigned to a fresh local variable. This transformation eases the later replacement
of the elementary functions with an approximation.
Step 2 We use Daisy’s roundoff error analysis on the entire program, assum-
ing a libm implementation of elementary functions. This analysis computes a
real-valued range and a worst-case absolute roundoff error bound for each subex-
pression in the AST, assuming uniform single or double precision as appropriate.
We use this information in the next step to distribute the error and to determine
the parameters for Metalibm for each function call.
Step 3 This is the core step, which calls Metalibm to generate a (piece-
wise) polynomial approximation for each elementary function which was assigned
to a local variable. Each call to Metalibm specifies the local target error for
each function call, the polynomial degree and the domain of the function call
arguments. To determine the argument domains, we use the range and error
information obtained in the previous step. Our tool tries different polynomial
degrees and selects the fastest implementation. We explain our error distribution
and polynomial selection further below.
Metalibm generates efficient double-precision C code including argument
reduction (if applicable), domain splitting, and polynomial approximation with
a guaranteed error below the specified target error (or returns an error). Met-
alibm furthermore supports approximations with lookup tables, whose size the
user can control manually via our tool frontend as well.
Footnote 2: All existing approaches for analysing floating-point roundoff errors which
handle loops or conditional branches reduce the reasoning about errors to straight-line
code, e.g. through loop invariants [9,14] or loop unrolling [7], or path-wise analysis [7,9,15].
Step 4 Our tool performs roundoff error analysis again, this time taking into
account the new approximations’ precise error bounds reported by Metalibm.
Finally, Daisy generates C code for the program itself, as well as all necessary
headers to link with the approximation generated by Metalibm.
The overall error decomposes by the triangle inequality as
|f(x) − f̃(x̃)| ≤ |f(x) − fˆ1(x)| + |fˆ1(x) − fˆ2(x)| + |fˆ2(x) − f̃(x̃)|,
where we denote by f the real-valued specification of the program; fˆ1 and fˆ2 have
one and two elementary function calls approximated, respectively, and arithmetic
is considered exact; and f̃ is the final finite-precision implementation.
Daisy first determines the budget for the finite-precision roundoff error
(|fˆ2 (x) − f˜(x̃)|) and then distributes the remaining part among libm calls. At
this point, Daisy cannot compute |fˆ2 (x) − f˜(x̃)| exactly, as the approximations
are not available yet. Instead, it assumes libm-based approximations as baseline.
Then, Daisy distributes the remaining error budget either equally among
the elementary function calls, or by taking into account that the approximation
errors are propagated differently through the program. This error propagation
is estimated by computing the derivative w.r.t. to each elementary function call
(which gives an estimation of the conditional number). Daisy computes partial
derivatives symbolically and maximizes them over the specified input domain.
Finally, we obtain an error budget for each libm call, representing the total
error due to the elementary function call at the end of the program. For calling
Metalibm, however, we need the local error at the function call site. Due to error
propagation, these two errors can differ significantly, and may lead to overall
errors which exceed the error bound specified by the user. We estimate the error
propagation using a linear approximation based on derivatives, and use this
estimate to compute a local target error from the total error budget.
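The derivative-weighted distribution described above can be sketched as follows; `distribute_budget` and the equal-contribution policy are illustrative assumptions of this sketch, with each weight w_i standing for the maximized partial derivative of the result with respect to elementary function call i.

```python
def distribute_budget(total_budget, amplification):
    """Local target errors e_i chosen so that w_i * e_i is equal for all calls.

    amplification[i] is an upper bound on |df/dg_i| over the input domain,
    i.e. how strongly an error at call i is amplified at the program output.
    """
    k = len(amplification)
    targets = [total_budget / (k * w) for w in amplification]
    # sanity check: propagated contributions sum back to the total budget
    assert abs(sum(w * e for w, e in zip(amplification, targets))
               - total_budget) < 1e-9 * total_budget
    return targets

# e.g. two calls, the first amplified ten times more strongly than the second:
# the first call gets a ten times tighter local target error
targets = distribute_budget(1e-10, [10.0, 1.0])
```

The equal split mentioned in the text is the special case where all weights are 1.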
Since Metalibm usually generates approximations with slightly tighter error
bounds than asked for, our tool performs a second roundoff analysis (step 4),
where all errors (smaller or larger) are correctly taken into account.
3 Experimental Evaluation
We evaluate our approach in terms of accuracy and performance on bench-
marks from literature [9,19,28] which include elementary function calls, and
extend them with the examples rodriguesRotation3 and ex2* and ex3 d, which
are problems from a graduate analysis textbook. While they are relatively short,
they represent important kernels usually employing several elementary function
calls4. We derive target error bounds from the roundoff error of a libm-based
implementation: we consider middle and large target errors, roughly three and
four orders of magnitude larger than the libm-based bound, respectively. By
default, we assume uniform 64-bit double precision.
Our tool provides an automatic generation of benchmarking code for each
input program. Each benchmarking executable runs the Daisy-generated code
on 10⁷ random inputs from the input domain and measures performance in the
number of processor clock cycles. Of the measured number of cycles we discard
the highest 10%, as we have observed these to be outliers.
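The outlier-discarding average described above can be sketched as:

```python
def trimmed_average(cycles, discard_fraction=0.10):
    """Average cycle counts after discarding the highest fraction as outliers."""
    keep = len(cycles) - int(len(cycles) * discard_fraction)
    kept = sorted(cycles)[:keep]
    return sum(kept) / len(kept)

# one pathological measurement (e.g. a context switch) does not skew the mean
samples = [100, 101, 99, 102, 100, 98, 101, 100, 99, 1000]
avg = trimmed_average(samples)
```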
Fig. 1. Average performance and standard deviation. For each benchmark, the first
bar shows the running time of the libm-based implementation and the second one of
our implementation. Even relatively small overall time improvements are significant
w.r.t. the time portion we can optimize (in green). Our implementations also have
significantly smaller standard deviation (black bars). (Color figure online)
4 Conclusion
We presented a fully automated approach which improves the performance of
small numerical kernels at the expense of some accuracy by generating custom
approximations of elementary functions. Our tool is parametrized by a user-given
whole-program absolute error bound which is guaranteed to be satisfied by the
generated code. Experiments illustrate that the tool efficiently uses the available
margin for improvement and provides significant speedups for double-precision
implementations. This work provides a solid foundation for future research in the
areas of automatic approximations of single-precision and multivariate functions.
Acknowledgments. The authors thank Christoph Lauter for useful discussions and
Youcef Merah for the work on an early prototype.
References
1. Bornholt, J., Torlak, E., Grossman, D., Ceze, L.: Optimizing synthesis with metas-
ketches. In: POPL (2016)
22. Panchekha, P., Sanchez-Stern, A., Wilcox, J.R., Tatlock, Z.: Automatically improv-
ing accuracy for floating point expressions. In: PLDI (2015)
23. Püschel, M., et al.: Spiral - a generator for platform-adapted libraries of signal
processing algorithms. IJHPCA 18(1), 21–45 (2004)
24. Rubio-González, C., et al.: Precimonious: tuning assistant for floating-point preci-
sion. In: SC (2013)
25. Schkufza, E., Sharma, R., Aiken, A.: Stochastic optimization of floating-point pro-
grams with tunable precision. In: PLDI (2014)
26. Solovyev, A., Jacobsen, C., Rakamarić, Z., Gopalakrishnan, G.: Rigorous esti-
mation of floating-point round-off errors with Symbolic Taylor Expansions. In:
Bjørner, N., de Boer, F. (eds.) FM 2015. LNCS, vol. 9109, pp. 532–550. Springer,
Cham (2015). https://doi.org/10.1007/978-3-319-19249-9 33
27. Vuduc, R., Demmel, J.W., Bilmes, J.A.: Statistical models for empirical search-
based performance tuning. Int. J. High Perform. Comput. Appl. 18(1), 65–94
(2004)
28. Yazdanbakhsh, A., Mahajan, D., Esmaeilzadeh, H., Lotfi-Kamran, P.: AxBench: a
multiplatform benchmark suite for approximate computing. IEEE Des. Test 34(2),
60–68 (2017)
Formal Verification of Quantum
Algorithms Using Quantum Hoare Logic
1 Introduction
Due to the rapid progress of quantum technology in recent years, it is
predicted that practical quantum computers can be built within 10–15 years.
Especially during the last three years, breakthroughs have been made in quantum
hardware: programmable superconducting and trapped-ion quantum computers
have been built in universities and companies [1,3,4,6,23].
In another direction, intensive research on quantum programming has been
conducted in the last decade [16,45,51,53], as surveyed in [27,52]. In particular,
several quantum programming languages have been defined and their compil-
ers have been implemented, including Quipper [31], Scaffold [35], QWire [47],
Microsoft’s LIQUi|⟩ [25] and Q# [57], IBM’s OpenQASM [22], Google’s Cirq
[30], ProjectQ [56], Chisel-Q [40], Quil [55] and Q|SI⟩ [39]. These research efforts allow
quantum programs to first run on an ideal simulator for testing, and then on
physical devices [5]. For instance, many small quantum algorithms and proto-
cols have already been programmed and run on IBM’s simulators and quantum
computers [1,2].
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 187–207, 2019.
https://doi.org/10.1007/978-3-030-25543-5_12
188 J. Liu et al.
Clearly, simulators can only be used for testing: a test shows the correctness of
the program on one or a few inputs, not its correctness under all possible inputs.
Various theories and tools have been developed to formally reason about quan-
tum programs for all inputs on a fixed number of qubits. Equivalence checking
[7,8], termination analysis [38], reachability analysis [64], and invariant gen-
eration [62] can be used to verify the correctness or termination of quantum
programs. Unfortunately, the size of quantum programs on which these tools
are applicable is quite limited. This is because all of these tools still perform
calculations over the entire state space, which for quantum algorithms has size
exponential in the number of qubits. For instance, even on the best supercom-
puters today, simulation of a quantum program is restricted to about 50–60
qubits. Most model-checking algorithms, which need to perform calculations on
operators over the state space, are restricted to 25–30 qubits with the current
computing resources.
Deductive program verification presents a way to solve this state space explo-
sion problem. In deductive verification, we do not attempt to execute the pro-
gram or explore its state space. Rather, we define the semantics of the program
using precise mathematical language, and use mathematical reasoning to prove
the correctness of the program. These proofs are checked on a computer (for
example, in proof assistants such as Coq [15] or Isabelle [44]) to ensure a very
high level of confidence.
To apply deductive reasoning to quantum programs, it is necessary to first
define a precise semantics and proof system. There has already been a lot of work
along these lines [9,20,21,61]. A recent result in this direction is quantum Hoare
logic (QHL) [61]. It extends to sequential quantum programs the Floyd-Hoare-
Naur inductive assertion method for reasoning about correctness of classical
programs. QHL is proved to be (relatively) complete for both partial correctness
and total correctness of quantum programs.
In this paper, we formalize the theory of quantum Hoare logic in
Isabelle/HOL, and use it to verify a non-trivial quantum algorithm – Grover’s
search algorithm1 . In more detail, the contributions of this paper are as follows.
1. We formally prove the main results of quantum Hoare logic in Isabelle/HOL.
That is, we write down the syntax and semantics of quantum programs, spec-
ify the basic Hoare triples, and prove the soundness and completeness of the
resulting deduction system (for partial correctness of quantum programs). To
the best of our knowledge, this is the first formalization of a Hoare logic for quantum
programs in an interactive theorem prover.
2. As an application of the above formalization, we verify the correctness of
Grover’s search algorithm. In particular, we prove that the algorithm always
succeeds on the (infinite) class of inputs where the expected probability of
success is 1.
3. As preparation for the above, we extend Isabelle/HOL’s library for linear
algebra. Based on existing work [13,58], we formalize many further results in
linear algebra for complex matrices, in particular positivity and the Löwner
1 Available online at https://www.isa-afp.org/entries/QHLProver.html.
order. Another significant part of our work is to define the tensor product
of vectors and matrices, in a way that can be used to extend and combine
operations on quantum variables in a consistent way. Finally, we implement
algorithms to automatically prove identities in linear algebra to ease the for-
malization process.
The organization of the rest of the paper is as follows. Section 2 gives a brief
introduction to quantum Hoare logic. Section 3 describes in detail our formal-
ization of QHL in Isabelle/HOL. Section 4 describes the application to Grover’s
algorithm. Section 5 discusses automation techniques, and gives some idea about
the cost of the formalization. Section 6 reviews some related work. Finally, we
conclude in Sect. 7 with a discussion of future directions of work.
We expect theorem proving techniques will play a crucial role in formal rea-
soning about quantum computing, as they did for classical computing, and we
hope this paper will be one of the first steps in its development.
In this section, we briefly recall the basic concepts and results of quantum Hoare
logic (QHL). We only introduce the proof system for partial correctness, since
the one for total correctness is not formalized in our work. In addition, we make
two simplifications compared to the original work: we consider only variables
with finite dimension, and we remove the initialization operation. The complete
version of QHL can be found in [61].
In QHL, the number of quantum variables is pre-set before each run of the
program. Each quantum variable qi has dimension di . The (pure) state of the
quantum variable takes value in a complex vector space of dimension di . The
overall (pure) state takes value in the tensor product of the vector spaces for the
variables, which has dimension d = ∏i di . The mixed state for variable qi (resp.
overall) is given by a di × di (resp. d × d) matrix satisfying certain conditions
(making them partial density operators). The notation q is used to denote some
finite sequence of distinct quantum variables (called a quantum register ). We
denote the vector space corresponding to q by Hq .
The syntax of quantum programs is given by the following grammar:

S ::= skip | q := U q | S1 ; S2 | measure M [q] : S̄ | while M [q] = 1 do S

where U is a unitary matrix on Hq and M is a measurement. Representative
rules of the proof system are:

(Skip)  {P } skip {P }
(UT)    {U † P U } q := U q {P }
(Order) from P ⊑ P ′ , {P ′ } S {Q′ } and Q′ ⊑ Q infer {P } S {Q}

where ⊑ denotes the Löwner order. A Hoare triple {P } S {Q} is valid for partial
correctness, written |=p {P } S {Q}, if

tr(P ρ) ≤ tr(Q S(ρ)) + (tr(ρ) − tr(S(ρ)))

for all partial density operators ρ. Here tr is the trace of a matrix. The semantics
for total correctness is defined similarly:

tr(P ρ) ≤ tr(Q S(ρ)) for all partial density operators ρ.

We note that they become the same when the quantum program S is terminating,
i.e. tr(S(ρ)) = tr(ρ) for all partial density operators ρ.
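These validity conditions can be checked numerically on a small instance. The following plain-Python sketch (helper names are ours, not part of the formalization) confirms that the triple from the unitary rule holds with equality for the NOT gate, since a unitary program preserves the trace and its denotation is ρ ↦ U ρ U†.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

X = [[0, 1], [1, 0]]                    # the NOT gate as the unitary U
rho = [[0.7, 0], [0, 0.3]]              # a density operator (trace 1)
Q = [[1, 0], [0, 0]]                    # predicate |0><0|
P = mat_mul(adjoint(X), mat_mul(Q, X))  # weakest precondition U† Q U

S_rho = mat_mul(X, mat_mul(rho, adjoint(X)))  # denotation of q := U q
lhs = trace(mat_mul(P, rho)).real             # tr(P rho)
rhs = trace(mat_mul(Q, S_rho)).real           # tr(Q S(rho))
# since tr(S(rho)) = tr(rho) here, the partial-correctness slack term vanishes
```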
The proof system qPD for partial correctness of quantum programs is given
in Fig. 1. The soundness and (relative) completeness of qPD is proved in [61]:
Theorem 1. The proof system qPD is sound and (relative) complete for partial
correctness of quantum programs.
3 Formalization in Isabelle/HOL
In this section, we describe the formalization of quantum Hoare logic in
Isabelle/HOL. Isabelle/HOL [44] is an interactive theorem prover based on
higher-order logic. It provides a flexible language in which one can state and
prove theorems in all areas of mathematics and computer science. The proofs
are checked by the Isabelle kernel according to the rules of higher-order logic,
providing a very high level of confidence in the proofs. A standard application
of Isabelle/HOL is the formalization of program semantics and Hoare logic. See
[43] for a description of the general technique, applied to a very simple classical
programming language.
the Schur decomposition theorem is formalized in [58], showing that any matrix
is similar to an upper-triangular matrix U . However, it does not show that Q
can be made unitary. We complete the proof of the full theorem, following the
outline of the previous proof.
Next, we define the key concept of positive semi-definite matrices (called
positive matrices from now on for simplicity). An n × n matrix A is positive if
v † Av ≥ 0 for any vector v. We formalize the basic theory of positive matrices,
in particular showing that any positive matrix is Hermitian.
Density operators and partial density operators are then defined as follows:
definition density_operator A ←→ positive A ∧ trace A = 1
definition partial_density_operator A ←→ positive A ∧ trace A ≤ 1
Next, the Löwner partial order is defined as a partial order on the type
complex mat as follows:
definition lowner_le ( infix ≤L 65) where
A ≤L B ←→ dim_row A = dim_row B ∧ dim_col A = dim_col B ∧ positive (B − A)
A key result that we formalize states that under the Löwner partial order, any
non-decreasing sequence of partial density operators has a least upper bound,
which is the pointwise limit of the operators when written as n × n matrices.
This is used to define the infinite sum of matrices, necessary for the semantics
of the while loop.
The total dimension d is given by (here prod list denotes the product of a
list of natural numbers).
definition d = prod_list dims
The (mixed) state of the system is given by a partial density operator with
dimension d × d. Hence, we declare
type_synonym state = complex mat
datatype com =
SKIP
| Utrans (complex mat)
| Seq com com ( ;;/ [60, 61] 60)
| Measure nat (nat ⇒ complex mat) (com list)
| While (nat ⇒ complex mat) com
At this stage, we assume that all matrices involved operate on the global state
(that is, all of the quantum variables). We will define commands that operate on a
subset of quantum variables later. Measurement is defined over any finite number
of matrices. Here Measure n f C is a measurement with n options, f i for i < n
are the measurement matrices, and C ! i is the command to be executed when the
measurement yields result i. Likewise, the first argument to While gives measure-
ment matrices, where only the first two values are used.
Next, we define well-formedness and denotation of quantum programs. The
predicate well_com :: com ⇒ bool expresses the well-formedness condition. For a
quantum program to be well-formed, all matrices involved should have the right
dimension, the argument to Utrans should be unitary, and the measurements for
Measure and While should satisfy the condition ∑i Mi† Mi = In . Denotation is
written as denote :: com ⇒ state ⇒ state, defined as in Sect. 2. Both well_com
and denote are defined by induction over the structure of the program. The details
are omitted here.
With this, we can give the semantic definition of Hoare triples for partial and
total correctness. These definitions are intended for the case where P and Q are
quantum predicates, and S is a well-formed program. They define what Hoare
triples are valid.
definition hoare_total_correct (|=t {(1 )}/ ( )/ {(1 )} 50) where
|=t {P} S {Q} ←→ (∀ρ∈density_states. trace (P * ρ) ≤ trace (Q * denote S ρ))
definition hoare_partial_correct (|=p {(1 )}/ ( )/ {(1 )} 50) where
|=p {P} S {Q} ←→ (∀ρ∈density_states.
trace (P * ρ) ≤ trace (Q * denote S ρ) + (trace ρ − trace (denote S ρ)))
Next, we define what Hoare triples are provable in the qPD system. A Hoare
triple for partial correctness is provable (written as ⊢p {P} S {Q}) if it can
be derived by combining the rules in Fig. 1. This condition can be defined in
Isabelle/HOL as an inductive predicate. The definition largely parallels the for-
mulae shown in the figure.
With these definitions, we can state and prove soundness and completeness of
the Hoare rules for partial correctness. Note that the statement for completeness
is very simple, seemingly without needing to state “relative to the theory of the
field of complex numbers”. This is because we are taking a shallow embedding
for predicates, hence any valid statement on complex numbers, in particular
positivity of matrices, is in principle available for use in the deduction system
(for example, in the assumption to the order rule).
theorem hoare_partial_sound:
⊢p {P} S {Q} =⇒ well_com S =⇒ |=p {P} S {Q}
So far in our development, all quantum operations act on the entire global state.
However, for the actual applications, we are more interested in operations that
act on only a few of the quantum variables. For this, we need to define an
extension operator, that takes a matrix on the quantum state for a subset of the
variables, and extend it to a matrix on all of the variables. More generally, we
need to define tensor products on vectors and matrices defined over disjoint sets
of variables. These need to satisfy various consistency properties, in particular
commutativity and associativity of the tensor product. Note that directly using
the Kronecker product is not enough, as the matrix to be extended may act
on any (possibly non-adjacent) subset of variables, and we need to distinguish
between all possible cases.
Before presenting the definition, we first review some preliminaries. We make
use of existing work in [13], in particular their encode and decode operations, and
emulate their definitions of matricize and dematricize (used in [13] to convert
between tensors represented as a list and matrices). Given a list of dimensions di ,
the encode and decode operations (named digit_encode and digit_decode) produce
a correspondence between lists of indices ai satisfying ai < di for each i < n,
and a natural number less than ∏i di . This works in a way similar to finding
the binary representation of a number (in which case all “dimensions” are 2).
List operation nths xs S constructs the subsequence of xs containing only the
elements at indices in the set S .
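A plain-Python model of these operations may help; the digit order chosen here is an assumption of this sketch (the Isabelle definitions fix their own convention), but the roundtrip property is the same.

```python
def digit_encode(dims, n):
    """Mixed-radix digits of n w.r.t. dims (most significant digit first here)."""
    digits = []
    for d in reversed(dims):
        digits.append(n % d)
        n //= d
    return list(reversed(digits))

def digit_decode(dims, digits):
    """Inverse of digit_encode."""
    n = 0
    for d, a in zip(dims, digits):
        n = n * d + a
    return n

# with dims [2, 3, 2]: 7 = (1*3 + 0)*2 + 1, so the digits are [1, 0, 1]
digits = digit_encode([2, 3, 2], 7)
```

With all dimensions equal to 2 this is exactly the binary representation mentioned in the text.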
The locale partial_state extends state_sig, adding vars for a subset of quantum
variables. Our goal is to define the tensor product of two vectors or matrices over
vars and its complement −vars, respectively.
First, dims1 and dims2 are dimensions of variables vars and -vars:
definition dims1 = nths dims vars
definition dims2 = nths dims (−vars)
The operation encode1 (resp. encode2 ) provides the map from the product
of dims to the product of dims1 (resp. dims2 ).
definition encode1 i = digit_decode dims1 (nths (digit_encode dims i) vars)
definition encode2 i = digit_decode dims2 (nths (digit_encode dims i) (−vars))
With this, tensor products on vectors and matrices are defined as follows
(here d is the product of dims).
definition tensor_vec :: ’a vec ⇒ ’a vec ⇒ ’a vec where
tensor_vec v1 v2 = Matrix.vec d (λi. v1 $ encode1 i * v2 $ encode2 i)
We prove the basic properties of tensor_vec and tensor_mat, including that
they behave correctly with respect to identity, multiplication, adjoint, and trace.
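Putting encode and decode together, here is a sketch (ours, with all helper names hypothetical) of the index-encoded tensor product for vectors over an arbitrary subset of variables; for adjacent variables it reduces to the ordinary Kronecker product, and for non-adjacent or reordered variables it handles the interleaving that a plain Kronecker product cannot.

```python
def nths(xs, s):
    """Subsequence of xs at the indices in set s (mirrors Isabelle's nths)."""
    return [x for i, x in enumerate(xs) if i in s]

def digit_encode(dims, n):
    digits = []
    for d in reversed(dims):
        digits.append(n % d)
        n //= d
    return list(reversed(digits))

def digit_decode(dims, digits):
    n = 0
    for d, a in zip(dims, digits):
        n = n * d + a
    return n

def tensor_vec(dims, vars1, v1, v2):
    """Tensor product of v1 (living on vars1) and v2 (on the complement)."""
    vars2 = set(range(len(dims))) - set(vars1)
    d = 1
    for x in dims:
        d *= x
    out = []
    for i in range(d):
        digits = digit_encode(dims, i)
        i1 = digit_decode(nths(dims, vars1), nths(digits, vars1))
        i2 = digit_decode(nths(dims, vars2), nths(digits, vars2))
        out.append(v1[i1] * v2[i2])
    return out

dims = [2, 2]
v1, v2 = [1.0, 2.0], [3.0, 5.0]
kron = [a * b for a in v1 for b in v2]       # ordinary Kronecker product
assert tensor_vec(dims, {0}, v1, v2) == kron  # adjacent case agrees
```

Swapping the variable subset (`{1}` instead of `{0}`) permutes the entries accordingly, which is exactly the case the Kronecker product alone does not cover.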
Extension of matrices is a special case of the tensor product, where the matrix
on −vars is the identity (here d2 is the product of dims2 ).
definition mat_extension :: ’a mat ⇒ ’a mat where
mat_extension m = tensor_mat m (1m d2)
To make use of tensor_mat to define the tensor product in this more general
setting, we need to find the relative position of variables vars1 within vars1 ∪
vars2 . This is done using ind_in_set, which counts the position of x within A.
definition ind_in_set A x = card {i. i ∈ A ∧ i < x}
definition vars1’ = (ind_in_set (vars1 ∪ vars2)) ‘ vars1
Finally, the more general tensor products are defined as follows (note that since
we are now outside the partial_state locale, we must use qualified names for
tensor_vec and tensor_mat, and supply extra arguments for variables in the locale.
Here dims0 = nths dims (vars1 ∪ vars2) is the total list of dimensions).
definition ptensor_vec :: ’a vec ⇒ ’a vec ⇒ ’a vec where
ptensor_vec v1 v2 = partial_state.tensor_vec dims0 vars1’ v1 v2
The definitions ptensor_vec and ptensor_mat satisfy several key consistency
properties. In particular, they satisfy associativity of the tensor product. For
matrices, this is expressed as follows:
In this section, we illustrate the above framework for tensor product of matrices
with an application, to be used in the verification of Grover’s algorithm in the
next section.
In many quantum algorithms, we need to deal with the tensor product of
an arbitrary number of Hadamard matrices. The Hadamard matrix (denoted
hadamard in Isabelle) is given by:
H = (1/√2) [[1, 1], [1, −1]]
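As a quick executable illustration (plain Python with helper names of our choosing, not the Isabelle development), the Hadamard gate can be extended to the i'th of n qubits as I ⊗ ... ⊗ H ⊗ ... ⊗ I via the Kronecker product:

```python
import math

H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def hadamard_on(i, n):
    """H on the i'th of n qubits, identity on all others."""
    M = [[1.0]]
    for q in range(n):
        M = kron(M, H if q == i else I2)
    return M

M = hadamard_on(1, 2)  # H acting on the second of two qubits: I (x) H
```

Note that this adjacent-variable construction is exactly what the extension operator of Sect. 3 generalizes to arbitrary variable subsets.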
For example, in Grover’s algorithm, we need to apply the Hadamard trans-
form on each of the first n quantum variables, given by vars1 . A single Hadamard
transform on the i’th quantum variable, extended to a matrix acting on the first
n quantum variables, is defined as follows:
In this section, we describe our application of the above framework to the veri-
fication of Grover’s quantum search algorithm [32]. Quantum search algorithms
[18,32] concern searching an unordered database for an item satisfying some
given property. This property is usually specified by an oracle. In a database of
N items, where M items satisfy the property, finding an item with the property
requires on average O(N/M ) calls to the oracle on classical computers. Grover’s
algorithm reduces this complexity to O(√(N/M )).
The basic idea of Grover’s algorithm is rotation. The algorithm starts from an
initial state/vector. At every step, it rotates towards the target state/vector for
a small angle. As summarised in [18,19,42], it can be mathematically described
by the following equation [42, Eq. (6.12)]:
G^k |ψ0⟩ = cos((2k + 1)θ/2) |α⟩ + sin((2k + 1)θ/2) |β⟩ ,

where G represents the operator applied at each step, |ψ0⟩ is the initial state,
θ = 2 arccos √((N − M )/N ), |α⟩ is the bad state (for items not satisfying the
property), and |β⟩ is the good state (for items satisfying the property). Thus,
when θ is very small, i.e., M ≪ N , it costs O(√(N/M )) rounds to reach a
target state.
Originally, Grover’s algorithm only handled the case M = 1 [32]. It was immediately
generalized to the case of known M with the same idea, and to the case of
unknown M with some modifications [18]. After that, the idea was generalized to
all invertible quantum processes [19].
The paper [61] uses Grover’s algorithm as the main example illustrating
quantum Hoare logic. We largely follow its approach in this paper. See also [42,
Chapter 6] for a general introduction.
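The rotation formula above can be checked numerically. The following sketch (plain Python, with the marked item chosen arbitrarily) simulates the oracle and diffusion steps on the state vector for N = 8, M = 1 and compares the measured success probability with sin²((2k+1)θ/2):

```python
import math

n = 3
N = 2 ** n
good = {5}                       # M = 1 marked item (arbitrary choice)
M = len(good)
theta = 2 * math.acos(math.sqrt((N - M) / N))

state = [1 / math.sqrt(N)] * N   # uniform superposition over all N basis states
rounds = int(round(math.pi / (2 * theta) - 0.5))   # ~ optimal iteration count
for _ in range(rounds):
    # oracle: flip the sign of the amplitudes of the good items
    state = [-a if i in good else a for i, a in enumerate(state)]
    # diffusion operator: reflect every amplitude about the mean
    mean = sum(state) / N
    state = [2 * mean - a for a in state]

success = sum(state[i] ** 2 for i in good)
predicted = math.sin((2 * rounds + 1) / 2 * theta) ** 2
```

The iteration count mirrors the constant R = π/(2θ) − 1/2 used in the locale below.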
First, we set up a locale for the inputs to the search problem.
locale grover state =
fixes n :: nat and f :: nat ⇒ bool
assumes n: n > 1
and dimM: card {i. i < (2::nat) ˆ n ∧ f i} > 0
card {i. i < (2::nat) ˆ n ∧ f i} < (2::nat) ˆ n
Here n is the number of qubits used to represent the items. That is, we assume
N = 2^n items in total. The oracle is represented by the function f , where only
its values on inputs less than 2^n are used. The number of items satisfying the
property is given by M = card {i. i < N ∧ f i}.
Next, we set up a locale for Grover’s algorithm.
locale grover state sig = grover state + state sig +
fixes R :: nat and K :: nat
assumes dims def: dims = replicate n 2 @ [K]
assumes R: R = π / (2 * θ) − 1 / 2
assumes K: K > R
Here tensor_P denotes the tensor product of a matrix on the first n variables
(of dimension 2^n × 2^n ) and a matrix on the loop variable (of dimension K × K).
Executing this program is equivalent to multiplying the quantum state corre-
sponding to the first n variables by H ⊗n , as shown in Sect. 3.5.
The body of the loop is given by:
hadamard n n ;;
Utrans P vars1 mat Ph ;;
hadamard n n ;;
Utrans P vars2 (mat incr n)
where each of the three matrices mat O, mat Ph and mat incr can be defined
directly.
where the measurements for the while loop and at the end of the algorithm are:
definition M0 = mat K K (λ(i,j). if i = j ∧ i ≥ R then 1 else 0)
definition M1 = mat K K (λ(i,j). if i = j ∧ i < R then 1 else 0)
definition testN k = mat N N (λ(i,j). if i = k ∧ j = k then 1 else 0)
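As a quick numerical sanity check (with K and R chosen arbitrarily for illustration), the two while-loop measurement matrices above satisfy the completeness condition ∑i Mi† Mi = I required of a measurement:

```python
K, R = 5, 3   # illustrative values; the locale only requires K > R
M0 = [[1.0 if i == j and i >= R else 0.0 for j in range(K)] for i in range(K)]
M1 = [[1.0 if i == j and i < R else 0.0 for j in range(K)] for i in range(K)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):  # entries are real here, so adjoint = transpose
    n = len(A)
    return [[A[j][i] for j in range(n)] for i in range(n)]

A0 = mat_mul(adjoint(M0), M0)
A1 = mat_mul(adjoint(M1), M1)
S = [[A0[i][j] + A1[i][j] for j in range(K)] for i in range(K)]
# S should equal the K x K identity matrix
```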
We can now state the final correctness result. Let proj v be the outer product
vv † , and proj_k k be |k⟩⟨k|, where |k⟩ is the k’th basis vector of the vector space
corresponding to the loop variable. Let pre and post be given as follows:
definition pre = proj (vec N (λk. if k = 0 then 1 else 0))
definition post = mat N N (λ(i, j). if i = j ∧ f i then 1 else 0)
We now briefly outline the proof strategy. Following the definition of Grover ,
the proof of the above Hoare triple is divided into three main parts, for the
initialization by Hadamard matrices, for the while loop, and for the measurement
at the end.
In each part, assertions are first inserted around commands according to the
Hoare rules to form smaller Hoare triples. In particular, the precondition of the
while loop part is exactly the invariant of the loop. Moreover, it has to be shown
that these assertions satisfy the conditions for being quantum predicates, which
involve computing their dimension, showing positiveness, and being bounded
by the identity matrix under the Löwner order. Then, these Hoare triples are
derived using our deduction system. Before combining them together, we have
to show that the postcondition of each command is equal to the precondition
of the later one. After that, the three main Hoare triples can be obtained by
combining these smaller ones.
After the derivation of the three Hoare triples above, we prove the Löwner
order between the postcondition of each triple and the precondition of the follow-
ing triple. Afterwards, the triples can be combined into the Hoare triple below:
5 Discussion
These extra conditions make the rules difficult to apply for standard Isabelle
automation. For our work, we implemented our own tactic handling these rules.
In addition to the ring properties, we also frequently need to use the cyclic
property of trace (e.g. tr(ABC) = tr(BCA)), as well as the properties of adjoint
((AB)† = B † A† and A†† = A). For simplicity, we restrict to identities involving
only n × n matrices, where n is a parameter given to the tactic.
The tactic is designed to prove equality between two expressions. It works
by computing the normal form of the expressions – using ring identities and
identities for the adjoint to fully expand the expression into polynomial form.
To handle the trace, the expression tr(A1 · · · An ) is normalized to put the Ai
that is the largest according to Isabelle’s internal term order last. All dimension
assumptions are collected and reduced (for example, the assumption A * B ∈
carrier mat n n is reduced to A ∈ carrier mat n n and B ∈ carrier mat n n).
Overall, the resulting tactic is used 80 times in our proofs. Below, we list some
of the more complicated equations resolved by the tactic. The tactic reduces the
goal to dimensional constraints on the atomic matrices (e.g. M ∈ carrier mat n
n and P ∈ carrier mat n n in the first case).
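A toy model of this normalization step (ours, not the Isabelle tactic): represent tr(A1 · · · An) as a list of atom names and rotate so that the maximal atom in a fixed term order comes last; two terms that are equal up to the cyclic property then get the same normal form.

```python
def normalize_trace(word):
    """Normal form of tr(A1 ... An) under cyclic rotation of the word."""
    rotations = [word[i + 1:] + word[:i + 1] for i in range(len(word))]
    # put the maximal atom last; break ties by the whole rotated word so
    # the choice is deterministic even with repeated atoms
    return max(rotations, key=lambda w: (w[-1], w))

t1 = normalize_trace(["A", "B", "C"])
t2 = normalize_trace(["B", "C", "A"])   # equal to t1 by tr(ABC) = tr(BCA)
```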
5.2 Statistics
Overall, the formalization consists of about 11,500 lines of Isabelle theories. An
old version of the proof was developed on and off over two years. The current
version was re-developed using some ideas from the old version. The development
of the new version took about 5 person-months. A detailed breakdown of the
number of lines for the different parts of the proof is given in the following table.
6 Related Work
The closest work to our research is Robert Rand’s implementation of Qwire in
Coq [49,50]. Qwire [47] is a language for describing quantum circuits. In this
model, quantum algorithms are implemented by connecting together quantum
gates, each with a fixed number of bit/qubit inputs and outputs. How the gates
are connected is determined by a classical host language, allowing classical con-
trol of quantum computation. The work [49] defines the semantics of Qwire in
Coq, and uses it to verify quantum teleportation, Deutsch’s algorithm, and an
example on multiple coin flips to illustrate applicability to a family of circuits. In
this framework, program verification proceeds directly from the semantics, with-
out defining a Hoare logic. As in our work, it is necessary to solve the problem
of how to define extensions of an operation on a few qubits to the global state.
The approach taken in [49] is to use the usual Kronecker product, augmented
either by the use of swaps between qubits, or by inserting identity matrices at
strategic positions in the Kronecker product.
There are two main differences between [49] and our work. First, quantum
algorithms are expressed using quantum circuits in [49], while we use quantum
programs with while loops. Models based on quantum circuits have the advan-
tage of being concrete, and indeed most of the earlier quantum algorithms can be
expressed directly in terms of circuits. However, several new quantum algorithms
can be more properly expressed by while loops, e.g. quantum walks with absorb-
ing boundaries, quantum Bernoulli factory (for random number generation),
HHL for systems of linear equations and qPCA (Principal Component Analy-
sis). Second, we formalized a Hoare logic while [49] uses denotational semantics
directly. As in verification of classical programs, Hoare logic encapsulates stan-
dard forms of argument for dealing with each program construct. Moreover,
the rules of QHL are in weakest-precondition form, allowing the possibility of
automated verification condition generation after specifying the loop invariants
(although this is not used in the present paper).
Besides Rand’s work, quite a few verification tools have been developed for
quantum communication protocols. For example, Nagarajan and Gay [41] mod-
eled the BB84 protocol [12] and verified its correctness. Ardeshir-Larijani et al.
[7,8] presented a tool for verification of quantum protocols through equivalence
checking. Existing tools, such as PRISM [37] and Coq, are employed to develop
verification tools for quantum protocols [17,29]. Furthermore, an automatic tool
called Quantum Model-Checker (QMC) has been developed [28,46].
Recently, several specific techniques have been proposed to algorithmically
check properties of quantum programs. In [63], the Sharir-Pnueli-Hart method
for verifying probabilistic programs [54] has been generalised to quantum pro-
grams by exploiting the Schrödinger-Heisenberg duality between quantum states
and observables. Termination analysis of nondeterministic and concurrent quan-
tum programs [38] was carried out based on reachability analysis [64]. Invariants
can be generated at some steps in quantum programs for debugging and
verification of correctness [62]. But up to now no tools are available that implement
these techniques. Another Hoare-style logic for quantum programs was proposed
in [36], but without (relative) completeness.
Interactive theorem proving has made significant progress in the formal ver-
ification of classical programs and systems. Here, we focus on listing some tools
designed for special kinds of systems. EasyCrypt [10,11] is an interactive frame-
work for verifying the security of cryptographic constructs in the computational
model. It is developed based on a probabilistic relational Hoare logic to support
machine-checked construction and verification of game-based proofs. Recently,
verification of hybrid systems via interactive theorem proving has also been stud-
ied. KeYmaera X [26] is a theorem prover implementing differential dynamic
logic (dL) [48], for the verification of hybrid programs. In [60], a prover has been
implemented in Isabelle/HOL for reasoning about hybrid processes described
using hybrid CSP [34].
Our work is based on existing formalization of matrices and tensors in
Isabelle/HOL. In [59] (with corresponding AFP entry [58]), Thiemann et al.
developed the matrix library that we use here. In [14] (with corresponding AFP
entry [13]), Bentkamp et al. developed tensor analysis based on the above work,
in an effort to formalize an expressivity result of deep learning algorithms.
7 Conclusion
References
1. IBM Q devices and simulators. https://www.research.ibm.com/ibm-q/technology/
devices/
2. IBM Q experience community. https://quantumexperience.ng.bluemix.net/qx/
community?channel=papers&category=ibm
3. IonQ. https://ionq.co/resources
4. A preview of Bristlecone, Google’s new quantum processor. https://ai.googleblog.
com/2018/03/a-preview-of-bristlecone-googles-new.html
5. Qiskit Aer. https://qiskit.org/aer, https://medium.com/qiskit/qiskit-aer-
d09d0fac7759
6. Unsupervised machine learning on Rigetti 19Q with Forest 1.2. https://
medium.com/rigetti/unsupervised-machine-learning-on-rigetti-19q-with-forest-1-
2-39021339699
7. Ardeshir-Larijani, E., Gay, S.J., Nagarajan, R.: Equivalence checking of quantum
protocols. In: Piterman, N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp.
478–492. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36742-7_33
8. Ardeshir-Larijani, E., Gay, S.J., Nagarajan, R.: Verification of concurrent quantum
protocols by equivalence checking. In: Ábrahám, E., Havelund, K. (eds.) TACAS
2014. LNCS, vol. 8413, pp. 500–514. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_42
9. Baltag, A., Smets, S.: LQP: the dynamic logic of quantum information. Math.
Struct. Comput. Sci. 16(3), 491–525 (2006)
10. Barthe, G., Dupressoir, F., Grégoire, B., Kunz, C., Schmidt, B., Strub, P.-Y.:
EasyCrypt: a tutorial. In: Aldini, A., Lopez, J., Martinelli, F. (eds.) FOSAD 2012-
2013. LNCS, vol. 8604, pp. 146–166. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10082-1_6
11. Barthe, G., Grégoire, B., Heraud, S., Béguelin, S.Z.: Computer-aided security
proofs for the working cryptographer. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS,
vol. 6841, pp. 71–90. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9_5
12. Bennett, C.H., Brassard, G.: Quantum cryptography: public key distribution and
coin tossing. In: International Conference on Computers, Systems and Signal Pro-
cessing, pp. 175–179. IEEE (1984)
13. Bentkamp, A.: Expressiveness of deep learning. Archive of Formal Proofs, Formal
proof development, November 2016. http://isa-afp.org/entries/Deep_Learning.html
14. Bentkamp, A., Blanchette, J.C., Klakow, D.: A formal proof of the expressiveness
of deep learning. In: Interactive Theorem Proving - 8th International Conference,
ITP 2017, Brasília, Brazil, September 26–29, 2017, Proceedings, pp. 46–64 (2017).
https://dblp.org/rec/bib/conf/itp/BentkampBK17
15. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development:
Coq'Art: The Calculus of Inductive Constructions, 1st edn. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-662-07964-5
16. Bettelli, S., Calarco, T., Serafini, L.: Toward an architecture for quantum program-
ming. Eur. Phys. J. D 25, 181–200 (2003)
17. Boender, J., Kammüller, F., Nagarajan, R.: Formalization of quantum protocols
using Coq. In: QPL 2015 (2015)
18. Boyer, M., Brassard, G., Høyer, P., Tapp, A.: Tight bounds on quantum searching.
Fortschr. der Phys. Prog. Phys. 46(4–5), 493–505 (1998)
19. Brassard, G., Hoyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification
and estimation. Contemp. Math. 305, 53–74 (2002)
20. Brunet, O., Jorrand, P.: Dynamic quantum logic for quantum programs. Int. J.
Quantum Inf. 2, 45–54 (2004)
21. Chadha, R., Mateus, P., Sernadas, A.: Reasoning about imperative quantum pro-
grams. Electron. Notes Theoret. Comput. Sci. 158, 19–39 (2006)
22. Cross, A.W., Bishop, L.S., Smolin, J.A., Gambetta, J.M.: Open quantum assembly
language. arXiv preprint arXiv:1707.03429 (2017)
23. Debnath, S., Linke, N.M., Figgatt, C., Landsman, K.A., Wright, K., Monroe, C.:
Demonstration of a small programmable quantum computer with atomic qubits.
Nature 536(7614), 63–66 (2016)
24. D’Hondt, E., Panangaden, P.: Quantum weakest preconditions. Math. Struct.
Comput. Sci. 16, 429–451 (2006)
25. Wecker, D., Svore, K.: LIQUi|⟩: a software design architecture and domain-specific
language for quantum computing. http://research.microsoft.com/en-us/projects/
liquid/
26. Fulton, N., Mitsch, S., Quesel, J.-D., Völp, M., Platzer, A.: KeYmaera X: an
axiomatic tactical theorem prover for hybrid systems. In: Felty, A.P., Middeldorp,
A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 527–538. Springer, Cham
(2015). https://doi.org/10.1007/978-3-319-21401-6_36
27. Gay, S.: Quantum programming languages: survey and bibliography. Math. Struct.
Comput. Sci. 16, 581–600 (2006)
28. Gay, S.J., Nagarajan, R., Papanikolaou, N.: QMC: a model checker for quantum
systems. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 543–547.
Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70545-1_51
29. Gay, S.J., Nagarajan, R., Papanikolaou, N.: Probabilistic model-checking of quantum
protocols. In: Proceedings of the International Workshop on Developments
in Computational Models (DCM 2005). IEEE (2005). https://arxiv.org/abs/quant-
ph/0504007
30. Google AI Quantum team. https://github.com/quantumlib/Cirq
31. Green, A.S., Lumsdaine, P.L., Ross, N.J., Selinger, P., Valiron, B.: Quipper: a scal-
able quantum programming language. In: Proceedings of the 34th ACM SIGPLAN
Conference on Programming Language Design and Implementation, PLDI 2013,
pp. 333–342. ACM, New York (2013)
32. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Pro-
ceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing,
STOC 1996, pp. 212–219. ACM, New York (1996)
33. Haftmann, F., Wenzel, M.: Local theory specifications in Isabelle/Isar. In: Berardi,
S., Damiani, F., de’Liguoro, U. (eds.) TYPES 2008. LNCS, vol. 5497, pp. 153–168.
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02444-3_10
34. He, J.: From CSP to hybrid systems. In: A Classical Mind, Essays in Honour of
C.A.R. Hoare, pp. 171–189. Prentice Hall International (UK) Ltd. (1994)
35. JavadiAbhari, A., et al.: ScaffCC: scalable compilation and analysis of quantum
programs. Parallel Comput. 45, 3–17 (2015)
36. Kakutani, Y.: A logic for formal verification of quantum programs. In: Datta,
A. (ed.) ASIAN 2009. LNCS, vol. 5913, pp. 79–93. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-10622-4_7
206 J. Liu et al.
37. Kwiatkowska, M., Norman, G., Parker, D.: Probabilistic symbolic model-checking
with PRISM: a hybrid approach. Int. J. Softw. Tools Technol. Transf. 6, 128–142
(2004)
38. Li, Y., Yu, N., Ying, M.: Termination of nondeterministic quantum programs. Acta
Informatica 51, 1–24 (2014)
39. Liu, S., et al.: Q|SI: a quantum programming environment. In: Jones, C., Wang,
J., Zhan, N. (eds.) Symposium on Real-Time and Hybrid Systems. LNCS, vol.
11180, pp. 133–164. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01461-2_8
40. Liu, X., Kubiatowicz, J.: Chisel-Q: designing quantum circuits with a scala embed-
ded language. In: 2013 IEEE 31st International Conference on Computer Design
(ICCD), pp. 427–434. IEEE (2013)
41. Nagarajan, R., Gay, S.: Formal verification of quantum protocols (2002).
arXiv: quant-ph/0203086
42. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information:
10th Anniversary Edition, 10th edn. Cambridge University Press, New York (2011)
43. Nipkow, T., Klein, G.: Concrete Semantics: With Isabelle/HOL. Springer, Cham
(2014). https://doi.org/10.1007/978-3-319-10542-0
44. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL: A Proof Assistant
for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://
doi.org/10.1007/3-540-45949-9
45. Ömer, B.: Structured quantum programming. Ph.D. thesis, Technical University
of Vienna (2003)
46. Papanikolaou, N.: Model checking quantum protocols. Ph.D. thesis, Department
of Computer Science, University of Warwick (2008)
47. Paykin, J., Rand, R., Zdancewic, S.: QWIRE: a core language for quantum circuits.
In: Proceedings of 44th ACM Symposium on Principles of Programming Languages
(POPL), pp. 846–858 (2017)
48. Platzer, A.: A complete uniform substitution calculus for differential dynamic logic.
J. Autom. Reas. 59(2), 219–265 (2017)
49. Rand, R.: Formally verified quantum programming. Ph.D. thesis, University of
Pennsylvania (2018)
50. Rand, R., Paykin, J., Zdancewic, S.: QWIRE practice: formal verification of quantum
circuits in Coq. In: Quantum Physics and Logic (2017)
51. Sanders, J.W., Zuliani, P.: Quantum programming. In: Backhouse, R., Oliveira,
J.N. (eds.) MPC 2000. LNCS, vol. 1837, pp. 80–99. Springer, Heidelberg (2000).
https://doi.org/10.1007/10722010_6
52. Selinger, P.: A brief survey of quantum programming languages. In: Kameyama, Y.,
Stuckey, P.J. (eds.) FLOPS 2004. LNCS, vol. 2998, pp. 1–6. Springer, Heidelberg
(2004). https://doi.org/10.1007/978-3-540-24754-8_1
53. Selinger, P.: Towards a quantum programming language. Math. Struct. Comput.
Sci. 14(4), 527–586 (2004)
54. Sharir, M., Pnueli, A., Hart, S.: Verification of probabilistic programs. SIAM J.
Comput. 13, 292–314 (1984)
55. Smith, R.S., Curtis, M.J., Zeng, W.J.: A practical quantum instruction set archi-
tecture. arXiv preprint arXiv:1608.03355 (2016)
56. Steiger, D.S., Häner, T., Troyer, M.: ProjectQ: an open source software framework
for quantum computing. Quantum 2, 49 (2018)
57. Svore, K., et al.: Q#: enabling scalable quantum computing and development with
a high-level DSL. In: Proceedings of the Real World Domain Specific Languages
Workshop 2018, pp. 7:1–7:10 (2018)
58. Thiemann, R., Yamada, A.: Matrices, Jordan normal forms, and spectral radius
theory. Archive of Formal Proofs, Formal proof development, August 2015. http://
isa-afp.org/entries/Jordan_Normal_Form.html
59. Thiemann, R., Yamada, A.: Formalizing Jordan normal forms in Isabelle/HOL.
In: Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and
Proofs, CPP 2016, pp. 88–99. ACM, New York (2016)
60. Wang, S., Zhan, N., Zou, L.: An improved HHL prover: an interactive theorem
prover for hybrid systems. In: Butler, M., Conchon, S., Zaı̈di, F. (eds.) ICFEM
2015. LNCS, vol. 9407, pp. 382–399. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25423-4_25
61. Ying, M.: Floyd-Hoare logic for quantum programs. ACM Trans. Programm. Lang.
Syst. 33(6), 19:1–19:49 (2011)
62. Ying, M., Ying, S., Wu, X.: Invariants of quantum programs: characterisations and
generation. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles
of Programming Languages, POPL 2017, pp. 818–832 (2017)
63. Ying, M., Yu, N., Feng, Y., Duan, R.: Verification of quantum programs. Sci.
Comput. Programm. 78, 1679–1700 (2013)
64. Ying, S., Feng, Y., Yu, N., Ying, M.: Reachability probabilities of quantum Markov
chains. In: D’Argenio, P.R., Melgratti, H. (eds.) CONCUR 2013. LNCS, vol.
8052, pp. 334–348. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40184-8_24
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
SecCSL: Security Concurrent Separation
Logic
1 Introduction
Software verification successes abound, whether via interactive proof or via auto-
matic program verifiers. While the former has yielded individual, deeply verified
software artifacts [21,24,25] primarily by researchers, the latter appears to be
having a growing impact on industrial software engineering [11,36,39].
At the same time, recent work has heralded major advancements in program
logics for reasoning about secure information flow [23,33,34]—i.e. whether pro-
grams properly protect their secrets—yielding the first general program logics
and proofs of information flow security for non-trivial concurrent programs [34].
Yet so far, such logics have remained confined to interactive proof assistants,
making them practically inaccessible to industrial developers.
This is not especially surprising. The Covern logic [34], for example, pays for
its generality with regard to expressive security policies, in terms of complexity.
Worse, these logics reason only over very simple toy programming languages,
which even lack support for pointers, arrays, and structures. Their complexity, we
argue, hinders proof automation and makes scaling up these logics to real-world
languages impractical. How, therefore, can we leverage the power of existing
automatic deductive verification approaches for security proofs?
In this paper we present Security Concurrent Separation Logic (SecCSL),
which achieves an unprecedented combination of simplicity, power, and ease of
2 An Overview of SecCSL
sink OUTPUT_REG should never have confidential data written to it. Therefore the
example only ever writes non-confidential data into OUTPUT_REG.
Condition (1) specifies the sensitivity of a data value in memory, whereas
condition (2) specifies the sensitivity of the data that a memory location (i.e. data
sink) is permitted to hold. Prior security separation logics [14,20] reason only
about value-sensitivity condition (1) but, as we explain below, both are needed.
Like those prior logics, in SecCSL one specifies the sensitivity of the value
denoted by an expression e via a security label ℓ: the assertion e :: ℓ means that
the sensitivity of the value denoted by expression e is at most ℓ. Security labels
are drawn from a lattice with top element high (denoting the most confidential
information), bottom element low (denoting public information), and ordered
via ⊑: ℓ ⊑ ℓ′ means that information labelled with ℓ′ is at least as sensitive as
that labelled by ℓ. Using this style of assertion, in conjunction with standard
separation logic connectives (explained below), condition (1) can be specified as:
Location-sensitivity conditions like (2) are specified by points-to assertions annotated
with a security label ℓ, and are used to specify which parts of the memory are
observable to the attacker (and so must never contain sensitive information):
e −ℓ→ e′ means that the value denoted by the expression e′ is present in memory
at the location denoted by e, and additionally that at all times the sensitivity of
the value stored in that location is never allowed to exceed ℓ. Thus in SecCSL,
e −→ e′ abbreviates e −high→ e′. In Fig. 1, that OUTPUT_REG is publicly observable
can be specified as:

∃v. OUTPUT_REG −low→ v (2)
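The label ordering and the role of location labels can be mimicked with a two-point lattice in Python (a toy model for intuition only; `may_store` and the 0/1 encoding are our own, not SecC syntax):

```python
LOW, HIGH = 0, 1  # bottom and top of the two-point security lattice

def leq(l1, l2):
    """l1 is below l2 in the lattice: information labelled l2 is at
    least as sensitive as information labelled l1."""
    return l1 <= l2

def may_store(value_label, location_label):
    """A write is permitted when the sensitivity of the value does not
    exceed the bound attached to the location (the intent of (2))."""
    return leq(value_label, location_label)

# A low-labelled sink such as OUTPUT_REG may receive low data only:
print(may_store(LOW, LOW), may_store(HIGH, LOW))  # True False
```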
ℓA ⊢ {P} c {Q} (3)
e ::= x | f(e1, . . . , en) | e1 = e2 | φ ? e1 : e2
ρ ::= φ | e :: el | ρ1 ⇒ ρ2
Disjunction, negation, and implication are excluded because they cause issues
for describing the set of ℓ-visible heap locations to the ℓ-attacker, similarly to the
problem of defining heap footprints for non-precise assertions [26,40,41]. These
connectives can still occur between pure and relational expressions.
The standard expression semantics ⟦e⟧s evaluates e over a store s, which
assigns values to variables x as s(x). The interpretation f^A of a function symbol f
is a function, given statically by a logical structure A. Specifically, ⊑^A is the
semantic ordering of the security lattice. We write s |= φ if ⟦φ⟧s = true.
214 G. Ernst and T. Murray
The relational semantics of assertions, written (s, h), (s′, h′) |=ℓ P, is defined
in Fig. 2 over two states (s, h) and (s′, h′), each consisting of a store and a
heap. The semantics is defined against the attacker security level ℓ (called ℓA in
Sect. 2.3). Stores s and s′ are related via e :: el. We require the expression el
denoting the sensitivity to coincide on s and s′, and whenever ⟦el⟧s ⊑ ℓ
holds, e must evaluate to the same value in both states, (7). Heaps are related
by (s, h), (s′, h′) |=ℓ ep −el→ ev, which similarly ensures that the two heap fragments
are identical, h = h′, when el says so, (9). Conditional assertions φ ? P : Q
evaluate to P when φ holds (relationally), and to Q otherwise. The separating
conjunction splits both heaps independently, (12). Similarly, the existential
quantifier picks two values v and v′, (13). Whether parts of the split resp. these
two values actually agree will depend on other assertions made.
holds in some (initial) state (see Sect. 2.3). We define this set in Fig. 3, denoted
lowsℓ(P, s) for initial store s. Note that, by design, the definition does not give a
useful result for an existential like ∃p v. p −low→ v. This mirrors the usual difficulty
of defining footprints for non-precise separation logic assertions [26,40,41]. This
restriction is not an issue in practice, as location sensitivity assertions ep −el→ ev
are intended to describe the static regions of memory (data sinks) visible to the
attacker, for which existential quantification over variables free in ep or el is not
necessary. A generalization to all precise predicates should be possible.
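The footprint computation can be sketched over a list of labelled points-to triples standing in for an assertion P (our own simplification; the real definition in Fig. 3 proceeds by recursion over assertions):

```python
def lows(points_tos, attacker):
    """lows(P, s), sketched: the locations of labelled points-to facts
    ep -el-> ev whose label el is visible to the attacker."""
    return {loc for (loc, label, _) in points_tos if label <= attacker}

LOW, HIGH = 0, 1
P = [("OUTPUT_REG", LOW, None), ("buf", HIGH, None)]
print(lows(P, LOW))   # {'OUTPUT_REG'}: only the low sink is visible
print(lows(P, HIGH))  # both locations are visible at level high
```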
3.2 Entailments
Although implication between spatial formulas is not part of the assertion
language, entailments P =⇒ℓ Q between assertions still play a role in SecCSL's
Hoare-style consequence rule (Conseq in Fig. 4). We discuss entailment now as
it sheds useful light on some consequences of SecCSL's relational semantics.
Definition 1 (Secure Entailment). P =⇒ℓ Q holds iff
– (s, h), (s′, h′) |=ℓ P implies (s, h), (s′, h′) |=ℓ Q for all s, h and s′, h′, and
– lowsℓ(P, s) ⊆ lowsℓ(Q, s) for all s
The security level ℓ is used not just in the evaluation of the assertions but also
to preserve the ℓ-attacker-visible locations of P in Q. This reflects the intuition
that P is stronger than Q, and so Q should make fewer assumptions than P on
the limitations of an attacker's observational powers.
Proposition 1.

e = e′ ∧ el = el′ ∧ e :: el =⇒ℓ e′ :: el′ (14)
e :: el ∧ el ⊑ el′ ∧ el′ :: ℓ =⇒ℓ e :: el′ (15)
el :: ℓ =⇒ℓ c :: el for a constant c (16)
e1 :: el ∧ · · · ∧ en :: el =⇒ℓ f(e1, . . . , en) :: el for n > 0 (17)
ep −el→ ev ∧ el ⊑ ℓ =⇒ℓ ep −el→ ev ∧ ep :: el ∧ ev :: el (18)
∀s. lowsℓ(P, s) = lowsℓ(Q, s) implies φ ∧ (φ ? P : Q) =⇒ℓ P (19)
P =⇒ℓ P′ and Q =⇒ℓ Q′ implies P ∗ Q =⇒ℓ P′ ∗ Q′ (20)
a static collection of valid lock identifiers l, each of which has an assertion as its
associated invariant inv(l), characterizing the protected portion of the heap. We
describe the program semantics in Sect. 4 as part of the soundness proof.
The SecCSL proof rules are shown in Fig. 4. They extend the standard
rules of concurrent separation logic [38] (CSL) by additional side-conditions that
amount to information flow checks e :: _ as part of the respective preconditions.
Similarly to [46], without loss of generality we require that assignments (rules
Asg, Load) are always to distinct variables, to avoid renaming in the assertions.
In the postcondition of Load, x :: el can be derived by Conseq for (18). Storing
to a heap location through an el-sensitive location ep −el→ ev (rule Store)
requires that the value ev written to that location admits the corresponding
security level el of the location ep. Note that due to monotonicity (15) the security
level does not have to match exactly. The rules for locking are standard [12]. To
preclude information leaks through timing channels, the execution can branch
on non-secret values only. This manifests in side conditions b :: ℓ for the respective
branching condition b where, recall, ℓ is the attacker security level (If, While).
Logical Split picks those two cases where ⟦φ⟧s = ⟦φ⟧s′, ruling out the other two
by φ :: ℓ. The consequence rule (Conseq) uses entailment relative to ℓ (Definition
1). Rule Par has the usual proviso that the variables modified in one thread
cannot interfere with those relied on by the other and its pre-/postcondition.
The soundness theorem for SecCSL guarantees that if some triple {P } c {Q}
is derived using the rules of Fig. 4, then: all executions of c started in a state
satisfying precondition P are memory safe, partially correct with respect to
postcondition Q, and moreover secure with respect to the sensitivity of values
as denoted by P and Q and at all times respect the sensitivity of locations as
denoted by P (see Sect. 2.3). Proof outlines are relegated to Appendix B. All
results have been mechanised in Isabelle/HOL [37] and are available at [18].
The top-level security property of SecCSL is a noninterference condi-
tion [19]. Noninterference as a security property specifies, roughly, that for any
pair of executions that start in states that agree on the values of all attacker-
observable inputs, then, from the attacker’s point of view the resulting executions
will be indistinguishable, i.e. all of the attacker-visible observations will agree.
In SecCSL, what is "attacker-observable" depends on the attacker level ℓ. The
"inputs" are the expressions e, and the attacker-visible inputs are those expressions
e whose sensitivity is given by e :: ℓ′ judgements in the precondition P for
which ℓ′ ⊑ ℓ. The attacker-visible observations are the contents of all memory
locations in lowsℓ(P, s), for initial store s and precondition P. Thus we define
when two heaps are indistinguishable to the ℓ-attacker.
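Heap indistinguishability then amounts to agreement on the attacker-visible locations, as in this sketch (names are illustrative; OUTPUT_REG stands for the publicly observable sink from Fig. 1):

```python
def indistinguishable(h1, h2, visible_locs):
    """Two heaps look the same to the attacker iff they agree on the
    contents of every attacker-visible location."""
    return all(h1.get(loc) == h2.get(loc) for loc in visible_locs)

visible = {"OUTPUT_REG"}
h1 = {"OUTPUT_REG": 0, "secret": 42}
h2 = {"OUTPUT_REG": 0, "secret": 99}
print(indistinguishable(h1, h2, visible))  # True: only the secret differs
```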
the actions 1 and 2 respectively denote the execution of the left- and right-hand
sides of a parallel composition for a single step, and so define a deterministic
scheduling discipline reminiscent of separation kernels [32]. For example,
(run c1 ∥ c2, L, s, h) −1·σ→ (run c1′ ∥ c2, L′, s′, h′) if (run c1, L, s, h) −σ→
(run c1′, L′, s′, h′). Configurations (run lock l, L, s, h) can only be scheduled if
l ∈ L (symmetrically for unlock) and otherwise block without a possible step.
Executions k1 −σ1···σn→∗ kn+1 chain several steps ki −σi→ ki+1 by accumulating
the schedule. We are considering partial correctness only, thus the schedule is
always finite and so are all executions. The rules for program steps are otherwise
standard and can be found in Appendix A.
(si, hi), (si′, hi′) |=ℓ Pi ∗ F ∗ invs(Li)   where invs(Li) = ∗li∈Li inv(li) (22)
Here Pi describes the part of the heap that command ci is currently accessing.
invs(Li) is the separating conjunction of the lock invariants for the locks li ∈ Li
not currently acquired. Its presence ensures that whenever a lock is acquired,
the associated invariant can be assumed to hold. Finally, F is an arbitrary frame,
an assertion that does not mention variables updated by ci. Its inclusion allows
the security property to compose with respect to different parts of the heap.
Moreover, each Pi+1 ∗ invs(Li+1) is required to preserve the sensitivity of all
ℓ-visible heap locations of Pi ∗ invs(Li), i.e. so that lowsℓ(Pi ∗ invs(Li), si) ⊆
lowsℓ(Pi+1 ∗ invs(Li+1), si+1). If some intermediate step m ≤ n terminates, then
Pm+1 = Q, ensuring the postcondition holds when the executions terminate.
Lastly, neither execution is allowed to reach an abort configuration.
If the initial state satisfies P1 ∗ F ∗ invs(L1) then (22) holds throughout the
entire execution, and establishes the end-to-end property that any final state
indeed satisfies the postcondition and that lowsℓ(P1 ∗ invs(L1), s1) ⊆ lowsℓ(Pi ∗
invs(Li), si) with respect to the initially specified low locations.
The property securen (P, c, Q) is defined recursively to match the steps of the
lockstep execution of the program.
Definition 3 (Security).
– secure0(P1, c1, Q) holds always.
– securen+1(P1, c1, Q) holds iff for all pairs of states (s1, h1), (s1′, h1′),
frames F, and sets of locks L1, such that (s1, h1), (s1′, h1′) |=ℓ P1 ∗
F ∗ invs(L1), and given two steps (run c1, L1, s1, h1) −σ→ k and
(run c1, L1, s1′, h1′) −σ→ k′, there exists an assertion P2 and a pair of
successor states with either of
• k = (stop L2, s2, h2) and k′ = (stop L2, s2′, h2′) and P2 = Q
• k = (run c2, L2, s2, h2) and k′ = (run c2, L2, s2′, h2′) with
securen(P2, c2, Q)
such that (s2, h2), (s2′, h2′) |=ℓ P2 ∗ F ∗ invs(L2) and lowsℓ(P1 ∗
invs(L1), s1) ⊆ lowsℓ(P2 ∗ invs(L2), s2) in both cases.
Two further side conditions are imposed, ensuring all mutable shared state lies in
the heap (cf. Sect. 3): c1 doesn’t modify variables occurring in invs(L1 ) and F
(which guarantees that both remain intact), and the free variables in P2 can
only mention those already present in P1 , c1 , or in any lock invariant (which
guarantees that P2 remains stable against concurrent assignments). Note that
each step can pick a different frame F , as required for the soundness of rule Par.
The top-level noninterference property also follows from Lemma 2 via Lemma
1. For brevity, we state the noninterference property directly in the theorem:
Features. In addition to the logic from Sect. 3, SecC supports procedure mod-
ular verification with pre-/postconditions as usual; and it supports user-defined
spatial predicates. While some issues of the C source language are not addressed
(yet), such as integer overflow, those that impact directly on information flow
security are taken into account. Specifically, the shortcut semantics of boolean
operators &&, ||, and ternary _ ? _ : _ count as branching points and as such
the left hand side resp. the test must not depend on sensitive data, similarly to
the conditions of if statements and while loops.
A direct benefit of the integration of security levels into the assertion lan-
guage is that it becomes possible to specify the sensitivity of data passed to
library and operating system functions. For example, the execution time of
malloc(len) would depend on the value of len, which can thus be required
to satisfy len :: low by annotating its function header with an appropriate pre-
condition, using SecC’s requires annotation. Likewise, SecC can reason about
limited forms of declassification, in which external functions are trusted to safely
release otherwise sensitive data, by giving them appropriate pre-/postconditions.
For example, a password hashing library function prototype might be annotated
with a postcondition asserting its result is low, via SecC’s ensures annotation.
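The effect of such annotations can be mimicked by carrying labels alongside values (a toy model; `hash_password` and the pair encoding are hypothetical, not SecC's actual mechanism, which works at the level of pre-/postconditions on C prototypes):

```python
LOW, HIGH = 0, 1

def hash_password(pw):
    """Trusted declassifier: its (hypothetical) contract 'ensures
    result :: low' lets the result's label drop to LOW even though
    the input is sensitive."""
    value, _label = pw
    return (hash(value) & 0xFFFF, LOW)

secret = ("hunter2", HIGH)       # a high-sensitivity input
digest = hash_password(secret)   # trusted to release only the digest
print(secret[1] == HIGH, digest[1] == LOW)  # True True
```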
Examples and Case Study. SecC proves Fig. 1 secure, and correctly flags buggy
variants as insecure, e.g., where the test in thread 1 is reversed, or when thread 2
does not clear the data field upon setting is_classified to FALSE. SecC also
correctly analyzes those 7 examples from [17] that are supported by the logic
and tool (each in ∼10 ms). All examples are available at [18].
To compare SecC and SecCSL against the recent Covern logic [34], we
took a non-trivial example program that Murray et al. verified in Covern, man-
ually translated it to C, and verified it automatically using SecC. The original
program2 , written in Covern’s tiny While language embedded in Isabelle/HOL,
models the software functionality of a simplified implementation of the Cross
Domain Desktop Compositor (CDDC) [5]. The CDDC is a device that facili-
tates interactions with multiple PCs, each of which runs applications at differing
sensitivity, from a single keyboard, mouse and display. Its multi-threaded soft-
ware handles routing of keyboard input to the appropriate PC and switching
between the PCs via mouse gestures. Verifying the C translation required adding
SecCSL annotations for procedure pre-/postconditions and loop invariants. The
C translation including those annotations is ∼250 lines in length. The present,
unoptimised, implementation of SecC verifies the resulting artifact in ∼5 s. In
contrast, the Covern proof of this example requires ∼600 lines of Isabelle/HOL
definitions/specification, plus ∼550 lines of Isabelle proof script.
6 Related Work
There has been much work targeting type systems and program logics for con-
current information flow. Karbyshev et al. [23] provide an excellent overview.
Here we concentrate on work whose ideas are most closely related to SecCSL.
Costanzo and Shao [14] propose a sequential separation logic for reasoning
about information flow. Unlike SecCSL, theirs does not distinguish value and
location sensitivity. Their separation logic assertions have a fairly standard (non-
relational) semantics, at the price of having a security-aware language semantics
² https://bitbucket.org/covern/covern/src/master/examples/cddc/Example_CDDC_WhileLockLanguage.thy
SecCSL: Security Concurrent Separation Logic 223
that propagates security labels attached to values in the store and heap. As
mentioned in Sect. 3.2, this has the unfortunate side-effect of breaking intuitive
properties about sensitivity assertions. We conjecture that the absence of such
properties would make their logic harder to automate than SecCSL, which
SecC demonstrates is feasible. SecCSL avoids the aforementioned drawbacks
by adopting a relational assertion semantics.
Gruetter and Murray [20] propose a security separation logic in Coq [8] for
Verifiable C, the C subset of the Verified Software Toolchain [2,3]. However, they provide no soundness proof for its rules, and whether it can feasibly be automated is unclear.
Two recent compositional logics for concurrent information flow are the Cov-
ern logic [34] and the type and effect system of Karbyshev et al. [23]. Both
borrow ideas from separation logic. However, unlike SecCSL, neither is defined
for languages with pointers, arrays etc.
Like SecCSL, Covern proves a timing-sensitive security property. Location
sensitivity is defined statically by value-dependent predicates, and value sensi-
tivity is tracked by a dependent security typing context Γ [35], relative to a
Hoare logic predicate P over the entire shared memory. In Covern locks carry
non-relational invariants. In contrast, SecCSL unifies these elements into separation logic assertions with a relational semantics. Doing so leads to a much simpler logic, amenable to automation, while supporting pointers, etc.
On the other hand, Karbyshev et al. [23] prove a timing-insensitive security
property, but rely on primitives to interact with the scheduler to prevent leaks via
scheduling decisions. Unlike SecCSL, which assumes a deterministic scheduling
discipline, Karbyshev et al. support a wider class of scheduling policies. Their sys-
tem tracks resource ownership and transfer between threads at synchronisation
points, similar to CSLs. Their resources include labelled scheduler resources that
account for scheduler interaction, including when scheduling decisions become
tainted by secret data—something that cannot occur in SecCSL’s deterministic
scheduling model.
Prior logics for sequential languages, e.g. [1,4], have also adopted separa-
tion logic ideas to reason locally about memory, combining them with relational
assertions similar to SecCSL’s e :: el assertions. For instance, the agreement
assertions A(e) of [4] coincide with SecCSL’s e :: low. Unlike SecCSL, some of
these logics support languages with explicit declassification actions [4].
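The relational reading of sensitivity assertions can be made concrete with a small sketch (ours, not the authors' formalisation): an assertion e :: low holds of a pair of states exactly when e evaluates to the same value in both. The expression AST and store layout below are illustrative.

```python
# Sketch (not the authors' code) of the relational semantics of a
# sensitivity assertion "e :: low" over a *pair* of stores.

def eval_expr(expr, store):
    """Evaluate a tiny expression AST against a store (a dict)."""
    kind = expr[0]
    if kind == "var":
        return store[expr[1]]
    if kind == "const":
        return expr[1]
    if kind == "add":
        return eval_expr(expr[1], store) + eval_expr(expr[2], store)
    raise ValueError(f"unknown expression: {expr!r}")

def holds_low(expr, store1, store2):
    """e :: low holds of the state pair iff e agrees in both states."""
    return eval_expr(expr, store1) == eval_expr(expr, store2)

s1 = {"x": 1, "secret": 10}
s2 = {"x": 1, "secret": 99}
assert holds_low(("var", "x"), s1, s2)            # public data agrees
assert not holds_low(("var", "secret"), s1, s2)   # secret data differs
assert holds_low(("add", ("var", "x"), ("const", 2)), s1, s2)
```

This is why the relational semantics avoids a security-aware language semantics: sensitivity is a property of pairs of ordinary executions, not of labels propagated through one execution.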
Self-composition is another technique to exploit existing verification infras-
tructure for proofs of general hyperproperties [13], including but not limited to
non-interference. Eilers et al. [17] present such an approach for Viper, which
supports an assertion language similar to that of separation logic. It does not
support public heap locations (which are information sources and sinks at the
same time), although sinks can be modeled via preconditions of procedures. A similar approach is implemented in Frama-C [9]. Neither [9] nor [17] supports concurrency, and it remains unclear how self-composition could avoid an exponential
blow-up from concurrent interleaving, which SecCSL avoids.
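The self-composition idea can be illustrated with a minimal executable sketch (our simplification of what tools like [17] do symbolically over program copies): run a program twice on low-equivalent inputs and compare the low outputs. The `program` and `leaky` functions are invented examples.

```python
# Illustrative sketch of self-composition for non-interference:
# the low output must not depend on the high input, so running
# the program with the same low input and two different high
# inputs must yield the same result.

def program(low, high):
    return low * 2          # independent of `high`: non-interferent

def leaky(low, high):
    return low + (1 if high > 0 else 0)   # leaks the sign of `high`

def noninterferent_on(prog, low, high1, high2):
    """One self-composed check: same low input, two high inputs."""
    return prog(low, high1) == prog(low, high2)

assert noninterferent_on(program, 3, 0, 42)
assert not noninterferent_on(leaky, 3, -1, 1)
```

A verifier performs this check for all inputs at once by analysing the two copies symbolically; the exponential blow-up mentioned above arises because, with concurrency, every interleaving of each copy must be paired with every interleaving of the other.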
The soundness proof for SecCSL follows the general structure of
Vafeiadis’ [46] for CSL, which is also mechanised in Isabelle/HOL. There is,
7 Conclusion
We presented SecCSL, a concurrent separation logic for proving expressive data-
dependent information flow properties of programs. SecCSL is considerably simpler than contemporary logics, yet handles features such as pointers and arrays, which are out of their scope. It inherits the structure of traditional concurrent separation
logics, and so like those logics can be automated via symbolic execution [10,
22,30]. To demonstrate this, we implemented SecC, an automatic verifier for
expressive information flow security for a subset of the C language.
Separation logic has proved to be a remarkably powerful vehicle for reason-
ing about programs, weak memory concurrency [47], program synthesis [42], and
many other domains. With SecCSL, we hope that in the future the same possibilities can be opened up for verified information flow security.
Acknowledgement. We thank the anonymous reviewers for their careful and detailed
comments that helped significantly to clarify the discussion of finer points.
This research was sponsored by the Department of the Navy, Office of Naval
Research, under award #N62909-18-1-2049. Any opinions, findings, and conclusions
or recommendations expressed in this material are those of the author(s) and do not
necessarily reflect the views of the Office of Naval Research.
A Command Semantics

Symmetric parallel rules, in which c2 is scheduled under the action 2, are omitted.

  l ∈ L    L' = L \ {l}
  ───────────────────────────────────────────
  (run lock l, L, s, h) −τ→ (stop L', s, h)

  l ∉ L    L' = L ∪ {l}
  ───────────────────────────────────────────
  (run unlock l, L, s, h) −τ→ (stop L', s, h)

  (run c1, L, s, h) −σ→ abort
  ───────────────────────────────
  (run c1; c2, L, s, h) −σ→ abort

  (run c1, L, s, h) −σ→ abort
  ──────────────────────────────────
  (run c1 ∥ c2, L, s, h) −1·σ→ abort

  (run c1, L, s, h) −σ→ (stop L', s', h')
  ──────────────────────────────────────────────
  (run c1; c2, L, s, h) −σ→ (run c2, L', s', h')

  (run c1, L, s, h) −σ→ (run c1', L', s', h')
  ───────────────────────────────────────────────────
  (run c1; c2, L, s, h) −σ→ (run c1'; c2, L', s', h')

  (run c1, L, s, h) −σ→ (stop L', s', h')
  ─────────────────────────────────────────────────
  (run c1 ∥ c2, L, s, h) −1·σ→ (run c2, L', s', h')

  (run c1, L, s, h) −σ→ (run c1', L', s', h')
  ────────────────────────────────────────────────────────
  (run c1 ∥ c2, L, s, h) −1·σ→ (run c1' ∥ c2, L', s', h')

  if s ⊨ b then c' = c1 else c' = c2
  ─────────────────────────────────────────────────────────
  (run if b then c1 else c2, L, s, h) −τ→ (run c', L, s, h)

  s ⊭ b
  ──────────────────────────────────────────────
  (run while b do c, L, s, h) −τ→ (stop L, s, h)

  s ⊨ b    ω = while b do c
  ─────────────────────────────────────────────────────
  (run while b do c, L, s, h) −τ→ (run (c; ω), L, s, h)

  ─────────
  k −→* k

  k −σ1→ k'    k' −σ2→* k''
  ──────────────────────────
  k −σ1·σ2→* k''
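The schedule-driven small-step semantics above can be replayed as a minimal executable sketch (ours, simplified): commands are tuples, stores are dicts, L is the set of available locks, and a schedule is a list of 1/2 choices consumed whenever a parallel composition steps, so a fixed schedule fixes the interleaving.

```python
# Executable sketch (not the authors' formalisation) of the
# schedule-driven small-step semantics. Heaps are omitted;
# blocking on an unavailable lock is simplified to an error.

def step(cmd, L, s, sched):
    """One transition of (run cmd, L, s); cmd' = None plays `stop`."""
    op = cmd[0]
    if op == "assign":                      # x := e
        _, x, e = cmd
        return None, L, dict(s, **{x: e(s)})
    if op == "lock":                        # acquire: l must be available
        l = cmd[1]
        assert l in L, "lock not available (thread would block)"
        return None, L - {l}, s
    if op == "unlock":                      # release: l must be held
        l = cmd[1]
        assert l not in L
        return None, L | {l}, s
    if op == "seq":                         # step the first component
        _, c1, c2 = cmd
        c1p, L, s = step(c1, L, s, sched)
        return (c2 if c1p is None else ("seq", c1p, c2)), L, s
    if op == "par":                         # the schedule picks a side
        _, c1, c2 = cmd
        who = sched.pop(0)
        active = c1 if who == 1 else c2
        ap, L, s = step(active, L, s, sched)
        if ap is None:                      # that side stopped
            return (c2 if who == 1 else c1), L, s
        return ("par", ap, c2) if who == 1 else ("par", c1, ap), L, s
    raise ValueError(f"unknown command {op!r}")

def run(cmd, L, s, sched):
    """Run to completion under the given schedule."""
    while cmd is not None:
        cmd, L, s = step(cmd, L, s, sched)
    return L, s

def thread(n):                              # lock m; x := n; unlock m
    return ("seq", ("lock", "m"),
            ("seq", ("assign", "x", lambda s, n=n: n), ("unlock", "m")))

prog = ("par", thread(1), thread(2))
L1, s1 = run(prog, {"m"}, {"x": 0}, [1, 1, 1])  # thread 1 scheduled first
assert s1["x"] == 2 and L1 == {"m"}             # thread 2 writes last
```

Running the same program under the schedule [2, 2, 2] instead yields x = 1, illustrating why the security definitions quantify over pairs of executions with the same schedule.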
B Proofs
Proof of Lemma 1
If (s, h), (s', h') ⊨ P, then h ≡_A h' for A = lows(P, s).
Proof. By induction on the structure of P, noting that lows(_, s) contains locations of the corresponding sub-heap only.
Proof of Lemma 2
{P} c {Q} implies secure_n(P, c, Q) for every n ≥ 0.
Proof (Outline). By induction on the derivation of the validity of the judgement.
Noting that n = 0 is trivial, we may unfold the recursion of the security definition
once to prove the base cases of assignment, load, store, and locking, which then
follow from the respective side conditions of the proof rules.
For rules If and While, the side condition b :: low guarantees that the test
evaluates equivalently in the two states and thus execution proceeds with the
same remainder program.
Except for If, all remaining rules need a second induction on n to stepwise
match security of the premise to security of the conclusion (e.g. over the steps
of the first command in a sequential composition c1 ; c2 ).
The rule Frame instantiates the frame F with the same assertion in each
step, whereas Par uses the frame F to preserve the current precondition P2 of c2
over steps of c1 and vice-versa.
Proof of Corollary 1
Given a command c and initial states (s1, h1), (s1', h1') ⊨ P ∗ invs(L1), and two executions under the same schedule to resulting configurations k and k' respectively, {P} c {Q} implies k ≠ abort ∧ k' ≠ abort.
Proof. By induction on the number of steps n of the executions, from secure_n(P, c, Q) via Lemma 2.
Proof of Theorem 1
Given a command c and initial states (s1, h1), (s1', h1') ⊨ P ∗ invs(L1) and two complete executions under the same schedule σ

(run c, L1, s1, h1) −σ→* (stop L2, s2, h2)
(run c, L1, s1', h1') −σ→* (stop L2, s2', h2')

then {P} c {Q} implies (s2, h2), (s2', h2') ⊨ Q ∗ invs(L2).
Proof. By induction on the number of steps n of the executions, from secure_n(P, c, Q) via Lemma 2.
Proof of Theorem 2
Given a command c and initial states (s1, h1), (s1', h1') ⊨ P ∗ invs(L1), {P} c {Q} implies h_i ≡_A h_i', where A = lows(P, s1), for all pairs of heaps h_i and h_i' arising from executing the same schedule from each initial state.
Proof. By induction on the number of steps i up to that state: from secure_i(P, c, Q) via Lemma 2 we have lows(P ∗ invs(L1), s1) ⊆ lows(P_i ∗ invs(L1), s_i) transitively over the prefix, where P_i and s_i are taken from the i-th state. The theorem then follows from Lemma 1 in Sect. 3.1.
References
1. Amtoft, T., Bandhakavi, S., Banerjee, A.: A logic for information flow in object-
oriented programs. In: Proceedings of Principles of Programming Languages
(POPL), pp. 91–102. ACM (2006)
2. Appel, A.W., et al.: Program Logics for Certified Compilers. Cambridge University
Press, New York (2014)
3. Appel, A.W., et al.: The Verified Software Toolchain (2017). https://github.com/
PrincetonUniversity/VST
4. Banerjee, A., Naumann, D.A., Rosenberg, S.: Expressive declassification policies
and modular static enforcement. In: Proceedings of Symposium on Security and
Privacy (S&P), pp. 339–353. IEEE (2008)
5. Beaumont, M., McCarthy, J., Murray, T.: The cross domain desktop compositor:
using hardware-based video compositing for a multi-level secure user interface. In:
Annual Computer Security Applications Conference (ACSAC), pp. 533–545. ACM
(2016)
6. Benton, N.: Simple relational correctness proofs for static analyses and program
transformations. In: Proceedings of Principles of Programming Languages (POPL),
pp. 14–25. ACM (2004)
7. Berdine, J., Calcagno, C., O’Hearn, P.W.: Symbolic execution with separation
logic. In: Yi, K. (ed.) APLAS 2005. LNCS, vol. 3780, pp. 52–68. Springer, Heidel-
berg (2005). https://doi.org/10.1007/11575467_5
8. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
Coq’Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer
Science. An EATCS Series. Springer, Heidelberg (2004). https://doi.org/10.1007/
978-3-662-07964-5
9. Blatter, L., Kosmatov, N., Le Gall, P., Prevosto, V., Petiot, G.: Static and dynamic
verification of relational properties on self-composed C code. In: Dubois, C., Wolff,
B. (eds.) TAP 2018. LNCS, vol. 10889, pp. 44–62. Springer, Cham (2018). https://
doi.org/10.1007/978-3-319-92994-1_3
10. Calcagno, C., Distefano, D.: Infer: an automatic program verifier for memory safety
of C programs. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.)
NFM 2011. LNCS, vol. 6617, pp. 459–465. Springer, Heidelberg (2011). https://
doi.org/10.1007/978-3-642-20398-5_33
11. Calcagno, C., et al.: Moving fast with software verification. In: Havelund, K., Holz-
mann, G., Joshi, R. (eds.) NFM 2015. LNCS, vol. 9058, pp. 3–11. Springer, Cham
(2015). https://doi.org/10.1007/978-3-319-17524-9_1
12. Chlipala, A.: Formal Reasoning About Programs (2016)
13. Clarkson, M.R., Schneider, F.B.: Hyperproperties. In: Proceedings of Computer
Security Foundations Symposium (CSF), pp. 51–65 (2008)
14. Costanzo, D., Shao, Z.: A separation logic for enforcing declarative information flow
control policies. In: Abadi, M., Kremer, S. (eds.) POST 2014. LNCS, vol. 8414, pp.
179–198. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54792-
8_10
15. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3_24
16. Del Tedesco, F., Sands, D., Russo, A.: Fault-resilient non-interference. In: Pro-
ceedings of Computer Security Foundations Symposium (CSF), pp. 401–416. IEEE
(2016)
17. Eilers, M., Müller, P., Hitz, S.: Modular product programs. In: Ahmed, A. (ed.)
ESOP 2018. LNCS, vol. 10801, pp. 502–529. Springer, Cham (2018). https://doi.
org/10.1007/978-3-319-89884-1_18
18. Ernst, G., Murray, T.: SecC tool description and Isabelle theories for SecCSL
(2019). https://covern.org/secc
19. Goguen, J., Meseguer, J.: Security policies and security models. In: Proceedings of
Symposium on Security and Privacy (S&P), Oakland, California, USA, pp. 11–20,
April 1982
20. Gruetter, S., Murray, T.: Short paper: towards information flow reasoning about
real-world C code. In: Proceedings of Workshop on Programming Languages and
Analysis for Security (PLAS), pp. 43–48. ACM (2017)
21. Gu, R., et al.: CertiKOS: an extensible architecture for building certified concurrent
OS kernels. In: Proceedings of USENIX Symposium on Operating Systems Design
and Implementation (OSDI), November 2016
22. Jacobs, B., Smans, J., Philippaerts, P., Vogels, F., Penninckx, W., Piessens, F.:
VeriFast: a powerful, sound, predictable, fast verifier for C and Java. In: Bobaru,
M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617,
pp. 41–55. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-
5_4
23. Karbyshev, A., Svendsen, K., Askarov, A., Birkedal, L.: Compositional non-
interference for concurrent programs via separation and framing. In: Bauer, L.,
Küsters, R. (eds.) POST 2018. LNCS, vol. 10804, pp. 53–78. Springer, Cham
(2018). https://doi.org/10.1007/978-3-319-89722-6_3
24. Klein, G., et al.: Comprehensive formal verification of an OS microkernel. ACM
Trans. Comput. Syst. 32(1), 2:1–2:70 (2014)
25. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–
115 (2009)
26. Löding, C., Madhusudan, P., Murali, A., Peña, L.: A first order logic with frames.
http://madhu.cs.illinois.edu/FOFrameLogic.pdf
27. Lourenço, L., Caires, L.: Dependent information flow types. In: Proceedings of
Principles of Programming Languages (POPL), Mumbai, India, pp. 317–328, Jan-
uary 2015
28. Mantel, H., Sands, D.: Controlled declassification based on intransitive noninterfer-
ence. In: Chin, W.-N. (ed.) APLAS 2004. LNCS, vol. 3302, pp. 129–145. Springer,
Heidelberg (2004). https://doi.org/10.1007/978-3-540-30477-7_9
29. Mantel, H., Sands, D., Sudbrock, H.: Assumptions and guarantees for composi-
tional noninterference. In: Proceedings of Computer Security Foundations Sympo-
sium (CSF), Cernay-la-Ville, France, pp. 218–232, June 2011
30. Müller, P., Schwerhoff, M., Summers, A.J.: Viper: a verification infrastructure for
permission-based reasoning. In: Jobstmann, B., Leino, K.R.M. (eds.) VMCAI 2016.
LNCS, vol. 9583, pp. 41–62. Springer, Heidelberg (2016). https://doi.org/10.1007/
978-3-662-49122-5_2
31. Murray, T.: Short paper: on high-assurance information-flow-secure programming
languages. In: Proceedings of Workshop on Programming Languages and Analysis
for Security (PLAS), pp. 43–48 (2015)
32. Murray, T., et al.: seL4: from general purpose to a proof of information flow enforce-
ment. In: Proceedings of Symposium on Security and Privacy (S&P), San Fran-
cisco, CA, pp. 415–429, May 2013
33. Murray, T., Sabelfeld, A., Bauer, L.: Special issue on verified information flow
security. J. Comput. Secur. 25(4–5), 319–321 (2017)
34. Murray, T., Sison, R., Engelhardt, K.: COVERN: a logic for compositional veri-
fication of information flow control. In: Proceedings of European Symposium on
Security and Privacy (EuroS&P), London, United Kingdom, April 2018
35. Murray, T., Sison, R., Pierzchalski, E., Rizkallah, C.: Compositional verification
and refinement of concurrent value-dependent noninterference. In: Proceedings of
Computer Security Foundations Symposium (CSF), pp. 417–431, June 2016
36. Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuff, M.:
How Amazon web services uses formal methods. Commun. ACM 58(4), 66–73
(2015)
37. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL: A Proof Assistant for
Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://doi.
org/10.1007/3-540-45949-9
38. O’Hearn, P.W.: Resources, concurrency and local reasoning. In: Gardner, P.,
Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 49–67. Springer, Hei-
delberg (2004). https://doi.org/10.1007/978-3-540-28644-8_4
39. O’Hearn, P.W.: Continuous reasoning: scaling the impact of formal methods. In:
Proceedings of Logic in Computer Science (LICS), pp. 13–25. ACM (2018)
40. O’Hearn, P.W., Yang, H., Reynolds, J.C.: Separation and information hiding. ACM
Trans. Programm. Lang. Syst. (TOPLAS) 31(3), 11 (2009)
41. Piskac, R., Wies, T., Zufferey, D.: Automating separation logic using SMT. In:
Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 773–789. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_54
42. Polikarpova, N., Sergey, I.: Structuring the synthesis of heap-manipulating pro-
grams. Proc. ACM Program. Lang. 3(POPL), 72 (2019)
43. Prabawa, A., Al Ameen, M.F., Lee, B., Chin, W.-N.: A logical system for modular
information flow verification. Verification, Model Checking, and Abstract Interpre-
tation. LNCS, vol. 10747, pp. 430–451. Springer, Cham (2018). https://doi.org/
10.1007/978-3-319-73721-8_20
44. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In:
Proceedings of Logic in Computer Science (LICS), pp. 55–74. IEEE (2002)
45. Sabelfeld, A., Sands, D.: Probabilistic noninterference for multi-threaded programs.
In: Proceedings of Computer Security Foundations Workshop (CSFW), pp. 200–
214. IEEE (2000)
46. Vafeiadis, V.: Concurrent separation logic and operational semantics. In: Proceed-
ings of Mathematical Foundations of Programming Semantics (MFPS), pp. 335–
351 (2011)
47. Vafeiadis, V., Narayan, C.: Relaxed separation logic: a program logic for C11 con-
currency. In: Proceedings of Object Oriented Programming Systems Languages &
Applications (OOPSLA), pp. 867–884. ACM (2013)
48. Volpano, D., Smith, G.: Probabilistic noninterference in a concurrent language. J. Comput. Secur. 7(2–3), 231–253 (1999)
49. Yang, H.: Relational separation logic. Theor. Comput. Sci. 375(1–3), 308–334
(2007)
50. Zheng, L., Myers, A.C.: Dynamic security labels and static information flow con-
trol. Int. J. Inf. Secur. 6(2–3), 67–84 (2007)
Reachability Analysis for AWS-Based Networks
1 Introduction
Cloud computing provides on-demand access to IT resources such as compute,
storage, and analytics via the Internet with pay-as-you-go pricing. These IT resources are typically networked together by customers, using a growing
number of virtual networking features. Amazon Web Services (AWS), for exam-
ple, today provides over 30 virtualized networking primitives that allow cus-
tomers to implement a wide variety of cloud-based applications.
Correctly configured networks are a key part of an organization’s security
posture. Clearly documented and, more importantly, verifiable network design
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 231–241, 2019.
https://doi.org/10.1007/978-3-030-25543-5_14
232 J. Backes et al.
is important for compliance audits, e.g. the Payment Card Industry Data Secu-
rity Standard (PCI DSS) [10]. As the scale and diversity of cloud-based services
grows, each new offering used by an organization adds another dimension of pos-
sible interaction at the networking level. Thus, customers and auditors increas-
ingly need tooling for the security of their networks that is accurate, automated
and scalable, allowing them to automatically detect violations of their require-
ments.
In this industrial case-study, we describe a new tool, called Tiros, which
uses off-the-shelf automated theorem proving tools to perform formal analysis of
virtual networks constructed using AWS APIs. Tiros encodes the semantics of
AWS networking concepts into logic and then uses a variety of reasoning engines
to verify security-related properties. Tools that Tiros can use include Soufflé
[17], MonoSAT [3], and Vampire [23]. Tiros performs its analysis statically: it
sends no packets on the customer’s network. This distinction is important. The
size of many customer networks makes it intractable to find problems through
traditional network probing or penetration testing. Tiros allows users to gain
assurance about the security of their networks that would be impossible through
testing.
Tiros is used directly today by AWS customers as part of the Amazon
Inspector service [11], which currently checks six Tiros-based network reach-
ability invariants on customer networks. The use of Tiros is especially pop-
ular amongst security-obsessed customers: e.g., Bridgewater Associates, the world's largest hedge fund and an AWS customer, recently discussed the importance of
network verification techniques for their organization [6], including their usage
of Tiros.
Related Work. Several previous tools using automated theorem proving have
been developed in an effort to answer questions about software defined networks
(SDNs) [1,2,5,12,13,16,19,25]. Similar to our approach, these tools reduce the
problems to automated reasoning engines. In some cases, they employ over-
approximative static analysis [18,19]. In other cases, they use general purpose
reasoning engines such as Datalog [12,15], BDD [1], SMT [5,16], and SAT
Solvers [2,25]. VeriCon [2], NICE [8], and VeriFlow [19] verify network invari-
ants by analyzing software-defined-network (SDN) programs, with the former
two applying formal software verification techniques, and the latter using static
analysis to split routes into equivalence classes. SecGuru [5,16] uses an SMT
solver to compare the routes admitted by access control lists (ACLs), routing
tables, and border gateway protocol (BGP) policies, but does not support full-
network reachability queries. In our approach we employ multiple encodings and
reasoning engines. Our SMT encoding is similar in design to Anteater [25] and
ConfigChecker [1]. Anteater performs SAT-based bounded model checking [4],
while ConfigChecker uses BDD-based fixed-point model checking [7]. Previous
work has applied Datalog to reachability analysis in either software or network
contexts [12–14,24]. The approach used in Batfish [13,24] and SyNET [12] is
similar to our Datalog approach; they allow users to express general queries
about whole-network reachability properties using an expressive logic language.
Batfish presents results for small but complex routing scenarios, involving a few
dozen routers. SyNET [12] also uses a similar Datalog representation of network
reachability semantics, but rather than verifying network reachability properties,
they provide techniques to synthesize networks from a specification. The focus in
Tiros’s encoding is expressiveness and completeness; it encodes the semantics of
the entire AWS cloud network service stack. It scales well to networks consisting
of hundreds of thousands of instances, routers, and firewall rules.
2 AWS Networking
AWS provides customers with virtualized implementations of practically all
known traditional networking concepts, e.g. subnets, route tables, and NAT
gateways. In order to facilitate on-demand scalability, many AWS network fea-
tures focus on elasticity, e.g. Elastic Load Balancers (ELBs) support autoscaling
groups, which customers configure to describe when/how to scale resource usage.
Another important AWS networking concept is that of Virtual Private Cloud
(VPC), in which customers can use AWS resources in an isolated virtual net-
work that they control. Over 30 additional networking concepts are supported
by AWS, including Elastic Network Interfaces (ENIs), internet gateways, transit
gateways, direct connections, and peering connections.
Figure 1 shows an example AWS-based network that consists of two subnets
“Web” and “Database”. The “Web” subnet contains two instances (sometimes
called virtual machines) and the “Database” subnet contains one instance. Note
that these machines are in fact virtualized in the AWS data center. The “Web”
subnet’s route table has a route to the internet gateway, whereas the “Database”
subnet’s route table only has local routes (within the VPC). In addition, each
of the subnets has an ACL that contains security access rules. In particular, one
of the rules forbids SSH access to the database servers.
AWS-based networks frequently start small and grow over time, accumulating
new instances and security and access rules. Customers or regulators want to
make sure that their VPC networks retain security invariants as their complexity
grows. A customer may ask network configuration questions such as:
1. "Are there any instances in subnet 'Web' that are tagged 'Bastion'?"
or network reachability questions such as:
2. "Are there any instances that can be accessed from the public internet over SSH (TCP port 22)?"
To answer such questions we must reason about which network components are
accessible via feasible paths through the VPC, either from the internet, from
other components in the VPC, or from other components in a different VPC via
a peering connection or transit gateway.
The snapshot part of the network model contains constants and facts (ground
clauses with no antecedents) that describe the configuration of a specific AWS
network. Constants have the form type_id. For example, the snapshot of a network with an instance with id 1234 in a subnet with id web consists of the constants instance_1234 and subnet_web, and the fact hasSubnet(instance_1234, subnet_web).
We illustrate the Datalog encoding using examples from Sect. 2. The network
configuration question, q(I), is encoded as q(I) ← hasSubnet(I, subnet_web) ∧ hasTag(I, tag_bastion). The network reachability question, r(I, E), is encoded as:
Fig. 2. (Left) The symbolic graph corresponding to the VPC in Fig. 1. (Right) A simplified symbolic packet, composed of bitvectors: protocol (8 bits), srcAdr (32 bits), dstAdr (32 bits), srcPort (16 bits), dstPort (16 bits).
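As an illustrative sketch (not Tiros itself; the instance IDs and tags are invented), the snapshot facts and the configuration query q(I) above can be replayed in ordinary code as a join over fact relations:

```python
# Toy re-creation of the Datalog-style snapshot and the query
# q(I) <- hasSubnet(I, subnet_web) /\ hasTag(I, tag_bastion):
# facts are tuples, and the rule body becomes a join.

has_subnet = {("instance_1234", "subnet_web"),
              ("instance_5678", "subnet_web"),
              ("instance_9999", "subnet_db")}
has_tag = {("instance_1234", "tag_bastion"),
           ("instance_9999", "tag_bastion")}

def q():
    """Instances in subnet 'Web' that are tagged 'Bastion'."""
    return sorted(i for (i, sub) in has_subnet
                  if sub == "subnet_web"
                  and (i, "tag_bastion") in has_tag)

assert q() == ["instance_1234"]
```

A Datalog engine such as Soufflé evaluates such rules bottom-up over the whole snapshot, which is what lets the encoding scale to very large networks.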
to be false if the packet’s source address does not match the ENI’s IP address.
This ensures that packets leaving the ENI must have that ENI’s IP address as
their source address. Similar constraints ensure that packets entering the ENI
must have that ENI’s IP address as their destination address.
We encode reachability constraints into this graph using the SMT solver
MonoSAT [3], which supports a theory of finite graph reachability. Specifically,
we add a start and end node to the graph, with edges to the source components
of the query and from the destination components of the query, and then we
enforce a graph reachability constraint reaches(start, end), which is true iff there
is a start-end path under the assignment to the edge literals. To encode the query
“Are there any instances that can be accessed from the public internet over
SSH?”, we would add an edge from the start node to the internet, and from each
EC2 instance to the end node. Additionally, we would add bitvector constraints
forcing the protocol of the symbolic packet to be exactly 6 (TCP), and the
destination port to be exactly 22.
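A concrete-packet version of this encoding can be sketched as follows (Tiros checks it symbolically with MonoSAT; here we fix one packet, keep only the edges whose guard it satisfies, and search for a path; the topology and guards are invented for illustration):

```python
# Concrete-packet sketch of the guarded-graph reachability check.

# Packet fields, cf. Fig. 2.
ssh_to_web = {"protocol": 6, "dstPort": 22, "dstAdr": "10.0.1.5"}
ssh_to_db = {"protocol": 6, "dstPort": 22, "dstAdr": "10.0.2.7"}

# Edges with guards: (src, dst, predicate over the packet).
edges = [
    ("internet", "igw", lambda p: True),
    ("igw", "subnet_web", lambda p: p["dstAdr"].startswith("10.0.1.")),
    ("igw", "subnet_db", lambda p: p["dstAdr"].startswith("10.0.2.")),
    # the web subnet's ACL admits TCP port 22:
    ("subnet_web", "eni_web",
     lambda p: p["protocol"] == 6 and p["dstPort"] == 22),
    # the database subnet's ACL forbids SSH (cf. Sect. 2):
    ("subnet_db", "eni_db", lambda p: p["dstPort"] != 22),
]

def reaches(packet, start, end):
    """Is there a feasible start-end path for this packet?"""
    enabled = [(a, b) for (a, b, guard) in edges if guard(packet)]
    seen, todo = {start}, [start]
    while todo:
        n = todo.pop()
        for (a, b) in enabled:
            if a == n and b not in seen:
                seen.add(b)
                todo.append(b)
    return end in seen

assert reaches(ssh_to_web, "internet", "eni_web")
assert not reaches(ssh_to_db, "internet", "eni_db")   # ACL blocks SSH
```

The symbolic version asks the solver to find any packet for which such a path exists, rather than testing packets one at a time.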
Fig. 3. A small portion of the VPC graph, with constraints over the edges between an ENI and its subnet enforcing that packets entering or leaving the ENI have that ENI's IP address as their destination or source address, respectively.
saturation-based theorem prover and a finite model builder, running both modes
in parallel and recording the result of the fastest successful run.
Our encoding begins with the same set of facts as were generated from
the network model by our Datalog encoding, represented here by the symbols
(A1 , A2 , . . .). From there, we handle network configuration and network reach-
ability questions differently, with network-configuration encodings optimized
for proof-by-contradiction, while reachability encodings are optimized for
model-building. Proof-by-contradiction for yes/no questions is potentially faster
than model-building, as intermediate variables need not be enumerated.
We encode a network configuration question ϕ in negated form: A1 ∧ . . . ∧
An ⇒ ¬ϕ. If Vampire can prove a contradiction in the negated formula, then ϕ
holds. We encode a network reachability question ϕ into a formula of the form
A1 ∧ . . . ∧ An ∧ (∀z̄)(q(z̄) ⇔ ϕ) ⇒ (∀z̄)q(z̄), where q is a fresh predicate symbol,
and z̄ are free variables of the network question ϕ. Each substitution of z̄ that
satisfies q corresponds to a distinct solution to the reachability question.
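A toy finite-domain illustration of these two encodings (ours; the real tool emits first-order formulas for Vampire): a configuration question ϕ holds iff the facts contradict ¬ϕ, and a reachability-style question enumerates the substitutions satisfying the fresh predicate q.

```python
# Toy illustration of the two Vampire encodings over a tiny domain.

instances = ["i1", "i2"]
has_tag = {("i1", "bastion")}

# Configuration question ϕ: "some instance is tagged bastion".
# Negated form ¬ϕ: "no instance is tagged bastion" -- a single
# witness contradicts it, so ϕ holds.
def refutes_negation():
    return any((i, "bastion") in has_tag for i in instances)

assert refutes_negation()

# Reachability-style question: enumerate every z with q(z) <-> ϕ(z);
# each element of the result is one distinct solution.
def q_solutions():
    return [i for i in instances if (i, "bastion") in has_tag]

assert q_solutions() == ["i1"]
```

The brute-force enumeration here is exactly what becomes expensive at scale: the domain closure axioms that spell out such finite domains as long clauses are what the text identifies as the bottleneck for saturation provers.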
Our encoding targets Vampire’s implementation of many-sorted first-order
logic with equality, extended with the theory of linear integer arithmetic, the
theory of arrays [22], and the theory of tuples [20]. We encode types, constants,
and predicates using Clark completion [9]. We direct the reader to our co-author’s
dissertation (cf. Chapter 5 [21]) for a more detailed explanation of the Vampire
encoding, including a detailed analysis of the performance trade-offs considered
in this encoding.
In this section we describe the performance of the various solvers when used by
Tiros in practice. Recall that our MonoSAT implementation can only answer
reachability questions, whereas the other implementations also answer more gen-
eral network configuration questions (such as the examples in Sect. 2).
In our experiments with Vampire, we found that the first-order logic encoding we used does not scale well. As we were not able to obtain good performance
from our Vampire-based implementation, in what follows we only present the
experimental results for MonoSAT and Soufflé. We attribute the poor performance of the Vampire encoding mainly to the fact that large finite domains,
routinely used in network specifications, are represented as long clauses coming
from the domain closure axioms. Saturation theorem provers, including Vam-
pire, have a hard time dealing with such clauses.
To give an idea of the relative size of the constraint systems solved: in the smallest case our Soufflé encoding consisted of 2,856 facts, and the MonoSAT encoding consisted of 609 variables, 21 bitvectors, and 2,032 clauses. In the largest case, our Soufflé encoding consisted of 7,517 facts, and the MonoSAT encoding consisted of 2,038 variables, 21 bitvectors, and 17,731 clauses.
Fig. 4. Comparison of runtime in seconds for the different solver backends. Each bench-
mark uses a different color, e.g. Soufflé on benchmark-1 is a solid blue line, and
MonoSAT on benchmark-1 is a dashed blue line. In these experiments, Soufflé
recompiles each query before solving it, which adds ≈ 45 s to the runtime of each
Soufflé query. In practice this cost can be amortized by caching compiled queries.
(Color figure online)
Automating PCI Compliance Auditing. Many AWS services are built using other
AWS services, e.g. AWS Lambda is built using AWS EC2 and the various AWS
networking features. Thus within AWS we are using Tiros to prove the cor-
rectness of our own internal requirements. As an example, we use Tiros to
Reachability Analysis for AWS-Based Networks 239
Custom Application. AWS's Professional Services team works with some of the most security-obsessed customers, using advanced tools such as Tiros to build custom-tailored solutions. For example, as discussed in a public lecture [6],
Bridgewater Associates worked with AWS Professional Services to build a Tiros-
based solution which proves invariants of new AWS-based network designs before
they are deployed in Bridgewater’s AWS environment. Proof of these invariants
assures the absence of possible data exfiltration paths that could be leveraged
by an adversary.
5 Conclusion
References
1. Al-Shaer, E., Marrero, W., El-Atawy, A., Elbadawi, K.: Network configuration
in a box: towards end-to-end verification of network reachability and security.
In: Proceedings of the 17th Annual IEEE International Conference on Network
Protocols, 2009. ICNP 2009, Princeton, NJ, USA, 13–16 October 2009, pp. 123–
132 (2009). https://doi.org/10.1109/ICNP.2009.5339690
2. Ball, T., et al.: VeriCon: towards verifying controller programs in software-
defined networks. In: ACM SIGPLAN Conference on Programming Language
Design and Implementation, PLDI 2014, Edinburgh, UK, 9–11 June 2014, pp.
282–293 (2014). https://doi.org/10.1145/2594291.2594317, http://doi.acm.org/10.
1145/2594291.2594317
3. Bayless, S., Bayless, N., Hoos, H.H., Hu, A.J.: SAT modulo monotonic theories.
In: Proceedings of AAAI, pp. 3702–3709 (2015)
4. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without
BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193–207.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0 14
5. Bjørner, N., Jayaraman, K.: Checking cloud contracts in Microsoft azure. In:
Natarajan, R., Barua, G., Patra, M.R. (eds.) ICDCIT 2015. LNCS, vol. 8956,
pp. 21–32. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14977-6 2
240 J. Backes et al.
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Distributed Systems and Networks
Verification of Threshold-Based
Distributed Algorithms by Decomposition
to Decidable Logics
1 Introduction
Fault-tolerant distributed protocols play an important role in the avionic and
automotive industries, medical devices, cloud systems, blockchains, etc. Their
unexpected behavior might put human lives at risk or cause a huge financial
loss. Therefore, their correctness is of ultimate importance.
Ensuring correctness of distributed protocols is a notoriously difficult task,
due to the unbounded number of processes and messages, as well as the non-
deterministic behavior caused by the presence of faults, concurrency, and mes-
sage delays. In general, the problem of verifying such protocols is undecidable.
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 245–266, 2019.
https://doi.org/10.1007/978-3-030-25543-5_15
246 I. Berkovits et al.
This imposes two directions for attacking the problem: (i) developing fully-
automatic verification techniques for restricted classes of protocols, or (ii) design-
ing deductive techniques for a wide range of systems that require user assistance.
Within the latter approach, recently emerging techniques [29] leverage decidable
logics that are supported by mature automated solvers to significantly reduce
user effort, and increase verification productivity. Such logics bring several key
benefits: (i) their solvers usually enjoy stable performance, and (ii) whenever
annotations provided by the user are incorrect, the automated solvers can pro-
vide a counterexample for the user to examine.
Deductive verification based on decidable logic requires a logical formalism
that satisfies two conflicting criteria: the formalism should be expressive enough
to capture the protocol, its correctness properties, its inductive invariants, and
ultimately its verification conditions. At the same time, the formalism should be
decidable and have an effective automated tool for checking verification conditions.
In this paper we develop a methodology for deductive verification of threshold-based distributed protocols using decidable logic, combining well-established decidable logics to settle the tension explained above.
In threshold-based protocols, a process may take different actions based on
the number of processes from which it received certain messages. This is often
used to achieve fault-tolerance. For example, a process may take a certain step
once it has received an acknowledgment from a strict majority of its peers, that
is, from more than n/2 processes, where n is the total number of processes.
Expressions such as n/2 are called thresholds, and in general they can depend on additional parameters, such as the maximal number of crashed processes or the maximal number of Byzantine processes.
Verification of such protocols requires two flavors of reasoning, as demon-
strated by the following example. Consider the Paxos [20] protocol, in which
each process proposes a value and all must agree on a common proposal. The
protocol tolerates up to t process crashes, and ensures that every two processes
that decide agree on the decided value. The protocol requires n > 2t processes,
and each process must obtain confirmation messages from n − t processes before
making a decision. The protocol is correct due to, among others, the fact that
if n > 2t then any two sets of n − t processes have a process in common. To
verify this protocol we need to express (i) relationships between an unbounded
number of processes and values, which typically requires quantification over unin-
terpreted domains (“every two processes”), and (ii) properties of sets of certain
cardinalities (“any two sets of n − t processes intersect”). Crucially, these two
types of reasoning are intertwined, as the sets of processes for which we need to
capture cardinalities may be defined by their relations with other state compo-
nents (“messages from at least n − t processes”). While uninterpreted first-order
logic (FOL) seems like the natural fit for the first type of reasoning, it is seemingly
a poor fit for the second type, since it cannot express set cardinalities and the
arithmetic used to define thresholds. Typically, logics that combine both types
of reasoning are either undecidable or not flexible enough to capture protocols
as intricate as the ones we consider.
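The quorum-intersection fact used above can be checked exhaustively for small instances. A brief sketch (illustrative only, not part of the paper's toolchain) confirming that two (n − t)-subsets of n processes always overlap exactly when n > 2t:

```python
from itertools import combinations

def quorums_intersect(n: int, t: int) -> bool:
    """Check that every two sets of n - t processes (out of n) share a process."""
    processes = range(n)
    q = n - t
    return all(set(a) & set(b)
               for a in combinations(processes, q)
               for b in combinations(processes, q))

# For all small instances, any two (n - t)-quorums overlap iff n > 2t:
# |A ∩ B| >= 2(n - t) - n = n - 2t, which is positive exactly when n > 2t.
for n in range(1, 8):
    for t in range(n):
        assert quorums_intersect(n, t) == (n > 2 * t)
```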
Verification of Threshold-Based Distributed Algorithms 247
¹ An extended version of this paper, which includes additional details and proofs, appears in [3].
2 Preliminaries
the set of sorts and the set of edges is defined as follows: every function symbol
introduces edges from its arguments’ sorts to its image’s sort, and every exis-
tential quantifier ∃x that resides in the scope of universal quantifiers introduces
edges from the sorts of the universally quantified variables to the sort of x. The
quantifier alternation graph is extended to sets of formulas as expected.
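As a sketch of this construction, the following builds the edge set from function signatures and ∀/∃ alternations and tests it for cycles; the acyclicity test reflects the stratification condition under which verification conditions stay in a decidable fragment (an assumption about how the graph is used, since the statement of that condition is not in this excerpt). The sorts in the example are hypothetical.

```python
def quantifier_alternation_graph(fun_sigs, alternations):
    """Edge set of the quantifier alternation graph over sorts.

    fun_sigs: iterable of (arg_sorts, image_sort), one per function symbol.
    alternations: iterable of (universal_sort, existential_sort) pairs, one for
    each ∃x in the scope of a universally quantified variable."""
    edges = set()
    for args, image in fun_sigs:
        edges.update((a, image) for a in args)
    edges.update(alternations)
    return edges

def is_acyclic(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = dict.fromkeys(graph, 0)  # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in graph[u]:
            if state[v] == 1 or (state[v] == 0 and not dfs(v)):
                return False  # back edge: cycle found
        state[u] = 2
        return True
    return all(state[u] != 0 or dfs(u) for u in graph)

# Hypothetical example: a Skolem function set_t -> node plus a ∀set_t ∃node
# alternation keeps the graph acyclic; the reverse alternation adds a cycle.
edges = quantifier_alternation_graph(fun_sigs=[(("set_t",), "node")],
                                     alternations=[("set_t", "node")])
assert is_acyclic(edges)
assert not is_acyclic(edges | {("node", "set_t")})
```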
the protocol and in the verification conditions. For verification to succeed, some
properties of the sets satisfying the cardinality threshold must be captured in
FOL. This is done by introducing additional assumptions (formally, axioms of
the transition system) expressed in FOL, as discussed in Sect. 4.
intersect in more than (n + t)/2 nodes, which, after removing the t nodes which may be faulty, still leaves us with more than (n − t)/2 nodes, satisfying the condition in line 9.
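The counting behind this step is the inclusion–exclusion bound |A ∩ B| ≥ |A| + |B| − n. A small sketch checking the resulting identities symbolically (illustrative; exact rational arithmetic via Python's fractions):

```python
from fractions import Fraction

def min_intersection(size_a, size_b, n):
    # Inclusion-exclusion lower bound: |A ∩ B| >= |A| + |B| - n.
    return size_a + size_b - n

# Check: |A| >= n - t and |B| > (n + 3t)/2 give |A ∩ B| > (n + t)/2, hence
# more than (n - t)/2 non-faulty nodes after removing up to t faulty ones.
for n in range(1, 50):
    for t in range(n):
        a = n - t                       # smallest allowed |A|
        b = Fraction(n + 3 * t, 2)      # |B| must strictly exceed this
        bound = min_intersection(a, b, n)
        assert bound == Fraction(n + t, 2)
        assert bound - t == Fraction(n - t, 2)
```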
Threshold Conditions. Both the description of the protocol and the inductive invariant may include conditions that require the size of some set of nodes to be "at least t", "at most t", and so on, where the threshold t is of the form t = ℓ/k, where k is a positive integer, and ℓ is a ground BAPA integer term over Prm (we do not allow comparing sizes of two sets – we observe that it is not needed for threshold-based protocols). We denote the set of thresholds by T. For example, in Bosco, T = {n − t, (n + 3t + 1)/2, (n − t + 1)/2}.
Without loss of generality, we assume that all conditions on set cardinalities are of the form "at least t", since every condition can be written this way, possibly by introducing new thresholds.
– For every threshold t we introduce a threshold sort set_t with the intended
meaning that elements of this sort are sets of nodes whose size is at least t.
– Each sort set_t is equipped with a binary relation symbol member_t between
sorts node and set_t that captures the membership relation of a node in a set.
– For each set parameter a ∈ Prm_S we introduce a unary relation symbol
member_a over sort node that captures membership of a node in the set a.
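The intended interpretation of these symbols can be made concrete on small finite instances. A sketch (the representation is our own; set_t is interpreted as all node subsets of size at least t, and member_t as ordinary set membership):

```python
from itertools import combinations

def threshold_structure(n, thresholds):
    """A finite structure matching the intended meaning described in the text:
    for each threshold t, sort set_t is interpreted as all subsets of the n
    nodes of size >= t, and member_t as ordinary membership."""
    nodes = frozenset(range(n))
    def at_least(t):
        return [frozenset(c)
                for k in range(t, n + 1)
                for c in combinations(sorted(nodes), k)]
    return nodes, {t: at_least(t) for t in thresholds}

nodes, sets = threshold_structure(5, [3])
# Every two sets of size >= 3 out of 5 nodes share an element, so the FO
# translation of "∀x, y: set_3. x ∩ y ≠ ∅" holds in this structure.
assert all(a & b for a in sets[3] for b in sets[3])
```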
Syntax. We define TIP as follows, with t ∈ T a threshold (of the form ℓ/k) and a ∈ Prm_S:
TIP restricts the use of set cardinality to threshold guards g≥t(b) with the meaning |b| ≥ t. No other arithmetic atomic formulas are allowed. Comparison atomic formulas are restricted to b ≠ ∅ and bᶜ = ∅. Quantifiers must be guarded, and negation, disjunction and existential quantification are excluded. We forbid set union and restrict complementation to atomic set terms. We refer to such formulas as intersection properties, since they express properties of intersections of (atomic) sets.
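These syntactic restrictions can be captured by a small AST check. A sketch (the constructor names and string-valued thresholds are our own choices; the parameter f below, standing for the set of faulty nodes, is hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Param:          # a set parameter a ∈ Prm_S
    name: str

@dataclass(frozen=True)
class Var:            # a quantified set variable
    name: str

@dataclass(frozen=True)
class Compl:          # complement, allowed on atomic terms only
    arg: object

@dataclass(frozen=True)
class Inter:          # intersection (set union is excluded from TIP)
    left: object
    right: object

@dataclass(frozen=True)
class Guard:          # g_{>=t}(b), meaning |b| >= t
    t: str
    b: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Forall:         # guarded universal quantifier over sets of size >= t
    var: str
    t: str
    body: object

def is_term(b):
    if isinstance(b, (Param, Var)):
        return True
    if isinstance(b, Compl):  # complement restricted to atomic terms
        return isinstance(b.arg, (Param, Var))
    if isinstance(b, Inter):
        return is_term(b.left) and is_term(b.right)
    return False

def is_tip(phi):
    """Guards over valid terms, conjunction, and guarded universals only;
    negation, disjunction, existentials and unions are rejected."""
    if isinstance(phi, Guard):
        return is_term(phi.b)
    if isinstance(phi, And):
        return is_tip(phi.left) and is_tip(phi.right)
    if isinstance(phi, Forall):
        return is_tip(phi.body)
    return False

# Shape of Example 1, with f a hypothetical parameter for the faulty set:
ex1 = Forall("x", "n-t",
      Forall("y", "(n+3t)/2",
      Guard("(n-t)/2", Inter(Var("x"), Inter(Var("y"), Compl(Param("f")))))))
assert is_tip(ex1)
assert not is_tip(Guard("1", Compl(Inter(Var("x"), Var("y")))))  # ∩ under ᶜ
```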
Example 1. In Bosco, the following property captures the fact that the intersection of a set of at least n − t nodes and a set of more than (n + 3t)/2 nodes consists of at least (n − t)/2 non-faulty nodes. This is needed for establishing correctness of the protocol.
Corollary 1. For every closed TIP formula ϕ such that Γ |= ϕ, we have that
FO(ϕ) is satisfied by every threshold-faithful first-order structure.
Notation. For the rest of this section, we fix a set Prm of parameters, a set Γ of resilience conditions over Prm, and a set T of thresholds. Note that b ≠ ∅ ≡ g≥1(b) and bᶜ = ∅ ≡ g≥n(b). Therefore, for uniformity of the presentation, given a set T of thresholds, we define T̂ = T ∪ {1, n} and replace atomic formulas of the form b ≠ ∅ and bᶜ = ∅ by the corresponding guard formulas. As such, the only atomic formulas are of the form g≥t(b) where t ∈ T̂. Note that guards in quantifiers are still restricted to g≥t where t ∈ T. Given a set Prm_S, we also denote P̂rm_S = Prm_S ∪ {aᶜ | a ∈ Prm_S}.
In this section, we present Aip, an algorithm for inferring all Γ-valid TIP formulas. A naïve (non-terminating) algorithm would iteratively check Γ-validity of every TIP formula. Instead, Aip prunes the search space relying on the following condition:
If Γ |= t ≤ 0 then t is degenerate in the sense that g≥t (b) is always Γ -valid, and
∀x : g≥t . g≥t (x ∩ b) is never Γ -valid unless t is also degenerate.
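The degeneracy condition is a BAPA validity question; as a cheap illustration, sampling parameter valuations can refute Γ |= t ≤ 0, though a real check needs a solver, since sampling can never confirm validity. The resilience condition used below is a stand-in:

```python
def refute_degenerate(threshold, resilience, samples):
    """Sample-based refutation of Γ |= t <= 0: returns a witness valuation
    satisfying the resilience conditions with threshold > 0, or None if no
    sampled valuation refutes degeneracy. (Refutation only: confirming
    Γ-validity requires a decision procedure such as a BAPA solver.)"""
    for s in samples:
        if resilience(**s) and threshold(**s) > 0:
            return s
    return None

samples = [dict(n=n, t=t) for n in range(1, 30) for t in range(n)]
rc = lambda n, t: n > 3 * t            # stand-in resilience condition
# 3t - n + 1 <= 0 whenever n > 3t, so this threshold is degenerate under rc:
assert refute_degenerate(lambda n, t: 3 * t - n + 1, rc, samples) is None
# n - t is positive in every sampled model, so it is not degenerate:
assert refute_degenerate(lambda n, t: n - t, rc, samples) is not None
```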
We observe that we can (i) push conjunctions outside of formulas (since ∀ dis-
tributes over ∧), and assuming non-degeneracy, (ii) ignore terms of the form xc :
Interpolant Δint. There may exist Δint ⊆ Δ s.t. FO(Δint) ⊭ FO(Δ), but FO(Δint) suffices to prove the first-order VCs, and enables discharging the VCs more efficiently. We compute such a set Δint iteratively. Initially, Δint = ∅. In each iteration, we check the VCs. If a counterexample to induction (CTI) is found, we add to Δint a formula from Δ that is not satisfied by the CTI. In this approach, Δ is not pre-computed. Instead, Aip is invoked lazily to generate candidate formulas in reaction to CTIs.
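The CTI-driven loop can be sketched as follows, with stubs standing in for Ivy (the VC checker) and Aip (the candidate generator); the stub behavior is invented for illustration:

```python
def infer_interpolant(check_vcs, pick_violated):
    """CTI-driven selection of Δ_int ⊆ Δ (sketch).

    check_vcs(axioms) returns None when all VCs hold under the given axioms,
    or a counterexample to induction (CTI) otherwise. pick_violated(cti)
    yields a valid formula that the CTI does not satisfy."""
    delta_int = []
    while True:
        cti = check_vcs(delta_int)
        if cti is None:
            return delta_int
        delta_int.append(pick_violated(cti))

# Toy stubs: the VCs "hold" once the axioms contain both "A" and "B",
# and each CTI simply names a missing axiom.
needed = {"A", "B"}
def check_vcs(axioms):
    missing = needed - set(axioms)
    return next(iter(sorted(missing)), None)

assert sorted(infer_interpolant(check_vcs, lambda cti: cti)) == ["A", "B"]
```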
6 Evaluation
We evaluate the approach by verifying several challenging threshold-based dis-
tributed protocols that use sophisticated thresholds: we verify the safety of
Bosco [39] (presented in Sect. 3) under its 3 different resilience conditions, the
safety and liveness (using the liveness to safety reduction presented in [30]) of
Hybrid Reliable Broadcast [40], and the safety of Fast Byzantine Paxos [23].
Hybrid Reliable Broadcast tolerates four different types of faults, while Fast
Byzantine Paxos is a fast-learning [21,22] Byzantine fault-tolerant consensus
protocol; fast-learning protocols are notorious because two such algorithms,
Zyzzyva [17] and FaB [28], were recently revealed incorrect [1] despite having
been published at major systems conferences.
and tI lists the total Ivy runtime, with the standard deviation specified below. V (resp. I) lists the number of Γ-valid (resp. Γ-invalid) simple formulas considered before the final set was reached. CTI lists the number of counterexample iterations required, and Q lists the maximal number of quantifiers of any TIP formula considered. Finally, tV lists the time required to verify the first-order transition system assuming the obtained set of properties. T.O. indicates that a time out of 1 h was reached.
deviation (σ). The figure’s caption explains the presented information, and we
discuss the results below.
AipEager. For all protocols, running Aip took less than 1 min (column tC) and generated all Γ-valid simple TIP formulas. We observe that for most formulas, (in)validity is deduced from other formulas by subsumption, and only 2%–5% of the formulas are actually checked using a BAPA query. With the optimization of the redundancy check, minimization of the set is performed in negligible time. The resulting set, ΔEager, contains 3–5 formulas, compared to 39–79 before minimization.
Due to the optimization described in Sect. 4 for the BAPA validity queries, the number of quantifiers in the TIP formulas that are checked by Aip does not affect the time needed to compute the full Δ. For example, Bosco under the Strongly One-step resilience condition contains Γ-valid simple TIP formulas with up to 7 quantifiers (as n > 7t and t1 = n − t), but Aip does not take significantly longer to find Δ. Interestingly, in this example the Γ-valid TIP formulas with more than 3 quantifiers are implied (in FOL) by formulas with at most 3 quantifiers, as indicated by the fact that these are the only formulas that remain in ΔEager for Bosco under the Strongly One-step resilience condition.
AipLazy. With the lazy approach based on CTIs, the time for finding the set of TIP formulas, ΔLazy, is generally longer. This is because the run time is dominated by calls to Ivy with FO axioms that are too weak for verifying the protocol. However, the resulting ΔLazy has a significant benefit: it lets Ivy prove the protocol much faster than with ΔEager. Comparing tV in AipEager vs. AipLazy shows that when the former takes a minute, the latter takes a few seconds, and when the former times out after 1 h, the latter terminates, usually in under 1 min. Comparing the formulas of ΔEager and ΔLazy reveals the reason: while the FO translation of both yields EPR formulas, the formulas resulting from ΔEager contain more quantifiers and generate many more ground terms, which degrades the performance of Z3.
Another advantage of the lazy approach is that during the search, it avoids
considering formulas with many quantifiers unless those are actually needed.
Comparing the 3 versions of Bosco we see that AipLazy is not sensitive to the
largest number of quantifiers that may appear in a Γ -valid simple TIP formula.
The downside is that AipLazy performs many Ivy checks in order to compute the
final ΔLazy . The total duration of finding CTIs varies significantly (as demon-
strated under the column tI ), in part because it is very sensitive to the CTIs
returned by Ivy, which are in turn affected by the random seed used in the
heuristics of the underlying solver.
Finally, ΔLazy provides more insight into the protocol design, since it presents
minimal assumptions that are required for protocol correctness. Thus, it may be
useful in designing and understanding protocols.
7 Related Work
Fully Automatic Verification of Threshold-Based Protocols. Algorithms
modeled as threshold automata (TA) [14] have been studied in [13,16], and verified using the automated tool ByMC [15]. The tool also automatically synthesizes thresholds as arithmetic expressions [24]. Reachability properties of TAs
for more general thresholds are studied in [18]. There have been recent advances
in verification of synchronous threshold-based algorithms using TAs [41], and
of asynchronous randomized algorithms where TAs support coin tosses and an
unbounded number of rounds [4]. Still, this modeling is very restrictive and not
as faithful to the pseudo-code as our modeling.
Another approach for full automation is to use sound and incomplete proce-
dures for deduction and invariant search for logics that combine quantifiers and
set cardinalities [8,10]. However, distributed systems of the level of complexity
we consider here (e.g., Byzantine Fast Paxos) are beyond the reach of these
techniques.
Verification Using Interactive Theorem Provers. We are not aware of works based on interactive theorem provers that verify protocols with complex thresholds as we do in this work (although doing so is of course possible). However, many works have used interactive theorem provers to verify related protocols, e.g., [12,27,36–38,43] (the most closely related protocols use either n/2 or 2n/3 as the only thresholds; the other protocols do not involve any thresholds). The downside of verification using interactive theorem provers is that it requires tremendous human effort and skill. For example, the Verdi proof of Raft included 50,000 lines of proof in Coq for 500 lines of code [44].
8 Conclusion
References
1. Abraham, I., Gueta, G., Malkhi, D., Alvisi, L., Kotla, R., Martin, J.P.: Revisiting
Fast Practical Byzantine Fault Tolerance (2017)
2. Bansal, K., Reynolds, A., Barrett, C., Tinelli, C.: A new decision procedure for
finite sets and cardinality constraints in SMT. In: Olivetti, N., Tiwari, A. (eds.)
IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 82–98. Springer, Cham (2016). https://
doi.org/10.1007/978-3-319-40229-1 7
3. Berkovits, I., Lazić, M., Losa, G., Padon, O., Shoham, S.: Verification of
threshold-based distributed algorithms by decomposition to decidable logics. CoRR
abs/1905.07805 (2019). http://arxiv.org/abs/1905.07805
4. Bertrand, N., Konnov, I., Lazic, M., Widder, J.: Verification of Randomized Dis-
tributed Algorithms under Round-Rigid Adversaries. HAL hal-01925533, Novem-
ber 2018. https://hal.inria.fr/hal-01925533
5. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3 24
6. Drăgoi, C., Henzinger, T.A., Veith, H., Widder, J., Zufferey, D.: A logic-based
framework for verifying consensus algorithms. In: McMillan, K.L., Rival, X. (eds.)
VMCAI 2014. LNCS, vol. 8318, pp. 161–181. Springer, Heidelberg (2014). https://
doi.org/10.1007/978-3-642-54013-4 10
7. Drăgoi, C., Henzinger, T.A., Zufferey, D.: PSync: a partially synchronous language for fault-tolerant distributed algorithms. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20–22 January 2016, pp. 400–415 (2016)
8. Dutertre, B., Jovanović, D., Navas, J.A.: Verification of fault-tolerant protocols
with sally. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol.
10811, pp. 113–120. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
77935-5 8
9. Ge, Y., de Moura, L.: Complete instantiation for quantified formulas in satisfiability modulo theories. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS,
vol. 5643, pp. 306–320. Springer, Heidelberg (2009). https://doi.org/10.1007/978-
3-642-02658-4 25
10. von Gleissenthall, K., Bjørner, N., Rybalchenko, A.: Cardinalities and universal
quantifiers for verifying parameterized systems. In: Proceedings of the 37th ACM
SIGPLAN Conference on Programming Language Design and Implementation,
PLDI 2016, pp. 599–613. ACM (2016)
11. von Gleissenthall, K., Kici, R.G., Bakst, A., Stefan, D., Jhala, R.: Pretend syn-
chrony: synchronous verification of asynchronous distributed programs. PACMPL
3(POPL), 59:1–59:30 (2019). https://dl.acm.org/citation.cfm?id=3290372
12. Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts, M.L., Setty, S.T.V., Zill, B.: IronFleet: proving practical distributed systems correct. In: Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, 4–7 October 2015, pp. 1–17 (2015). https://doi.org/10.1145/2815400.2815428
13. Konnov, I., Lazic, M., Veith, H., Widder, J.: Para²: parameterized path
reduction, acceleration, and SMT for reachability in threshold-guarded dis-
tributed algorithms. Form. Methods Syst. Des. 51(2), 270–307 (2017).
https://link.springer.com/article/10.1007/s10703-017-0297-4
14. Konnov, I., Veith, H., Widder, J.: On the completeness of bounded model checking
for threshold-based distributed algorithms: reachability. Inf. Comput. 252, 95–109
(2017)
15. Konnov, I., Widder, J.: ByMC: Byzantine model checker. In: Margaria, T., Steffen,
B. (eds.) ISoLA 2018. LNCS, vol. 11246, pp. 327–342. Springer, Cham (2018).
https://doi.org/10.1007/978-3-030-03424-5 22
16. Konnov, I.V., Lazic, M., Veith, H., Widder, J.: A short counterexample property
for safety and liveness verification of fault-tolerant distributed algorithms. In: Pro-
ceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming
Languages, POPL 2017, Paris, France, 18–20 January 2017, pp. 719–734 (2017)
17. Kotla, R., Alvisi, L., Dahlin, M., Clement, A., Wong, E.: Zyzzyva: speculative
Byzantine fault tolerance. SIGOPS Oper. Syst. Rev. 41(6), 45–58 (2007)
18. Kukovec, J., Konnov, I., Widder, J.: Reachability in parameterized systems: all
flavors of threshold automata. In: CONCUR. LIPIcs, vol. 118, pp. 19:1–19:17.
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2018)
19. Kuncak, V., Nguyen, H.H., Rinard, M.: An algorithm for deciding BAPA: boolean
algebra with presburger arithmetic. In: Nieuwenhuis, R. (ed.) CADE 2005. LNCS
(LNAI), vol. 3632, pp. 260–277. Springer, Heidelberg (2005). https://doi.org/10.
1007/11532231 20
20. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998). https://doi.org/10.1145/279227.279229
21. Lamport, L.: Lower bounds for asynchronous consensus. In: Schiper, A., Shvarts-
man, A.A., Weatherspoon, H., Zhao, B.Y. (eds.) Future Directions in Distributed
Computing. LNCS, vol. 2584, pp. 22–23. Springer, Heidelberg (2003). https://doi.
org/10.1007/3-540-37795-6 4
22. Lamport, L.: Lower bounds for asynchronous consensus. Distrib. Comput. 19(2),
104–125 (2006)
23. Lamport, L.: Fast Byzantine paxos, 17 November 2009. US Patent 7,620,680
24. Lazic, M., Konnov, I., Widder, J., Bloem, R.: Synthesis of distributed algorithms
with parameterized threshold guards. In: OPODIS (2017, to appear). http://
forsyte.at/wp-content/uploads/opodis17.pdf
25. Lewis, H.R.: Complexity results for classes of quantificational formulas. Comput.
Syst. Sci. 21(3), 317–353 (1980)
26. Liffiton, M.H., Previti, A., Malik, A., Marques-Silva, J.: Fast, flexible MUS enumeration. Constraints 21(2), 223–250 (2016)
27. Liu, Y.A., Stoller, S.D., Lin, B.: From clarity to efficiency for distributed algorithms. ACM Trans. Program. Lang. Syst. 39(3), 12:1–12:41 (2017). https://doi.org/10.1145/2994595
28. Martin, J.P., Alvisi, L.: Fast Byzantine consensus. IEEE Trans. Dependable Secure
Comput. 3(3), 202–215 (2006)
29. McMillan, K.L., Padon, O.: Deductive verification in decidable fragments with ivy.
In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 43–55. Springer, Cham
(2018). https://doi.org/10.1007/978-3-319-99725-4 4
30. Padon, O., Hoenicke, J., Losa, G., Podelski, A., Sagiv, M., Shoham, S.: Reducing
liveness to safety in first-order logic. PACMPL 2(POPL), 26:1–26:33 (2018)
31. Padon, O., Hoenicke, J., McMillan, K.L., Podelski, A., Sagiv, M., Shoham, S.:
Temporal prophecy for proving temporal properties of infinite-state systems. In:
FMCAD, pp. 1–11. IEEE (2018)
32. Padon, O., Losa, G., Sagiv, M., Shoham, S.: Paxos made EPR: decidable reasoning about distributed protocols. PACMPL 1(OOPSLA), 108:1–108:31 (2017)
33. Padon, O., McMillan, K.L., Panda, A., Sagiv, M., Shoham, S.: Ivy: safety verifi-
cation by interactive generalization. In: Krintz, C., Berger, E. (eds.) Proceedings
of the 37th ACM SIGPLAN Conference on Programming Language Design and
Implementation, PLDI 2016, Santa Barbara, CA, USA, 13–17 June 2016, pp. 614–
630. ACM (2016)
34. Piskac, R.: Decision procedures for program synthesis and verification (2011).
http://infoscience.epfl.ch/record/168994
35. Piskac, R., de Moura, L., Bjørner, N.: Deciding effectively propositional logic using
DPLL and substitution sets. J. Autom. Reason. 44(4), 401–424 (2010)
36. Rahli, V., Guaspari, D., Bickford, M., Constable, R.L.: Formal specification, verification, and implementation of fault-tolerant systems using EventML. ECEASST 72 (2015). https://doi.org/10.14279/tuj.eceasst.72.1013
37. Rahli, V., Vukotic, I., Völp, M., Esteves-Verissimo, P.: Velisarios: Byzantine fault-
tolerant protocols powered by Coq. In: Ahmed, A. (ed.) ESOP 2018. LNCS, vol.
10801, pp. 619–650. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
89884-1 22
38. Sergey, I., Wilcox, J.R., Tatlock, Z.: Programming and proving with distributed
protocols. PACMPL 2(POPL), 28:1–28:30 (2018)
39. Song, Y.J., van Renesse, R.: Bosco: one-step Byzantine asynchronous consensus.
In: Taubenfeld, G. (ed.) DISC 2008. LNCS, vol. 5218, pp. 438–450. Springer, Hei-
delberg (2008). https://doi.org/10.1007/978-3-540-87779-0 30
40. Srikanth, T., Toueg, S.: Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distrib. Comput. 2, 80–94 (1987)
41. Stoilkovska, I., Konnov, I., Widder, J., Zuleger, F.: Verifying safety of synchronous
fault-tolerant algorithms by bounded model checking. In: Vojnar, T., Zhang, L.
(eds.) TACAS 2019. LNCS, vol. 11428, pp. 357–374. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-17465-1 20
42. Taube, M., et al.: Modularity for decidability of deductive verification with appli-
cations to distributed systems. In: PLDI, pp. 662–677. ACM (2018)
43. Wilcox, J.R., et al.: Verdi: a framework for implementing and formally verifying
distributed systems. In: Proceedings of the 36th ACM SIGPLAN Conference on
Programming Language Design and Implementation, Portland, OR, USA, 15–17
June 2015, pp. 357–368 (2015). https://doi.org/10.1145/2737924.2737958
44. Woos, D., Wilcox, J.R., Anton, S., Tatlock, Z., Ernst, M.D., Anderson, T.E.: Plan-
ning for change in a formal verification of the raft consensus protocol. In: Proceed-
ings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs,
Saint Petersburg, FL, USA, 20–22 January 2016, pp. 154–165 (2016). https://doi.
org/10.1145/2854065.2854081
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Gradual Consistency Checking
1 Introduction
¹ All the CC variations become NP-complete without the assumption that each value is written at most once [6]. This holds for the variations of CC we introduce in this paper as well.
resp., write, operations is denoted by R, resp., W. The set of read, resp., write,
operations in a set of operations O is denoted by R(O), resp., W(O). The variable
accessed by an operation o is denoted by var(o).
Consistency criteria like SC or TSO are formalized on an abstract view of an execution called a history. A history includes a set of write and read operations ordered according to a (partial) program order po, which orders operations issued by the same thread. Most often, po is a union of sequences, each sequence containing all the operations issued by some thread. Then, we assume that
the history includes a write-read relation which identifies the write operation
writing the value returned by each read in the execution. Such a relation can
be extracted easily from executions where each value is written at most once.
Since shared-memory implementations (or cache coherence protocols) are data-
independent [31] in practice, i.e., their behavior doesn’t depend on the concrete
values read or written in the program, any potential buggy behavior can be
exposed in such executions.
Definition 1. A history ⟨O, po, wr⟩ is a set of operations O along with a strict partial program order po and a write-read relation wr ⊆ W(O) × R(O), such that the inverse of wr is a total function and if (write(x, v), read(x′, v′)) ∈ wr, then x = x′ and v = v′.
We assume that every history includes a write operation writing the initial
value of variable x, for each variable x. These write operations precede all other
operations in po. We use h, h1 , h2 , . . . to range over histories.
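A minimal sketch of Definition 1 on finite histories (the concrete encoding of operations as tuples is our choice; strictness of po is not re-checked here):

```python
def well_formed(ops, po, wr):
    """Check Definition 1 on a finite history.

    ops: dict op_id -> ("write"|"read", var, val); po: set of op_id pairs (a
    strict partial order, not re-checked here); wr: dict mapping each read to
    the write it reads from, i.e. the inverse of the wr relation."""
    reads = {o for o, (kind, _, _) in ops.items() if kind == "read"}
    if set(wr) != reads:            # inverse of wr must be total on reads
        return False
    for r, w in wr.items():
        kw, xw, vw = ops[w]
        kr, xr, vr = ops[r]
        if kw != "write" or xw != xr or vw != vr:
            return False            # variable or value mismatch
    return True

ops = {0: ("write", "x", 0),        # initial write of x
       1: ("write", "x", 1),
       2: ("read",  "x", 1)}
assert well_formed(ops, {(0, 1), (0, 2)}, {2: 1})
assert not well_formed(ops, {(0, 1), (0, 2)}, {2: 0})  # read returned 1, not 0
```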
We now define the SC and TSO memory models (we use the same definitions as in the formal framework developed by Alglave et al. [3]). Given a history h = ⟨O, po, wr⟩ and a variable x, a store order on x is a strict total order ww_x on the write operations write(x, ·) in O. A store order ww is a union of store orders ww_x, one for each variable x used in h. A history ⟨O, po, wr⟩ is sequentially consistent (SC, for short) if there exists a store order ww such that po ∪ wr ∪ ww ∪ rw is acyclic. The read-write relation rw is defined by rw = wr⁻¹ ∘ ww (where ∘ denotes the standard relation composition).
The definition of TSO relies on three additional relations: (1) the ppo relation, which excludes from the program order pairs formed of a write and a read operation, i.e., ppo = po \ (W(O) × R(O)), (2) the po-loc relation, which is the restriction of po to operations accessing the same variable, i.e., po-loc = po ∩ {(o, o′) | var(o) = var(o′)}, and (3) the write-read external relation wre, which is the restriction of the write-read relation to pairs of operations in different threads (not related by program order), i.e., wre = wr ∩ {(o, o′) | (o, o′) ∉ po and (o′, o) ∉ po}. Then, we say that a history satisfies TSO if there exists a store order ww such that po-loc ∪ wre ∪ ww ∪ rw and ppo ∪ wre ∪ ww ∪ rw are both acyclic.
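Under the same hypothetical encoding of histories as sets of pairs, the three derived relations take only a few lines; the `var`, `writes`, and `reads` arguments describing the operations are our own additions:

```python
def tso_relations(po, wr, var, writes, reads):
    """Derive the three auxiliary TSO relations from a history encoded as sets of pairs."""
    # ppo: program order minus (write, read) pairs
    ppo = {(a, b) for (a, b) in po if not (a in writes and b in reads)}
    # po-loc: program order restricted to same-variable operations
    po_loc = {(a, b) for (a, b) in po if var[a] == var[b]}
    # wre: write-read pairs crossing threads, i.e. unrelated by program order
    wre = {(w, r) for (w, r) in wr if (w, r) not in po and (r, w) not in po}
    return ppo, po_loc, wre
```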
Notice that the formal definition of TSO given above is equivalent to the standard operational model of TSO, in which each thread has a store buffer: each write issued by a thread is first sent to its store buffer before being committed to the memory later, in a nondeterministic way. To read the value of some variable x, a thread first checks whether there is still
Gradual Consistency Checking 271
a write on x pending in its own buffer, in which case it takes the value of the last such write; otherwise, it fetches the value of x from the memory.
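This operational model can be sketched as a toy Python machine (all names are ours); flushing is exposed to the caller, which models the nondeterministic commit of buffered writes:

```python
from collections import defaultdict, deque

class TSOMachine:
    """Toy TSO model: per-thread FIFO store buffers in front of a shared memory.
    Flushing is left to the caller, modelling nondeterministic commits."""

    def __init__(self, init=0):
        self.memory = defaultdict(lambda: init)   # shared memory, default value 0
        self.buffers = defaultdict(deque)         # thread id -> pending (var, value)

    def write(self, tid, var, value):
        # a write is buffered first; other threads cannot see it yet
        self.buffers[tid].append((var, value))

    def flush_one(self, tid):
        # commit the oldest pending write of `tid` to the shared memory
        var, value = self.buffers[tid].popleft()
        self.memory[var] = value

    def read(self, tid, var):
        # a read first checks the thread's own buffer (latest pending write wins),
        # otherwise it fetches the value from the shared memory
        for v, value in reversed(self.buffers[tid]):
            if v == var:
                return value
        return self.memory[var]
```

Delaying both flushes reproduces the store-buffering outcome in which two threads each write a variable and then read 0 from the other's variable, an outcome forbidden under SC.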
The weakest variation of causal consistency, called weak causal consistency (CC,
for short), requires that any two causally-dependent values are observed in the
same order by all threads, where causally-dependent means that either those
values were written by the same thread (i.e., the corresponding writes are ordered
by po), or that one value was written by a thread after reading the other value,
or any transitive composition of such dependencies. Values written concurrently by two threads can be observed in any order, and moreover, this order may change over time. A history ⟨O, po, wr⟩ satisfies CC if po ∪ wr ∪ rw[co] is acyclic, where co = (po ∪ wr)+ is called the causal relation. The read-write relation rw[co] induced by the causal relation is defined by rw[co] = wr−1 ◦ coWW, where coWW is the projection of co on pairs of writes on the same variable.
The conflict relation relates two writes w1 and w2 when w1 is causally related
to a read taking its value from w2 . The definition of CCM, our new variation
of causal consistency, relies on a generalization of the conflict relation where a
different relation is used instead of co. Given a binary relation R on operations,
RWR denotes the projection of R on pairs of writes and reads on the same
variable, respectively.
Fig. 1. Histories with two threads used to compare different consistency models. Oper-
ations of the same thread are aligned vertically.
implies that it observed write(x, 2) after write(x, 1). While this is allowed by CM
where different threads can observe concurrent writes in different orders, it is
not allowed by CCv. Then, the history in Fig. 1b is CCv but not CM. It is not
allowed by CM because reading the initial value 0 from z implies that write(x, 1)
is observed after write(x, 2) while reading 2 from x implies that write(x, 2) is
observed after write(x, 1) (write(x, 1) must have been observed because the same
thread reads 1 from y and the writes on x and y are causally related). However,
under CCv, a thread simply reads the most recent value of each variable, and the order in which these values are arranged (using timestamps, for instance) is independent of the order in which the variables are read in a thread; e.g., reading 0 from z doesn't imply that the timestamp of write(x, 2) is smaller than the timestamp of write(x, 1). This history is admitted by CCv assuming that the order in which
write(x, 1) and write(x, 2) are observed is write(x, 1) before write(x, 2).
Let us give the formal definition of CM. Let h = ⟨O, po, wr⟩ be a history. For
every operation o in h, let hbo be the smallest transitive relation such that:
1. if two operations are causally related, and each one is causally related to o, then
they are related by hbo , i.e., (o1 , o2 ) ∈ hbo if (o1 , o2 ) ∈ co, (o1 , o) ∈ co, and
(o2 , o) ∈ co∗ (where co∗ is the reflexive closure of co), and
2. two writes w1 and w2 are related by hbo if w1 is hbo -related to a read taking
its value from w2 , and that read is done by the same thread executing o
and before o (this scenario is similar to the definition of the conflict relation
above), i.e., (write(x, v), write(x, v′)) ∈ hbo if (write(x, v), read(x, v′)) ∈ hbo, (write(x, v′), read(x, v′)) ∈ wr, and (read(x, v′), o) ∈ po∗, for some read(x, v′).
A history ⟨O, po, wr⟩ satisfies CM if it satisfies CC and for each operation o in the history, the relation hbo is acyclic.
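The least-fixpoint definition of hbo translates directly into code. The sketch below uses our own encoding of histories; we also require the two writes in rule 2 to be distinct, which the definition leaves implicit:

```python
def hb_for(o, ops, po, wr, var, writes, reads):
    """Least-fixpoint computation of hb_o following the two rules in the text."""
    def closure(rel):
        # transitive closure by naive saturation; fine for small histories
        rel = set(rel)
        while True:
            new = rel | {(a, c) for (a, b) in rel for (b2, c) in rel if b == b2}
            if new == rel:
                return rel
            rel = new
    co = closure(po | wr)                      # causal relation co = (po ∪ wr)+
    co_r = co | {(a, a) for a in ops}          # reflexive closure co*
    po_r = po | {(a, a) for a in ops}          # reflexive closure po*
    # rule 1: causally related pairs whose members causally precede o
    hb = {(o1, o2) for (o1, o2) in co if (o1, o) in co and (o2, o) in co_r}
    while True:
        new = closure(hb)
        # rule 2: w1 is hb-related to a read (on the same variable) taking its
        # value from w2, and that read is done before o
        for (w1, r) in set(new):
            if w1 in writes and r in reads and var[w1] == var[r] and (r, o) in po_r:
                for (w2, r2) in wr:
                    if r2 == r and w2 != w1:
                        new.add((w1, w2))
        if new == hb:
            return hb
        hb = new
```

Checking CM then amounts to verifying CC and testing each hb_o for acyclicity; since the returned relation is transitively closed, a cycle shows up as a pair (a, a).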
Bouajjani et al. [6] show that the problem of checking whether a history satisfies CC, CCv, or CM can be solved in polynomial time. This result is a straightforward
consequence of the above definitions, since the union of relations required to be
acyclic can be computed in polynomial time from the relations po and wr which
are fixed in a given history. In particular, the union of these relations can be
computed by a DATALOG program.
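For instance, the CC check boils down to a transitive-closure (fixpoint) computation followed by a cycle test, which is exactly the kind of computation a DATALOG engine performs. A Python sketch follows; the form rw[co] = wr−1 ◦ coWW is our reading of the rw[·] definition, and the history encoding is ours:

```python
def satisfies_cc(po, wr, var, writes):
    """Polynomial-time CC check: build co = (po ∪ wr)+ and test acyclicity of
    po ∪ wr ∪ rw[co], taking rw[co] = wr−1 ◦ coWW (our assumed form)."""
    def closure(rel):
        rel = set(rel)
        while True:
            new = rel | {(a, c) for (a, b) in rel for (b2, c) in rel if b == b2}
            if new == rel:
                return rel
            rel = new
    co = closure(po | wr)
    # coWW: projection of co on pairs of writes on the same variable
    co_ww = {(w1, w2) for (w1, w2) in co
             if w1 in writes and w2 in writes and var[w1] == var[w2]}
    rw_co = {(r, w2) for (w, r) in wr for (w1, w2) in co_ww if w1 == w}
    # acyclic iff the transitive closure contains no pair (a, a)
    return all(a != b for (a, b) in closure(po | wr | rw_co))
```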
The partial store order pww contains the ordering constraints between writes in all relations hbo used to define causal memory, and also the conflict relation
274 R. Zennou et al.
induced by this set of constraints (a weaker version of the conflict relation was used to define causal convergence).
As a first result, we show that all the variations of causal consistency in
Sect. 3.1, i.e., CC, CCv and CM, are strictly weaker than CCM.
Lemma 1. If a history satisfies CCM, then it satisfies CC, CCv and CM.
Proof. Let h = ⟨O, po, wr⟩ be a history satisfying CCM. By the definition of hb,
we have that coWW ⊆ hbWW . Indeed, any two writes o1 and o2 related by co are
also related by hbo2 , which by the definition of hb, implies that they are related
by hbWW . Then, by the definition of pww, we have that hbWW ⊆ pww. This
implies that rw[co] ⊆ rw[pww] (by definition, rw[co] = rw[coWW]). Therefore, the acyclicity of po ∪ wr ∪ pww ∪ rw[pww] implies that its subset po ∪ wr ∪ rw[co] is also acyclic, which means that h satisfies CC. Also, it implies that po ∪ wr ∪ cf[hb]
is acyclic (the last term of the union is included in pww), which by co ⊆ hb,
implies that po ∪ wr ∪ cf[co] is acyclic, and thus, h satisfies CCv. The fact that
h satisfies CM follows from the fact that h satisfies CC (since po ∪ wr is acyclic)
and hb is acyclic (hbWW is included in pww and the rest of the dependencies in
hb are included in po ∪ wr).
The reverse of the above lemma doesn’t hold. Figure 1c shows a history which satisfies CM and CCv but is not CCM. To show that this history does not
satisfy CCM we use the fact that pww relates any two writes which are ordered
by program order. Then, we get that read(x, 1) and write(x, 2) are related by
rw[pww] (because write(x, 1) is related by write-read with read(x, 1)), which fur-
ther implies that (read(x, 1), read(y, 1)) ∈ rw[pww] ◦ po. Similarly, we have that
(read(y, 1), read(x, 1)) ∈ rw[pww]◦po, which implies that po∪wr ∪pww ∪rw[pww]
is not acyclic, and therefore, the history does not satisfy CCM. The fact that
this history satisfies CM and CCv follows easily from definitions.
Next, we show that CCM is weaker than SC, which will be important in our
algorithm for checking whether a history satisfies SC.
Proof. Let h = ⟨O, po, wr⟩ be a history satisfying SC. Then, there exists a store order ww such that po ∪ wr ∪ ww ∪ rw[ww] is acyclic.
We show that the two relations hbWW and cf[hb], whose union constitutes pww,
are both included in ww. We first prove that hb ⊆ (po ∪ wr ∪ ww ∪ rw[ww])+ by
structural induction on the definition of hbo :
1. if (o1 , o2 ) ∈ co = (po ∪ wr)+ , then clearly, (o1 , o2 ) ∈ (po ∪ wr ∪ ww ∪ rw[ww])+ ,
2. if (write(x, v), read(x, v′)) ∈ (po ∪ wr ∪ ww ∪ rw[ww])+ and there is read(x, v′) such that (write(x, v′), read(x, v′)) ∈ wr, then (write(x, v), write(x, v′)) ∈ ww. Otherwise, assuming by contradiction that (write(x, v′), write(x, v)) ∈ ww, we get that (read(x, v′), write(x, v)) ∈ rw[ww] (by the definition of rw[ww] using the hypothesis (write(x, v′), read(x, v′)) ∈ wr). Note that the latter implies that po ∪ wr ∪ ww ∪ rw[ww] is cyclic.
Fig. 2. Relationships between consistency models. Directed arrows denote the “weaker-
than” relation while dashed lines connect incomparable models.
The left side of Fig. 2 (ignoring wCCM and TSO) summarizes the relation-
ships between the consistency models presented in this section.
The partial store order pww can be computed in polynomial time (in the size
of the input history). Indeed, the hbo relations can be computed using a least
fixpoint calculation that converges in at most a quadratic number of iterations
and acyclicity can be decided in polynomial time. Therefore,
Theorem 1. Checking whether a history satisfies CCM is polynomial time in
the size of the history.
t0:           t1:           t2:           t3:
write(x, 1)   write(y, 1)   read(x, 1)    read(y, 1)
                            read(y, 0)    read(x, 0)
Then, we say that a history ⟨O, po, wr⟩ satisfies weak convergent causal memory (wCCM) if both relations ppo ∪ wr ∪ wpww ∪ rw[wpww] and po-loc ∪ wr ∪ wpww ∪ rw[wpww] are acyclic.
Proof. Let h = ⟨O, po, wr⟩ be a history satisfying TSO. Then, there exists a
store order ww such that po-loc ∪ wre ∪ ww ∪ rw and ppo ∪ wre ∪ ww ∪ rw are
both acyclic. The fact that
can be proved by structural induction like in the case of SC (the step of the
proof showing that hb ⊆ po ∪ wr ∪ ww ∪ rw[ww]). Then, since ww is a total order
on writes on the same variable, we get that the projection of whb (the transitive
closure of the union of hbpo-loc and hbppo ) on pairs of writes on the same variable
is included in ww. Therefore, whbWW ⊆ ww. Then, since cfe[Rπ] ⊆ Rπ for each Rπ = (π ∪ wre ∪ ww ∪ rw)+ with π ∈ {ppo, po-loc}, and since each cfe[Rπ] relates only writes on the same variable, we get that each cfe[Rπ] is included in ww.
This implies that wpww ⊆ ww.
Finally, since wpww ⊆ ww, we get that (π ∪ wr ∪ wpww ∪ rw[wpww])+ ⊆
(π ∪ wr ∪ ww ∪ rw[ww])+ , for each π ∈ {ppo, po-loc}. In each case, the acyclicity
of the latter implies the acyclicity of the former. Therefore, h satisfies wCCM.
The reverse of the above lemma does not hold. Indeed, it can be easily seen
that wCCM is weaker than CCM (since wpww is included in pww) and the history
in Fig. 3, which satisfies CCM but not TSO (as explained in the beginning of the
section), is also an example of a history that satisfies wCCM but not TSO. Then,
wCCM is incomparable to CM. For instance, the history in Fig. 1b is allowed by
wCCM (since it is allowed by TSO as explained in the beginning of the section)
but not by CM. Also, since CCM is stronger than CM, the history in Fig. 3
satisfies CM but not wCCM (since it does not satisfy TSO). These relationships
are summarized in Fig. 2. Establishing the precise relation between CC/CCv and
TSO is hard because they are defined using one, resp., two, acyclicity conditions.
We believe that CC and CCv are weaker than TSO, but we don’t have a formal
proof.
Finally, it can be seen that, similarly to pww, the weak partial store order
wpww can be computed in polynomial time, and therefore:
The algorithm for checking TSO conformance for a given history is given in
Fig. 2. It starts by checking whether the history violates the weaker consistency
model wCCM. If yes, it returns false. If not, it starts enumerating the orders between the writes that are not related by the weak partial store order wpww until it finds one that allows establishing TSO conformance, in which case it returns true. Otherwise, it returns false.
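The gradual algorithm can be sketched as follows, with the polynomial weak check and the full conformance test abstracted as callbacks (all four parameter names are ours):

```python
from itertools import permutations, product

def gradual_check(writes_by_var, pww, weak_ok, strong_ok):
    """Sketch of the gradual algorithm: reject immediately if the weak
    (polynomial) check fails; otherwise enumerate only the total store orders ww
    extending the partial store order pww, accepting as soon as one witnesses
    conformance."""
    if not weak_ok():
        return False                 # early rejection via the weak model
    per_var = []
    for ws in writes_by_var.values():
        orders = []
        for perm in permutations(ws):
            order = {(perm[i], perm[j])
                     for i in range(len(perm)) for j in range(i + 1, len(perm))}
            # keep only the total orders consistent with the partial store order
            if all((b, a) not in order for (a, b) in pww):
                orders.append(order)
        per_var.append(orders)
    for combo in product(*per_var):
        ww = set().union(*combo)
        if strong_ok(ww):
            return True
    return False
```

Only total store orders extending the partial order are submitted to the expensive test, which is where the reduction in enumeration comes from.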
5 Experimental Evaluation
To demonstrate the practical value of the theory developed in the previous sec-
tions, we argue that our algorithms are efficient and scalable. We experiment
with both SC and TSO algorithms, investigating their running time compared
to a standard encoding of these models into boolean satisfiability on a bench-
mark obtained by running realistic cache coherence protocols within the Gem5
simulator [5] in system emulation mode.
Histories are generated with random clients of the following cache coher-
ence protocols included in the Gem5 distribution: MI, MEOSI Hammer,
MESI Two Level, and MEOSI AMD Base. The randomization process is
parametrized by the number of cpus (threads) and the total number of read-
/write operations. We ensure that every value is written at most once.
We have compared two variations of our algorithms for checking SC/TSO with
a standard encoding of SC/TSO into boolean satisfiability (named X-SAT where
X is SC or TSO). The two variations differ in the way in which the partial store
order pww dictated by CCM is completed to a total store order ww as required
by SC/TSO: either using standard enumeration (named X-CCM+Enum where
X is SC or TSO) or using a SAT solver (named X-CCM+SAT where X is SC or
TSO).
The computation of the partial store order pww is done using an encoding of
its definition into a DATALOG program. The inductive definition of hbo supports
an easy translation to DATALOG rules, and the same holds for the union of two
relations, or their composition. We used Clingo [19] to run DATALOG programs.
5.1 Checking SC
Figure 4 reports on the running time of the three algorithms while increasing the
number of operations or cpus. All the histories considered in this experiment sat-
isfy SC. This is intended because valid histories force our algorithms to enumerate
extensions of the partial store order (SC violations may be detected while check-
ing CCM). The graph on the left pictures the evolution of the running time when
increasing the number of operations from 100 to 500, in increments of 100 (while
using a constant number of 4 cpus). For each number of operations, we have con-
sidered 200 histories and computed the average running time. The graph on the
right shows the running time when increasing the number of cpus from 2 to 6, in
increments of 1. For x cpus, we have limited the number of operations to 50x. As
before for each number of cpus, we have considered 200 histories and computed
(a) Checking SC while varying the number of operations. (b) Checking SC while varying the number of cpus.
the average running time. As can be observed, our algorithms scale much better than the SAT encoding and, interestingly enough, the difference between an explicit enumeration of pww extensions and one using a SAT solver is not significant. Note that even small improvements in the average running time provide
large speedups when taking into account the whole testing process, i.e., checking
consistency for a possibly large number of (randomly-generated) executions. For
instance, the work on McVerSi [13], which focuses on the complementary problem of finding clients that increase the probability of uncovering bugs, shows that exposing bugs in some realistic cache coherence implementations can require up to 24 hours of continuous testing.
Since the bottleneck in our algorithms is given by the enumeration of pww
extensions, we have measured the percentage of pairs of writes that are not
ordered by pww. To this end, we have considered a random sample of 200 histories (with 200 operations per history) and evaluated this percentage to be just 6.6%, which is surprisingly low. This explains the clear gain in comparison to a SAT encoding of SC, since the number of pww extensions that need to be enumerated is quite low. As a side remark, using CCv instead of CCM in the algorithms above leads to a drastic increase in the number of unordered writes. For the same random sample of 200 histories, we found that using CCv instead of CCM leaves 57.75% of the write pairs unordered on average, which is considerably larger than the percentage of unordered writes when using CCM.
We have also evaluated our algorithms on SC violations. These violations
were generated by reordering statements from the MI implementation, e.g., swap-
ping the order of the actions s store hit and p profileHit in the transition
transition(M, Store). As an optimization, our implementation gradually checks the weaker variations of causal consistency, CC and CCv, before checking CCM. This increases the chances of returning early in the case of a violation (a violation of CC/CCv is also a violation of CCM and SC). We have considered 1000 histories with 100 to 400 operations and 2 to 8 cpus, equally distributed in terms
Fig. 5. Checking SC for invalid histories while increasing the number of cpus.
(a) Checking TSO while varying the number of operations. (b) Checking TSO while varying the number of cpus.
of the number of cpus. Figure 5 reports on the evolution of the average running
time. Since these histories happen to all be CCM violations, SC-CCM+Enum
and SC-CCM+SAT have the same running time. As an evaluation of our opti-
mization, we have found that 50% of the histories invalidate weaker variations of
causal consistency, CC or CCv.
5.2 Checking TSO
We have evaluated our TSO algorithms on the same set of histories used for SC
in Fig. 4. Since these histories satisfy SC, they satisfy TSO as well. As in the case
of SC, our algorithms scale better than the SAT encoding. However, differently
from SC, the enumeration of wpww extensions using a SAT solver outperforms
the explicit enumeration. Since this difference was negligible in the case of SC, it seems that the SAT variation is the better choice overall.
6 Related Work
While several static techniques have been developed to prove that a shared-
memory implementation (or cache coherence protocol) satisfies SC [1,4,9–12,17, 20,23,27,28], few have addressed dynamic techniques such as testing and runtime
verification (which scale to more realistic implementations). From the complexity
standpoint, Gibbons and Korach [21] showed that checking whether a history is SC is NP-hard, while Alur et al. [4] showed that checking SC for finite-state shared-memory implementations (over a bounded number of threads, variables, and values) is undecidable. The fact that checking whether a history satisfies TSO is also NP-hard has been proved by Furbach et al. [18].
There are several works that addressed the testing problem for related cri-
teria, e.g., linearizability. While SC requires that the operations in a history
be explained by a linearization that is consistent with the program order, lin-
earizability requires that such a linearization be also consistent with the real-
time order between operations (linearizability is stronger than SC). The works
in [25,30] describe monitors for checking linearizability that construct lineariza-
tions of a given history incrementally, in an online fashion. This incremental con-
struction cannot be adapted to SC since it strongly relies on the specificities of
linearizability. Line-Up [8] performs systematic concurrency testing via schedule
enumeration, and offline linearizability checking via linearization enumeration.
The works in [15,16] show that checking linearizability for some particular class
of ADTs is polynomial time. Emmi and Enea [14] consider the problem of check-
ing weak consistency criteria, but their approach focuses on specific relaxations
in those criteria, falling back to an explicit enumeration of linearizations in the
context of a criterion like SC or TSO. Bouajjani et al. [6] consider the problem
of checking causal consistency. They formalize the different variations of causal
consistency we consider in this work and show that the problem of checking
whether a history satisfies one of these variations is polynomial time.
The complementary issue of test generation, i.e., finding clients that increase
the probability of uncovering bugs in shared memory implementations, has been
approached in the McVerSi framework [13]. Their methodology for checking a
criterion like SC lies within the context of white-box testing, i.e., the user is
required to annotate the shared memory implementation with events that define
the store order in an execution. Our algorithms have the advantage that the
implementation is treated as a black-box requiring less user intervention.
7 Conclusion
We have introduced an approach for checking the conformance of a computation
to SC or to TSO, a problem known to be NP-hard. The idea is to avoid an explicit
enumeration of the exponential number of possible total orders between writes in
order to solve these problems. Our approach is to define weaker criteria that are
as strong as possible but still polynomial time checkable. This is useful for (1)
early detection of violations, and (2) reducing the number of pairs of writes for
which an order must be found in order to check SC/TSO conformance. In essence, the approach consists in capturing an "as large as possible" partial order on writes that can be computed in polynomial time (using a least fixpoint calculation) and that is a subset of any total order witnessing SC/TSO conformance. Our experimental results show that this approach is indeed useful and performant: it catches most violations early using an efficient check, and it computes a large kernel of write constraints that significantly reduces the number of pairs of writes left to be ordered in an enumerative way. Future work includes exploring the application of this approach to other correctness criteria that are hard to check, such as serializability in the context of transactional programs.
References
1. Abdulla, P.A., Haziza, F., Holík, L.: Parameterized verification through view
abstraction. STTT 18(5), 495–516 (2016). https://doi.org/10.1007/s10009-015-
0406-x
2. Ahamad, M., Neiger, G., Burns, J.E., Kohli, P., Hutto, P.W.: Causal memory: def-
initions, implementation, and programming. Distrib. Comput. 9(1), 37–49 (1995)
3. Alglave, J., Maranget, L., Tautschnig, M.: Herding cats: modelling, simulation,
testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst.
36(2), 7:1–7:74 (2014). https://doi.org/10.1145/2627752
4. Alur, R., McMillan, K.L., Peled, D.A.: Model-checking of correctness conditions
for concurrent objects. Inf. Comput. 160(1–2), 167–188 (2000). https://doi.org/
10.1006/inco.1999.2847
5. Binkert, N., et al.: The gem5 simulator. SIGARCH Comput. Archit. News 39(2),
1–7 (2011). https://doi.org/10.1145/2024716.2024718
6. Bouajjani, A., Enea, C., Guerraoui, R., Hamza, J.: On verifying causal consistency.
In: Castagna, G., Gordon, A.D. (eds.) Proceedings of the 44th ACM SIGPLAN
Symposium on Principles of Programming Languages, POPL 2017, Paris, France,
January 18–20, 2017, pp. 626–638. ACM (2017). http://dl.acm.org/citation.cfm?
id=3009888
7. Burckhardt, S.: Principles of Eventual Consistency. Now Publishers, Boston, October 2014
8. Burckhardt, S., Dern, C., Musuvathi, M., Tan, R.: Line-up: a complete and auto-
matic linearizability checker. In: Zorn, B.G., Aiken, A. (eds.) Proceedings of the
2010 ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation, PLDI 2010, Toronto, Ontario, Canada, 5–10 June 2010, pp. 330–340.
ACM (2010). https://doi.org/10.1145/1806596.1806634
9. Clarke, E.M., et al.: Verification of the futurebus+ cache coherence protocol. In:
Agnew, D., Claesen, L.J.M., Camposano, R. (eds.) Computer Hardware Descrip-
tion Languages and their Applications, Proceedings of the 11th IFIP WG10.2
International Conference on Computer Hardware Description Languages and their
Applications - CHDL 1993, sponsored by IFIP WG10.2 and in cooperation with
IEEE COMPSOC, Ottawa, Ontario, Canada, 26–28 April 1993. IFIP Transactions,
vol. A-32, pp. 15–30. North-Holland (1993)
26. Perrin, M., Mostefaoui, A., Jard, C.: Causal consistency: beyond memory. In: Pro-
ceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming, PPoPP 2016, pp. 26:1–26:12. ACM, New York (2016)
27. Pong, F., Dubois, M.: A new approach for the verification of cache coherence
protocols. IEEE Trans. Parallel Distrib. Syst. 6(8), 773–787 (1995). https://doi.
org/10.1109/71.406955
28. Qadeer, S.: Verifying sequential consistency on shared-memory multiprocessors by
model checking. IEEE Trans. Parallel Distrib. Syst. 14(8), 730–741 (2003). https://
doi.org/10.1109/TPDS.2003.1225053
29. Sewell, P., Sarkar, S., Owens, S., Nardelli, F.Z., Myreen, M.O.: x86-tso: a rigorous
and usable programmer’s model for x86 multiprocessors. Commun. ACM 53(7),
89–97 (2010). https://doi.org/10.1145/1785414.1785443
30. Wing, J.M., Gong, C.: Testing and verifying concurrent objects. J. Parallel Distrib.
Comput. 17(1–2), 164–182 (1993). https://doi.org/10.1006/jpdc.1993.1015
31. Wolper, P.: Expressing interesting properties of programs in propositional temporal
logic. In: Conference Record of the Thirteenth Annual ACM Symposium on Prin-
ciples of Programming Languages, St. Petersburg Beach, Florida, USA, January
1986, pp. 184–193. ACM Press (1986). https://doi.org/10.1145/512644.512661
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Checking Robustness Against
Snapshot Isolation
1 Introduction
This work is supported in part by the European Research Council (ERC) under the
Horizon 2020 research and innovation programme (grant agreement No 678177).
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 286–304, 2019.
https://doi.org/10.1007/978-3-030-25543-5_17
Checking Robustness Against Snapshot Isolation 287
most prominent being snapshot isolation (SI) [5]. Then, an important issue is to
ensure that the level of consistency needed by a given program coincides with
the one that is guaranteed by its infrastructure, i.e., the database it uses. One
way to tackle this issue is to investigate the problem of checking robustness of
programs against consistency relaxations: Given a program P and two consis-
tency models S and W such that S is stronger than W , we say that P is robust
for S against W if for every two implementations IS and IW of S and W respec-
tively, the set of computations of P when running with IS is the same as its set
of computations when running with IW . This means that P is not sensitive to
the consistency relaxation from S to W , and therefore it is possible to reason
about the behaviors of P assuming that it is running over S, and no additional synchronization is required when P runs over the weak model W in order to maintain all the properties it satisfies with S.
In this paper, we address the problem of verifying robustness of transactional
programs for serializability, against snapshot isolation. Under snapshot isolation,
any transaction t reads values from a snapshot of the database taken at its start
and t can commit only if no other committed transaction has written to a loca-
tion that t wrote to, since t started. Robustness is a form of program equivalence
between two versions of the same program, obtained using two semantics, one
more permissive than the other. It ensures that this permissiveness has no effect
on the program under consideration. The difficulty in checking robustness is to capture the extra behaviors allowed by the relaxed model w.r.t. the strong model. This requires a priori reasoning about complex ordering constraints between operations in arbitrarily long computations, which may require maintaining unbounded ordered structures, and makes robustness checking hard or even undecidable.
Our first contribution is to show that verifying robustness of transac-
tional programs against snapshot isolation can be reduced in polynomial time
to the reachability problem in concurrent programs under sequential consis-
tency (SC). This allows (1) avoiding explicit handling of the snapshots from which transactions read along computations (since this may require memorizing unbounded information), and (2) leveraging available tools for verifying invariants/reachability problems on concurrent programs. This also implies that the
robustness problem is decidable for finite-state programs, PSPACE-complete
when the number of sites is fixed, and EXPSPACE-complete otherwise. This is
the first result on the decidability and complexity of the problem of verifying
robustness in the context of transactional programs. The problem of verifying
robustness has been considered in the literature for several models, including
eventual and causal consistency [6,10–12,20]. These works provide (over- or
under-)approximate analyses for checking robustness, but none of them pro-
vides precise (sound and complete) algorithmic verification methods for solving
this problem.
Based on this reduction, our second contribution is a proof methodology
for establishing robustness which builds on Lipton’s reduction theory [18]. We
use the theory of movers to establish whether the relaxations allowed by SI are
harmless, i.e., they don’t introduce new behaviors compared to serializability.
288 S. M. Beillahi et al.
Fig. 1. The Write Skew anomaly: (a) the program, where p1 runs t1: [r1 = y //0; x = 1] in parallel with p2 running t2: [r2 = x //0; y = 1]; (b) the corresponding trace, with conflict edges between [r1 = y; x = 1] and [r2 = x; y = 1].
2 Overview
In this section, we give an overview of our approach for checking robustness
against snapshot isolation. While serializability enforces that transactions are atomic and that conflicting transactions, i.e., transactions which read or write a common location, cannot commit concurrently, SI [5] allows conflicting transactions to commit in parallel as long as they don't have a write-write conflict, i.e., do not write to a common location. Moreover, under SI, each transaction reads from a snapshot of the database taken at its start. These relaxations permit the “anomaly”
known as Write Skew (WS) shown in Fig. 1a, where an anomaly is a program
execution which is allowed by SI, but not by serializability. The execution of
Write Skew under SI allows the reads of x and y to return 0 although this
cannot happen under serializability. These values are possible since each trans-
action is executed locally (starting from the initial snapshot) without observing
the writes of the other transaction.
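These two rules, snapshot reads plus aborting on write-write conflicts with transactions that committed in the meantime (first committer wins), can be sketched as a toy store (all class and method names are ours):

```python
class SIDatabase:
    """Toy snapshot-isolation store: each transaction reads from the snapshot
    taken at its start; commit aborts on a write-write conflict with a
    transaction that committed since then (first committer wins)."""

    def __init__(self, init=None):
        self.state = dict(init or {})
        self.version = 0
        self.commit_log = []          # (commit_version, written_keys)

    def begin(self):
        return {"snapshot": dict(self.state), "start": self.version, "writes": {}}

    def read(self, txn, key):
        # a transaction's own writes are visible to itself
        if key in txn["writes"]:
            return txn["writes"][key]
        return txn["snapshot"].get(key)

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        for version, keys in self.commit_log:
            if version > txn["start"] and keys & txn["writes"].keys():
                return False          # write-write conflict: abort
        self.state.update(txn["writes"])
        self.version += 1
        self.commit_log.append((self.version, set(txn["writes"])))
        return True
```

Runs of this model reproduce the Write Skew outcome above: two transactions with disjoint write sets both commit even though each reads a location the other writes.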
Execution Trace. Our notion of program robustness is based on an abstract
representation of executions called trace. Informally, an execution trace is a set
of events, i.e., accesses to shared variables and transaction begin/commit events,
along with several standard dependency relations between events recording the
data-flow. The transitive closure of the union of all these dependency relations
is called happens-before. An execution is an anomaly if the happens-before of its
trace is cyclic. Figure 1b shows the happens-before of the Write Skew anomaly.
Notice that the happens-before order is cyclic in both cases.
Semantically, every transaction execution involves two main events, the issue
and the commit. The issue event corresponds to a sequence of reads and/or
writes where the writes are visible only to the current transaction. We interpret
The write to x was delayed by storing the value in the auxiliary register rx, and
the happens-before chain exists because the read on y done by t1 conflicts
with the write on y from t2, and the read on x by t2 conflicts
with the write to x in the simulation of t1's commit event. On the other hand,
consider the following execution of Write Skew without the read on y in t1:

begin(p1, t1) st(p1, t1, rx, 1) st(p1, t1, x, rx)
begin(p2, t2) ld(p2, t2, x, 0) isu(p2, t2, y, 1) com(p2, t2)
3 Programs
A program is a parallel composition of processes distinguished using a set of
identifiers P. Each process is a sequence of transactions, and each transaction is a
sequence of labeled instructions. Each transaction starts with a begin instruc-
tion and finishes with a commit instruction. Each other instruction is either an
assignment to a process-local register from a set R or to a shared variable from
a set V, or an assume statement. The read/write assignments use values from a
data domain D. An assignment to a register reg := var is called a read of the
shared-variable var and an assignment to a shared variable var := reg-expr
is called a write to var (reg-expr is an expression over registers whose syn-
tax we leave unspecified since it is irrelevant for our development). The statement
assume bexpr blocks the process if the Boolean expression bexpr over registers is
false; assume statements are used to model conditionals, as usual. We use goto
statements to model arbitrary control-flow: the same label can be assigned to
multiple instructions, and multiple goto statements can direct the control to the
same label, which makes it possible to mimic imperative constructs like loops and
conditionals. To simplify the technical exposition, our syntax includes simple read/write
instructions. However, our results apply as well to instructions that include SQL
(select/update) queries. The experiments reported in Sect. 7 consider programs
with SQL-based transactions.
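As a sketch, the program syntax above can be captured with a few datatypes (the type names are ours; register expressions are kept opaque, as in the text):

```python
from dataclasses import dataclass
from typing import Union

# Instructions of the toy language: reads/writes of shared variables,
# assume statements over registers, and gotos for arbitrary control flow.

@dataclass
class Read:            # reg := var
    label: str
    reg: str
    var: str

@dataclass
class Write:           # var := reg_expr
    label: str
    var: str
    reg_expr: str      # syntax of register expressions left opaque

@dataclass
class Assume:          # assume bexpr (blocks if bexpr is false)
    label: str
    bexpr: str

@dataclass
class Goto:
    label: str
    target: str        # several instructions may share a label

Instr = Union[Read, Write, Assume, Goto]

@dataclass
class Transaction:     # begin ... commit, a sequence of labeled instructions
    body: list

@dataclass
class Process:
    pid: str
    transactions: list

# Process p1 of the Write Skew program of Fig. 1a:
t1 = Transaction([Read("l0", "r1", "x"), Write("l1", "y", "1")])
p1 = Process("p1", [t1])
```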
The semantics of a program under SI is defined as follows. The shared vari-
ables are stored in a central memory and each process keeps a replicated copy
of the central memory. A process starts a transaction by discarding its local
copy and fetching the values of the shared variables from the central memory.
When a process commits a transaction, it merges its local copy of the shared
variables with the one stored in the central memory in order to make its updates
visible to all processes. During the execution of a transaction, the process stores
the writes to shared variables only in its local copy and reads only from its
local copy. When a process merges its local copy with the centralized one, it is
required that there were no concurrent updates that occurred after the last fetch
from the central memory to a shared variable that was updated by the current
transaction. Otherwise, the transaction is aborted and its effects discarded.
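The operational description above can be sketched as a small interpreter (a hedged simplification: version counters stand in for detecting "concurrent updates since the last fetch," and all names are ours):

```python
class SIStore:
    """Central memory plus per-transaction local copies, as described above."""
    def __init__(self, init):
        self.central = dict(init)
        self.version = {x: 0 for x in init}   # bumped on every committed write

    def begin(self):
        # fetch a snapshot of the central memory and remember its versions
        return {"snap": dict(self.central),
                "seen": dict(self.version),
                "writes": {}}

    def read(self, txn, x):
        # reads go to the local copy; own writes win over the snapshot
        return txn["writes"].get(x, txn["snap"][x])

    def write(self, txn, x, v):
        txn["writes"][x] = v                  # visible only locally

    def commit(self, txn):
        # abort if a variable we wrote was updated since our fetch
        for x in txn["writes"]:
            if self.version[x] != txn["seen"][x]:
                return False                  # write-write conflict: abort
        for x, v in txn["writes"].items():
            self.central[x] = v
            self.version[x] += 1
        return True

store = SIStore({"x": 0, "y": 0})
t1, t2 = store.begin(), store.begin()
store.write(t1, "x", 1)
store.write(t2, "x", 2)
print(store.commit(t1), store.commit(t2))   # True False
```

The final print shows first-committer-wins: the second transaction writing x aborts.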
More precisely, the semantics of a program P under SI is defined as a labeled
transition system [P]SI whose transitions are labeled by events: begin and com
label transitions corresponding to the start and the commit of a transaction,
respectively, while isu and ld label transitions corresponding to writing, resp.
reading, a shared variable during some transaction.
An execution of program P, under snapshot isolation, is a sequence of events
ev 1 · ev 2 · . . . corresponding to a run of [P]SI . The set of executions of P under
SI is denoted by ExSI (P).
292 S. M. Beillahi et al.
4 Robustness Against SI
A trace abstracts the order in which shared variables are accessed inside a
transaction and the order between transactions accessing different variables. Formally,
the trace of an execution ρ is obtained by (1) replacing each sub-sequence of
transitions in ρ corresponding to the same transaction, but excluding the com
transition, with a single “macro-event” isu(p, t), and (2) adding several standard
relations between these macro-events isu(p, t) and commit events com(p, t) to
record the data-flow in ρ, e.g. which transaction wrote the value read by another
transaction. The sequence of isu(p, t) and com(p, t) events obtained in the first
step is called a summary of ρ. We say that a transaction t in ρ performs an
external read of a variable x if ρ contains an event ld(p, t, x, v) which is not
preceded by a write on x of t, i.e., an event isu(p, t, x, v). Also, we say that a
transaction t writes a variable x if ρ contains an event isu(p, t, x, v), for some v.
The trace tr(ρ) = (τ, PO, WR, WW, RW, STO) of an execution ρ consists of
the summary τ of ρ along with the following relations:
– the program order PO, which relates any two issue events isu(p, t) and isu(p, t′)
that occur in this order in τ;
– the write-read relation WR (also called read-from), which relates any two events
com(p, t) and isu(p′, t′) that occur in this order in τ such that t′ performs an
external read of x and com(p, t) is the last event in τ before isu(p′, t′) that
writes to x (to mark the variable x, we may use WR(x));
– the write-write order WW (also called store order), which relates any two store
events com(p, t) and com(p′, t′) that occur in this order in τ and write to the
same variable x (to mark the variable x, we may use WW(x));
– the read-write relation RW (also called conflict), which relates any two events
isu(p, t) and com(p′, t′) that occur in this order in τ such that t reads a value
that is overwritten by t′;
– the same-transaction relation STO, which relates the issue event with the commit
event of the same transaction.
The read-write relation RW is formally defined as RW(x) = WR−1(x); WW(x) (we
use ; to denote the standard composition of relations) and RW = ⋃x∈V RW(x). If a
transaction t reads the initial value of x, then RW(x) relates isu(p, t) to com(p′, t′)
of any other transaction t′ which writes to x (i.e., (isu(p, t), com(p′, t′)) ∈ RW(x)).
Note that in the above relations, p and p′ might designate the same process.
Since we reason about only one trace at a time, to simplify the writing, we
may say that a trace is simply a sequence τ as above, keeping the relations PO,
WR, WW, RW, and STO implicit. The set of traces of executions of a program
P under SI is denoted by TrSI (P).
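Since robustness hinges on acyclicity of happens-before, a minimal check can be sketched as a cycle test over the union of the relations (we collapse issue/commit pairs into single transaction nodes for brevity; the encoding is ours):

```python
from itertools import chain

# Dependency relations PO, WR, WW, RW, STO given as sets of (txn, txn)
# pairs; happens-before is the transitive closure of their union.

def happens_before_cyclic(txns, deps):
    """deps: iterable of relations; True iff their union has a cycle."""
    edges = {t: set() for t in txns}
    for a, b in chain(*deps):
        edges[a].add(b)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}

    def dfs(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY or (color[u] == WHITE and dfs(u)):
                return True            # back edge: cycle found
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in txns)

# Write Skew: t1 reads x overwritten by t2, t2 reads y overwritten by t1,
# giving the RW edges that close the happens-before cycle.
RW = {("t1", "t2"), ("t2", "t1")}
print(happens_before_cyclic(["t1", "t2"], [RW]))              # True: anomaly
print(happens_before_cyclic(["t1", "t2"], [{("t1", "t2")}]))  # False
```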
Serializability Semantics. The semantics of a program under serializability
can be defined using a transition system where the configurations keep a single
shared-variable valuation (accessed by all processes) with the standard inter-
pretation of read and write statements. Each transaction executes in isolation.
Alternatively, the serializability semantics can be defined as a restriction of [P]SI
to the set of executions where each transaction is immediately delivered when it
starts, i.e., the start and commit times of a transaction coincide, t.st = t.ct. Such
executions are called serializable, and the set of serializable executions of a
program P is denoted by ExSER (P). The latter definition is easier to reason about
Since TrSER (P) ⊆ TrSI (P), the problem of checking robustness of a program P is
reduced to checking whether there exists a trace tr ∈ TrSI (P) \ TrSER (P).
A trace which is not serializable must contain at least an issue and a commit
event of the same transaction that do not occur one after the other, even after
reordering of "independent" events. Thus, there must exist an event that occurs
between the two and is related to both events via the happens-before relation,
forbidding the issue and commit from being adjacent. Otherwise, we can build
another trace with the same happens-before in which events are reordered such that
the issue is immediately followed by the corresponding commit. The latter is a
serializable trace which contradicts the initial assumption. We define a program
instrumentation which mimics the delay of transactions by doing the writes on
auxiliary variables which are not visible to other transactions. After the delay of
a transaction, we track happens-before dependencies until we execute a trans-
action that does a “read” on one of the variables that the delayed transaction
writes to (this would expose a read-write dependency to the commit event of
The instrumentation uses three varieties of flags: (a) global flags (i.e., HB, atrA,
astA), (b) flags local to a process (i.e., p.a and p.hbh), and (c) flags per shared
variable (i.e., x.event, x.event′, and x.eventI). We will explain the meaning of
these flags along with the instrumentation. At the start of the execution, all flags
are initialized to null (⊥).
Whether a process is an attacker or happens-before helper is not enforced
syntactically by the instrumentation. It is set non-deterministically during the
execution using some additional process-local flags. Each process chooses to set
to true at most one of the flags p.a and p.hbh, implying that the process becomes
an attacker or happens-before helper, respectively. At most one process can be
an attacker, i.e., set p.a to true. In the following, we detail the instrumentation
for read and write instructions of the attacker and happens-before helpers.
Figure 3 lists the instrumentation of the write and read instructions of the
attacker. Each process passes through an initial phase where it executes trans-
actions that are visible immediately to all the other processes (i.e., they are not
delayed), and then non-deterministically it can choose to delay a transaction at
which point it sets the flag atrA to true. During the delayed transaction it chooses
non-deterministically a write instruction to a variable x and stores the name of
this variable in the flag astA (line (5)). The values written during the delayed
transaction are stored in the primed variables and are visible only to the current
transaction, in case the transaction reads its own writes. For example, given a
variable z, all writes to z from the original program are transformed into writes
to the primed version z (line (3)). Each time, the attacker writes to z, it sets
the flag z.event = 1. This flag is used later by transactions from happens-before
helpers to avoid writing to variables that the delayed transaction writes to.
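A sketch of the attacker's instrumented write, following the textual description of Fig. 3 (the code shape and state layout are ours; line numbers (3) and (5) refer to the figure):

```python
import random

class AttackerState:
    def __init__(self):
        self.atrA = None       # set once the attacker delays a transaction
        self.astA = None       # variable chosen non-deterministically, line (5)
        self.primed = {}       # z' copies, visible only to the delayed txn
        self.events = set()    # per-variable event flags

def instrumented_write(state, memory, var, value, delayed):
    if not delayed:
        memory[var] = value            # initial phase: writes are visible
        return
    # delayed transaction: write the primed copy instead (line (3))
    state.primed[var] = value
    state.events.add(var)              # record that the delayed txn wrote var
    if state.astA is None and random.random() < 0.5:
        state.astA = var               # non-deterministic choice of line (5)

mem = {"x": 0}
st = AttackerState()
st.atrA = True                         # the transaction is now delayed
instrumented_write(st, mem, "x", 1, delayed=True)
print(mem["x"], st.primed["x"], "x" in st.events)   # 0 1 True
```

The central memory is untouched by the delayed write, mirroring how the instrumentation keeps the delayed transaction invisible to other processes.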
Fig. 3. Instrumentation of the Attacker. We use ‘x to denote the name of the shared
variable x.
5.3 Correctness
The role of a process in an execution is chosen non-deterministically at runtime.
Therefore, the final instrumentation of a given program P, denoted by [[P]], is
obtained by replacing each labeled instruction linst with the concatenation
of the instrumentations corresponding to the attacker and the happens-before
helpers, i.e., [[linst]] ::= [[linst]]A [[linst]]HbH .
The following theorem states the correctness of the instrumentation.
Theorem 2. P is not robust against SI iff [[P]] reaches the error state.
If a program is not robust, then the execution of the program under SI results
in a trace where the happens-before relation is cyclic, which is possible only
if the program contains at least one delayed transaction. In the proof of this
theorem, we show that it is sufficient to search for executions that contain a single
delayed transaction.
Notice that in the instrumentation of the attacker, the delayed transaction
must contain read and write instructions on different variables. Also, the
transactions of the happens-before helpers must not contain a write to a variable that
the delayed transaction writes to. The following corollary states the complexity
of checking robustness for finite-state programs against snapshot isolation. It is
a direct consequence of Theorem 2 and of previous results concerning the reach-
ability problem in concurrent programs running over a sequentially-consistent
memory, with a fixed [17] or parametric number of processes [22].
(a) (t0′, t1) ∈ MRW where t0′ is the write-free variation of t0 and t1 does not write
to a variable that t0 writes to;
(b) for all i ∈ [1, n], (ti, ti+1) ∈ (PO ∪ MWR ∪ MWW ∪ MRW), and ti and ti+1 do not
write to a shared variable that t0 writes to;
(c) (tn, t0′′) ∈ MRW where t0′′ is the read-free variation of t0 and tn does not write
to a variable that t0 writes to.
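Conditions (a)–(c) describe a cycle pattern through t0; under the assumption that the relations are given as explicit edge sets, the search can be sketched as follows (names are ours; the write-free/read-free variations t0′ and t0′′ are abstracted into the MRW edge sets):

```python
def violation_pattern(txns, t0, writes, MRW, step):
    """Search for t1..tn with (t0, t1) in MRW (condition (a)), consecutive
    (ti, ti+1) in `step` = PO ∪ MWR ∪ MWW ∪ MRW (condition (b)), and
    (tn, t0) in MRW (condition (c)), where no ti writes a variable that
    t0 writes.  `writes` maps each transaction to its write set."""
    ok = [t for t in txns if t != t0 and not (writes[t] & writes[t0])]
    frontier = [t for t in ok if (t0, t) in MRW]          # condition (a)
    seen = set(frontier)
    while frontier:
        t = frontier.pop()
        if (t, t0) in MRW:                                # condition (c)
            return True
        for u in ok:
            if (t, u) in step and u not in seen:          # condition (b)
                seen.add(u); frontier.append(u)
    return False

writes = {"t0": {"x"}, "t1": {"y"}, "t2": {"z"}}
MRW = {("t0", "t1"), ("t2", "t0")}
step = {("t1", "t2")}
print(violation_pattern(["t0", "t1", "t2"], "t0", writes, MRW, MRW | step))
# True: t0 -> t1 -> t2 -> t0
```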
7 Experiments
To test the applicability of our robustness checking algorithms, we have con-
sidered a benchmark of 10 applications extracted from the literature related to
weakly consistent databases in general. A first set of applications consists of
open-source projects that were implemented to run over the Cassandra database, extracted
from [11]. The second set of applications is composed of: TPC-C [24], an on-line
transaction processing benchmark widely used in the database community;
SmallBank, a simplified representation of a banking application [2]; FusionTicket, a
movie ticketing application [16]; Auction, an online auction application [6]; and
Courseware, a course registration service extracted from [14,19].
Table 1. An overview of the analysis results. CDG stands for commutativity
dependency graph. The columns PO and PT show the number of proof obligations and
the proof time in seconds, respectively. T stands for trivial, i.e., the application
has only read-only transactions.
8 Related Work
The decidability and complexity of robustness have been investigated in the context
of relaxed memory models such as TSO and Power [7,9,13]. Our work borrows some
high-level principles from [7] which addresses the robustness against TSO. We
reuse the high-level methodology of characterizing minimal violations according
to some measure and defining reductions to SC reachability using a program
instrumentation. Instantiating this methodology in our context is, however, very
different; among the fundamental differences:
– SI and TSO admit different sets of relaxations and SI is a model of trans-
actional databases.
– We use a different notion of measure: the measure in [7] counts the number of
events between a write issue and a write commit while our notion of measure
counts the number of delayed transactions. This is a first reason for which
the proof techniques in [7] don’t extend to our context.
– Transactions induce more complex traces: two transactions might be related
by several dependency relations since each transaction may contain multi-
ple reads and writes to different locations. In TSO, each action is a read
or a write to some location, and two events are related by a single depen-
dency relation. Also, the number of dependencies between two transactions
depends on the execution since the set of reads/writes in a transaction
evolves dynamically.
Other works [9,13] define decision procedures based on the theory of regular
languages, which do not extend to infinite-state programs like those we consider.
As far as we know, our work provides the first results concerning the decidability
and the complexity of robustness checking in the context of transactions.

    p1:                      p2:
    t1: [ if (x > y)         t2: [ if (y > x)
          r1 = x - y     ||        r2 = y - x
          x = y ]                  y = x ]

Fig. 5. A robust program.

The existing work on the verification of robustness for transactional programs
provides either over- or under-approximate analyses. Our commutativity
dependency graphs are similar to the static dependency graphs used in [6,10–12],
but they are more precise, i.e., reducing the number of false alarms. The static
dependency graphs record happens-before dependencies between transactions
based on a syntactic approximation of the variables accessed by a transaction.
For example, our techniques are able to prove that the program in Fig. 5 is
robust, while this is not possible using static dependency graphs. The latter
would contain a dependency from transaction t1 to t2 and one from t2 to t1 just
because syntactically, each of the two transactions reads both variables and may
write to one of them. Our dependency graphs take into account the semantics
of these transactions and do not include this happens-before cycle. Other over-
and under-approximate analyses have been proposed in [20]. They are based
on encoding executions into first order logic, bounded-model checking for the
under-approximate analysis, and a sound check for proving a cut-off bound on
the size of the happens-before cycles possible in the executions of a program, for
the over-approximate analysis. The latter is strictly less precise than our method
based on commutativity dependency graphs. For instance, extending the TPC-C
application with three additional transactions makes the method in [20] fail while
our method succeeds in proving robustness (the three transactions add a new
product, add a new warehouse based on the number of customers and warehouses,
and add a new customer, respectively).
Finally, the idea of using Lipton's reduction theory for checking robustness
has also been used in the context of the TSO memory model [8], but the techniques
are completely different; e.g., the TSO technique considers each update
in isolation and does not consider non-mover cycles like our commutativity
dependency graphs do.
References
1. Adya, A.: Weak consistency: a generalized theory and optimistic implementations
for distributed transactions. Ph.D. thesis (1999)
2. Alomari, M., Cahill, M.J., Fekete, A., Röhm, U.: The cost of serializability on
platforms that use snapshot isolation. In: Alonso, G., Blakeley, J.A., Chen, A.L.P.
(eds.) Proceedings of the 24th International Conference on Data Engineering, ICDE
2008, 7–12 April 2008, Cancún, Mexico, pp. 576–585. IEEE Computer Society
(2008)
3. Barnett, M., Chang, B.-Y.E., DeLine, R., Jacobs, B., Leino, K.R.M.: Boogie: a
modular reusable verifier for object-oriented programs. In: de Boer, F.S., Bon-
sangue, M.M., Graf, S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp.
364–387. Springer, Heidelberg (2006). https://doi.org/10.1007/11804192_17
4. Beillahi, S.M., Bouajjani, A., Enea, C.: Checking robustness against snapshot iso-
lation. CoRR, abs/1905.08406 (2019)
5. Berenson, H., Bernstein, P.A., Gray, J., Melton, J., O’Neil, E.J., O’Neil, P.E.:
A critique of ANSI SQL isolation levels. In: Carey, M.J., Schneider, D.A. (eds.)
Proceedings of the 1995 ACM SIGMOD International Conference on Management
of Data, San Jose, California, USA, 22–25 May 1995, pp. 1–10. ACM Press (1995)
6. Bernardi, G., Gotsman, A.: Robustness against consistency models with atomic
visibility. In: Desharnais, J., Jagadeesan, R. (eds.) 27th International Conference on
Concurrency Theory, CONCUR 2016, 23–26 August 2016, Québec City, Canada.
LIPIcs, vol. 59, pp. 7:1–7:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik
(2016)
7. Bouajjani, A., Derevenetc, E., Meyer, R.: Checking and enforcing robustness
against TSO. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp.
533–553. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_29
8. Bouajjani, A., Enea, C., Mutluergil, S.O., Tasiran, S.: Reasoning about TSO pro-
grams using reduction and abstraction. In: Chockler, H., Weissenbacher, G. (eds.)
CAV 2018. LNCS, vol. 10982, pp. 336–353. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96142-2_21
9. Bouajjani, A., Meyer, R., Möhlmann, E.: Deciding robustness against total store
ordering. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011. LNCS, vol.
6756, pp. 428–440. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22012-8_34
10. Brutschy, L., Dimitrov, D., Müller, P., Vechev, M.T.: Serializability for eventual
consistency: criterion, analysis, and applications. In: Castagna, G., Gordon, A.D.
(eds.) Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Pro-
gramming Languages, POPL 2017, Paris, France, 18–20 January 2017, pp. 458–472.
ACM (2017)
11. Brutschy, L., Dimitrov, D., Müller, P., Vechev, M.T.: Static serializability analy-
sis for causal consistency. In: Foster, J.S., Grossman, D. (eds.) Proceedings of the
39th ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation, PLDI 2018, Philadelphia, PA, USA, 18–22 June 2018, pp. 90–104. ACM
(2018)
12. Cerone, A., Gotsman, A.: Analysing snapshot isolation. J. ACM 65(2), 11:1–11:41
(2018)
13. Derevenetc, E., Meyer, R.: Robustness against power is PSpace-complete. In:
Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014.
LNCS, vol. 8573, pp. 158–170. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43951-7_14
14. Gotsman, A., Yang, H., Ferreira, C., Najafzadeh, M., Shapiro, M.: ‘cause i’m strong
enough: reasoning about consistency choices in distributed systems. In: Bodı́k, R.,
Majumdar, R. (eds.) Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg,
FL, USA, 20–22 January 2016, pp. 371–384. ACM (2016)
15. Hawblitzel, C., Petrank, E., Qadeer, S., Tasiran, S.: Automated and modular refine-
ment reasoning for concurrent programs. In: Kroening, D., Păsăreanu, C.S. (eds.)
CAV 2015. LNCS, vol. 9207, pp. 449–465. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21668-3_26
16. Holt, B., Bornholt, J., Zhang, I., Ports, D.R.K., Oskin, M., Ceze, L.: Disciplined
inconsistency with consistency types. In: Aguilera, M.K., Cooper, B., Diao, Y.
(eds.) Proceedings of the Seventh ACM Symposium on Cloud Computing, Santa
Clara, CA, USA, 5–7 October 2016, pp. 279–293. ACM (2016)
17. Kozen, D.: Lower bounds for natural proof systems. In: 18th Annual Symposium
on Foundations of Computer Science, Providence, Rhode Island, USA, 31
October–1 November 1977, pp. 254–266. IEEE Computer Society (1977)
18. Lipton, R.J.: Reduction: a method of proving properties of parallel programs. Com-
mun. ACM 18(12), 717–721 (1975)
19. Nagar, K., Jagannathan, S.: Automated detection of serializability violations under
weak consistency. In: Schewe, S., Zhang, L. (eds.) 29th International Conference on
Concurrency Theory, CONCUR 2018, 4–7 September 2018, Beijing, China. LIPIcs,
vol. 118, pp. 41:1–41:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2018)
20. Nagar, K., Jagannathan, S.: Automatic detection of serializability violations under
weak consistency. In: 29th International Conference on Concurrency Theory (CON-
CUR 2018), September 2018
21. Papadimitriou, C.H.: The serializability of concurrent database updates. J. ACM
26(4), 631–653 (1979)
22. Rackoff, C.: The covering and boundedness problems for vector addition systems.
Theoret. Comput. Sci. 6, 223–231 (1978)
23. Shasha, D.E., Snir, M.: Efficient and correct execution of parallel programs that
share memory. ACM Trans. Program. Lang. Syst. 10(2), 282–312 (1988)
24. TPC: Technical report, Transaction Processing Performance Council, February
2010. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
Efficient Verification of Network Fault
Tolerance via Counterexample-Guided
Refinement
Abstract. We show how to verify that large data center networks sat-
isfy key properties such as all-pairs reachability under a bounded num-
ber of faults. To scale the analysis, we develop algorithms that identify
network symmetries and compute small abstract networks from large
concrete ones. Using counterexample-guided abstraction refinement, we
successively refine the computed abstractions until the given property
may be verified. The soundness of our approach relies on a novel notion
of network approximation: routing paths in the concrete network are not
precisely simulated by those in the abstract network but are guaranteed
to be “at least as good.” We implement our algorithms in a tool called
Origami and use them to verify reachability under faults for standard
data center topologies. We find that Origami computes abstract net-
works with 1–3 orders of magnitude fewer edges, which makes it possible
to verify large networks that are out of reach of existing techniques.
1 Introduction
This work was supported in part by NSF Grants 1703493 and 1837030, and gifts from
Cisco and Facebook. Any opinions, findings, and conclusions expressed are those of the
authors and do not necessarily reflect those of the NSF, Cisco or Facebook.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 305–323, 2019.
https://doi.org/10.1007/978-3-030-25543-5_18
306 N. Giannarakis et al.
2 Key Ideas
Fig. 1. All graph edges shown correspond to edges in the network topology, and we
draw edges as directed to denote the direction of forwarding eventually determined for
each node by the distributed routing protocols for a fixed destination d. In (a) nodes use
shortest path routing to route to the destination d. (b) shows a compressed network
that precisely captures the forwarding behavior of (a). (c) shows how forwarding is
impacted by a link failure, shown as a red line. (d) shows a compressed network that
is a sound approximation of the original network for any single link failure. (Color
figure online)
Though there are a wide variety of routing protocols in use today, they share
a lot in common. Griffin et al. [16] showed that protocols like BGP and others
solve instances of the stable paths problem, a generalization of the shortest paths
problem, and Sobrinho [24] demonstrated their semantics and properties can be
Efficient Verification of Network Fault Tolerance 309
where the choices from the neighbors of node u are defined as:
The constraints require that every node has selected the best attribute
(according to its preference relation) amongst those available from its neigh-
bors. The destination’s label must always be the initial attribute ad . For ver-
ification, this attribute (or parts of it) may be symbolic, which helps model
potentially unknown routing announcements from peers outside our network.
For other nodes u, the selected attribute a is the minimal attribute from the
choices available to u. Intuitively, to find the choices available to u, we consider
know that if a node has any route to the destination in the abstract network, so
do its concrete counterparts.
Limitations. Some properties are beyond the scope of our tool (independent of
the preference relation). For example, our model cannot reason about quantita-
tive properties such as bandwidth, probability of congestion, or latency.
Fig. 2. Concrete network (left) and its corresponding abstraction (right). Nodes c1 , c2
prefer to route through b1 (resp. b2 ), or g over a. Node b1 (resp. b2 ) drops routing
messages that have traversed b2 (resp. b1 ). Red lines indicate a failed link. Dotted lines
show a topologically available but unused link. A purple arrow shows a route unusable
by traffic from b1 . (Color figure online)
Monotonicity and isotonicity properties are often cited [7,8] as desirable prop-
erties of routing policies because they guarantee network convergence and pre-
vent persistent oscillation. In practice too, prior studies have revealed that almost
all real network configurations have these properties [13,19].
In our case, these properties help establish additional invariants that tie
the routing behavior of concrete and abstract networks together. To gain some
intuition as to why, consider the networks of Fig. 2. The concrete network on
the left runs BGP with the routing policy that node c1 (and c2 ) prefers to route
through node g instead of a, and that b1 drops announcements coming from
b2 . In this scenario, the similarly configured abstract node b12 can reach the
destination—it simply takes a route that happens to be less preferred by ĉ12
than it would if there had been no failure. However, its concrete analogue b1 is
unable to reach the destination because c1 only sends it the route through
b2, which it cannot use. In this case, the concrete network has more topological
paths than the abstract network, but, counterintuitively, due to the network’s
routing policy, this turns out to be a disadvantage. Hence having more paths
does not necessarily make nodes more accessible. As a consequence, in general,
abstract networks cannot soundly overapproximate the number of failures in a
concrete network—an important property for the soundness of our theory.
The underlying issue here is that the networks of Fig. 2 are not isotonic: suppose
L′(c1) is the route from c1 to the destination through node a; we have
L(c1) ≺ L′(c1), but since the transfer function over ⟨b1, c1⟩ drops routes that have
traversed node b2, we have trans(⟨b1, c1⟩, L(c1)) ⊀ trans(⟨b1, c1⟩, L′(c1)).
Notice that L′(c1) is essentially the route that the abstract network uses, i.e.
h(L′(c1)) = L̂(ĉ12); hence the formula above implies that h(L(b1)) ⊀ L̂(b̂12),
which violates the notion of label approximation. Fortunately, if a network is
strictly monotonic and isotonic, such situations never arise. Moreover, we check
strictly monotonic and isotonic, such situations never arise. Moreover, we check
these properties via an SMT solver using a local and efficient test.
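To illustrate the shape of that test without an SMT solver, here is a brute-force version over an invented finite attribute domain (the domain, ordering, and transfer function are ours for illustration, not the paper's model):

```python
# Toy attribute domain: path length, with None meaning "no route".
ATTRS = [0, 1, 2, 3, None]

def better(a, b):                 # a ≺ b : a is strictly preferred to b
    if a is None: return False
    if b is None: return True
    return a < b

def trans(a):                     # transfer: extend the path by one hop
    return None if a is None else a + 1

# strict monotonicity: applying trans strictly worsens every attribute
monotonic = all(better(a, trans(a)) for a in ATTRS if a is not None)
# isotonicity: trans preserves the preference order
isotonic = all(better(trans(a), trans(b))
               for a in ATTRS for b in ATTRS if better(a, b))
print(monotonic, isotonic)        # True True
```

A policy that maps a preferred route to "no route" while keeping a less-preferred one, as in Fig. 2, would fail the isotonic check.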
∀a, b. a ≺ b ⇐⇒ h(a) ≺̂ h(b)
We prove that when these conditions hold, we can approximate any solution of
the concrete network with a solution of the abstract network.
Theorem 1. Given a well-formed SPPF and its effective approximation SPPF̂,
for any solution S ∈ SPPF there exists a solution Ŝ ∈ SPPF̂ such that their
labelling functions are label approximate.
1. Search the set of candidate refinements for the smallest plausible abstraction.
2. If the candidate abstraction satisfies the desired property, terminate the pro-
cedure. (We have successfully verified our concrete network.)
3. If not, examine whether the returned counterexample is an actual
counterexample. We do so by computing the number of concrete failures and checking
that it does not exceed the desired bound of link failures. (If it does not, we
have found a property violation.)
4. If not, use the counterexample to learn how to expand the abstract network
into a larger abstraction and repeat.
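Steps 1–4 can be sketched as a loop in which the components described in this section appear as callbacks (all names are placeholders, not Origami's API):

```python
def cegar(concrete, bound, smallest_plausible, verify, concrete_failures,
          learn_unusable_edges):
    """Counterexample-guided refinement sketch; the callbacks stand for
    the components described in the text."""
    unusable = set()
    while True:
        abstract = smallest_plausible(concrete, bound, unusable)   # step 1
        cex = verify(abstract, bound)
        if cex is None:
            return "verified"                                      # step 2
        if concrete_failures(concrete, cex) <= bound:
            return ("violation", cex)                              # step 3
        unusable |= learn_unusable_edges(abstract, cex)            # step 4

# A stub run: the first abstraction yields a spurious counterexample,
# the refined one verifies.
calls = {"n": 0}
def smallest_plausible(c, b, unusable): return ("abs", frozenset(unusable))
def verify(a, b):
    calls["n"] += 1
    return "cex" if calls["n"] == 1 else None
def concrete_failures(c, cex): return 99       # spurious: exceeds the bound
def learn_unusable_edges(a, cex): return {("b", "a13")}

print(cegar(None, 1, smallest_plausible, verify, concrete_failures,
            learn_unusable_edges))             # verified
```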
Fig. 3. Eight nodes in (a) are represented using two nodes in the abstract network (b).
Pictures (c) and (d) show two possible ways to refine the abstract network (b).
Both the search for plausible candidates and the way we learn a new abstrac-
tion to continue the counterexample-guided loop are explained below.
is plausible for one failure, but if b’s routing policy blocks routes of either a13 or
a24 then the abstract network will not be 1-fault tolerant. Indeed, it is the com-
plexity of routing policy that necessitates a heavy-weight verification procedure
in the first place, rather than a simpler graph algorithm alone.
In a plausible abstraction, if the verifier computes a solution to the network
that violates the desired fault-tolerance property, then some node could not reach
the destination because one or more of its paths to the destination could not be
used to route traffic. We use the generated counterexample to learn edges that could
not be used to route traffic due to the policy on them. To do so, we inspect the
computed solution to find nodes u that (1) lack a route to the destination (i.e.
L(u) = ∞), (2) have a neighbor v that has a valid route to the destination, and
(3) the link between u and v is not failed. These conditions imply the absence of
a valid route to the destination not because link failures disabled all paths to the
destination, but because the network policy dropped some routes. For example, in
Fig. 3c, consider the case where b does not advertise routes from a13 and a24;
if the link between a13 and d fails, then a13 has no route to the destination and
we learn that the edge (b, a13) cannot be used. In fact, since a13 and a24 belonged
to the same abstract group a before we split them, their routing policies are equal
modulo the abstraction function by trans-equivalence. Hence, we can infer that in
a symmetric scenario, the link (b, a24) will also be unusable.
Given a set of unusable edges, learned from a counterexample, we restrict the
min cut problems that define the plausible abstractions by disallowing the use of
those edges. Essentially, we enrich the refinement algorithm's topology-based
analysis (based on min-cut) with knowledge about the policy; the algorithm will
have to generate abstractions that are plausible without using those edges. With
those edges disabled, the refinement process continues as before.
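The restricted min-cut computation can be illustrated with a short sketch: a unit-capacity max-flow (whose value equals the min cut) over the topology, with the learned unusable edges removed before solving. This is our own toy implementation under those assumptions, not Origami's.

```python
from collections import deque

def min_cut_value(edges, s, t, disallowed=frozenset()):
    """Min cut between s and t in an undirected unit-capacity graph,
    after removing edges learned to be unusable from counterexamples.
    edges: iterable of (u, v) pairs; disallowed: set of frozenset({u, v})."""
    cap = {}
    for u, v in edges:
        if frozenset((u, v)) in disallowed:
            continue                              # policy-blocked edge: skip it
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap[(v, u)] = cap.get((v, u), 0) + 1
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:          # BFS for an augmenting path
            u = queue.popleft()
            for (a, b), c in cap.items():
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if t not in parent:
            return flow                           # max flow == min cut value
        v = t
        while parent[v] is not None:              # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# Diamond topology: two disjoint s-t paths, so the min cut is 2;
# disallowing edge (s, a) leaves only one usable path.
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
assert min_cut_value(edges, "s", "t") == 2
assert min_cut_value(edges, "s", "t", {frozenset(("s", "a"))}) == 1
```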
6 Implementation
Origami uses the Batfish network analysis framework [12] to parse network configurations, and then translates them into a pure functional intermediate representation (IR) designed for network verification. This IR represents the structure
of routing messages and the semantics of transfer and preference relations using
standard functional data structures.
The translation generates a separate functional program for each destina-
tion subnet. In other words, if a network has 100 top-of-rack switches and each
such switch announces the subnets for 30 adjacent hosts, then Origami gener-
ates 100 functional programs (i.e. problem instances). We separately apply our
algorithms to each problem instance, converting the functional program to an
SMT formula when necessary according to the algorithm described earlier. Since
vendor routing configuration languages have limited expressive power (e.g., no
loops or recursion) the translation requires no user-provided invariants. We use
Z3 [10] to determine satisfiability of the SMT problems. Solving the problems
separately (and in parallel) provides a speedup over solving the routing problem
for all destinations simultaneously: The individual problems are specialized to a
Efficient Verification of Network Fault Tolerance 317
Topo  Con V/E     Fail  Abs V/E  Ratio         Abs Time  SMT Calls  SMT Time
FT20  500/8000    1     9/20     55.5/400      0.1       1          0.1
                  3     40/192   12.5/41.67    1.0       2          7.6
                  5     96/720   5.20/11.1     2.5       2          248
                  10    59/440   8.48/18.18    0.9       -          -
FT40  2000/64000  1     12/28    166.7/2285.7  0.1       1          0.1
                  3     45/220   44.4/290.9    33        2          12.3
                  5     109/880  18.34/72.72   762.3     2          184.1
SP40  2000/64000  1     13/32    153.8/2000    0.2       1          0.1
                  3     39/176   51.3/363.6    30.3      1          2
                  5     79/522   25.3/122.6    372.2     1          22
FbFT  744/10880   1     20/66    37.2/164.8    0.1       3          1
                  3     57/360   13.05/30.22   1         4          18.3
                  5     93/684   8/15.9        408.9     -          -
Fig. 4. Compression results. Topo: the network topology. Con V/E: Number of
nodes/edges of concrete network. Fail: Number of failures. Abs V/E: Number of
nodes/edges of the best abstraction. Ratio: Compression ratio (nodes/edges). Abs
Time: Time taken to find abstractions (sec.). SMT Calls: Number of calls to the
SMT solver. SMT Time: Time taken by the SMT solver (sec.).
To mitigate the effect of this problem, we could ask the solver to minimize
the returned counterexample, returning a counterexample that corresponds to
the fewest concrete link failures. We could do so by providing the solver with
additional constraints specifying the number of concrete links that correspond
to each abstract link and then asking the solver to return a counterexample that
minimizes this sum of concrete failures. Of course, doing so requires we solve a
more expensive optimization problem. Instead, given an initial (possibly spuri-
ous counter-example), we simple ask the solver to find a new counterexample
that (additionally) satisfies this constraint. If it succeeds, we have found a real
counterexample. If it fails, we use it to refine our abstraction.
7 Evaluation
We evaluate Origami on a collection of synthetic data center networks that
use BGP to implement shortest-paths routing policies over common industrial
datacenter topologies. Data centers are a good fit for our algorithms as they can
be very large but are highly symmetrical and designed for fault tolerance. Data
center topologies (often called fattree topologies) are typically organized in lay-
ers, with each layer containing many routers. Each router in a layer is connected
to a number of routers in the layer above (and below) it. The precise number of
neighbors to which a router is connected, and the pattern of said connections,
is part of the topology definition. We focus on two common topologies: fattree
topologies used at Google (labelled FT20, FT40 and SP40 below) and a different
fattree used at Facebook (labelled FB12). These are relatively large data center
topologies ranging from 500 to 2000 nodes and 8000 to 64000 edges.
SP40 uses a pure shortest paths routing policy. For other experiments (FT20,
FT40, FB12), we augment shortest paths with additional policy that selectively
drops routing announcements, for example disabling “valley routing” in various
places which allows up-down-up-down routes through the data centers instead
of just up-down routes. The pure shortest paths policy represents a best-case
scenario for our technology as it gives rise to perfect symmetry and makes our
heuristics especially effective. By adding variations in routing policy, we provide
a greater challenge for our tool.
Experiments were done on a Mac with a 4 GHz i7 CPU and 16 GB memory.
Figure 4 shows the level of compression achieved, along with the required time
for compression and verification. In most cases, we achieve a high compression
ratio especially in terms of links. This drastically reduces the possible failure
combinations for the underlying verification process. The cases of 10 link failures
on FT20 and 5 link failures on FbFT demonstrate another aspect of our
algorithm. Neither topology can sustain that many link failures, i.e. some concrete
nodes have fewer than 10 (resp. 5) neighbors. We can determine this as we
refine the abstraction; there are (abstract) nodes that do not satisfy the min
cut requirement and we cannot refine them further. This constitutes an actual
counterexample and explains why the abstraction of FT20 for 10 link failures is
smaller than the one for 5 link failures. Importantly, we did not use the SMT
solver to find this counterexample. Likewise, we did not need to run a min cut on
the much larger concrete topology. Intuitively, the rest of the network remained
abstract, while the part that led to the counterexample became fully concrete.
– Heuristics off means that (almost) all heuristics are turned off. We still try
to split nodes that are on the cut-set.
[Bar chart: abstraction size (# nodes) of FT20 versus search breadth (1, 5, 15, 25),
for the configurations Heuristics off, Reachable off, Common off, and All Heuristics.]
Fig. 5. The initial abstraction of FT20 for 5 link failures using different heuristics and
search breadth. On top of the bars is the number of edges of each abstraction.
The results of this experiment show that in order to achieve effective compres-
sion ratios we need to employ both smart heuristics and a wide search through
the space of abstractions. It is possible that increasing the search breadth would
make the heuristics redundant; however, in most cases this would make the
refinement process exceed acceptable time limits.
Use of Counterexamples. We now assess how important it is to (1) use sym-
metries in policy to infer more information from counterexamples, and (2) min-
imize the counterexample provided by the solver.
We see in Fig. 6 that disabling these mechanisms increases the number of
refinement iterations. While each of these refinements is performed quickly, the
same cannot be guaranteed of the verification process that runs between them.
Hence, it is important to keep the number of refinement iterations as low as possible.
8 Related Work
to such tools; it is used to alleviate the scaling problem that Minesweeper faces
with large networks.
With respect to verification of fault tolerance, ARC [13] translates a limited
class of routing policies to a weighted graph where fault-tolerance properties can
be checked using graph algorithms. However, ARC handles only shortest-path
routing and cannot support stateful features such as BGP communities or local
preference. While ARC applies graph algorithms on a statically-computed
graph, we use graph algorithms as part of a refinement loop in conjunction with
a general-purpose solver.
9 Conclusions
We present a new theory of distributed routing protocols in the presence of
bounded link failures, and we use the theory to develop algorithms for network
compression and counterexample-guided verification of fault tolerance proper-
ties. In doing so, we observe that (1) even though abstract networks route differ-
ently from concrete ones in the presence of failures, the concrete routes wind up
being “at least as good” as the abstract ones when networks satisfy reasonable
well-formedness constraints, and (2) using efficient graph algorithms (min cut)
in the middle of the CEGAR loop speeds the search for refinements.
We implemented our algorithms in a network verification tool called Origami.
Evaluation of the tool on synthetic networks shows that our algorithms accelerate
verification of fault tolerance properties significantly, making it possible to verify
networks out of reach of other state-of-the-art tools.
References
1. Ball, T., Majumdar, R., Millstein, T.D., Rajamani, S.K.: Automatic predicate
abstraction of C programs. In: Proceedings of the 2001 ACM SIGPLAN Confer-
ence on Programming Language Design and Implementation (PLDI), pp. 203–213
(2001)
2. Beckett, R., Gupta, A., Mahajan, R., Walker, D.: A general approach to network
configuration verification. In: SIGCOMM, August 2017
3. Beckett, R., Gupta, A., Mahajan, R., Walker, D.: Control plane compression. In:
Proceedings of the 2018 Conference of the ACM Special Interest Group on Data
Communication, pp. 476–489. ACM (2018)
4. Clarke, E.M., Filkorn, T., Jha, S.: Exploiting symmetry in temporal logic model
checking. In: Courcoubetis, C. (ed.) CAV 1993. LNCS, vol. 697, pp. 450–462.
Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-56922-7 37
5. Clarke, E., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided
abstraction refinement. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS,
vol. 1855, pp. 154–169. Springer, Heidelberg (2000). https://doi.org/10.1007/
10722167 15
6. Clarke, E.M., Grumberg, O., Long, D.E.: Model checking and abstraction. ACM
Trans. Program. Lang. Syst. 16(5), 1512–1542 (1994)
7. Daggitt, M.L., Gurney, A.J.T., Griffin, T.G.: Asynchronous convergence of policy-
rich distributed bellman-ford routing protocols. In: SIGCOMM, pp. 103–116 (2018)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
On the Complexity of Checking
Consistency for Replicated Data Types
1 Introduction
Recent distributed systems have introduced variations of familiar abstract
data types (ADTs) like counters, registers, flags, and sets, that provide high
availability and partition tolerance. These conflict-free replicated data types
(CRDTs) [33] efficiently resolve the effects of concurrent updates to replicated
data. Naturally they weaken consistency guarantees to achieve availability and
partition-tolerance, and various notions of weak consistency capture such guar-
antees [8,11,29,35,36].
In this work we study the tractability of CRDT consistency checking; Fig. 1
summarizes our results. In particular, we consider runtime verification: deciding
This work is supported in part by the European Research Council (ERC) under the
European Union’s Horizon 2020 research and innovation programme (grant agreement
No 678177).
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 324–343, 2019.
https://doi.org/10.1007/978-3-030-25543-5_19
On the Complexity of Checking Consistency for Replicated Data Types 325
Fig. 1. The complexity of consistency checking for various replicated data types. We
demonstrate intractability and tractability results in Sects. 3 and 4, respectively.
the variables of a given set of clauses such that exactly one literal per clause
is assigned true. Our reductions essentially simulate the existential choice of a
truth assignment with the existential choice of the read-from and happens-before
relations of an abstract execution. For a given 1-in-3 SAT instance, we construct
a history of replicas obeying carefully-tailored synchronization protocols, which
is consistent exactly when the corresponding SAT instance is positive.
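For intuition about the source problem of the reduction, 1-in-3 SAT asks for an assignment making exactly one literal per clause true. A brute-force checker (purely illustrative — the problem is NP-complete, so this scales exponentially) can be sketched as:

```python
from itertools import product

def one_in_three_sat(clauses, n):
    """Is there an assignment to x1..xn such that exactly one literal of
    each clause is true? Literals are nonzero ints: +j means xj, -j means ¬xj."""
    for bits in product([False, True], repeat=n):
        def value(lit):
            v = bits[abs(lit) - 1]
            return v if lit > 0 else not v
        if all(sum(value(l) for l in clause) == 1 for clause in clauses):
            return True
    return False

# (x1 ∨ x2 ∨ x3) has a 1-in-3 model (e.g. x1=true, x2=x3=false) ...
assert one_in_three_sat([(1, 2, 3)], 3) is True
# ... but (x1 ∨ x1 ∨ x1) never has exactly one true literal.
assert one_in_three_sat([(1, 1, 1)], 1) is False
```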
Third, we develop tractable consistency-checking algorithms for individual
data types and special cases: replicated growing arrays; multi-value and last-
writer-wins registers, when each value is written only once; counters, when repli-
cas are bounded; and sets and flags, when their sizes are also bounded. While
the algorithms for each case are tailored to the algebraic properties of the data
types they handle, they essentially all function by constructing abstract execu-
tions incrementally, processing replicas’ operations in prefix order.
The remainder of this article is organized around our three key contributions:
– a read-from binary relation rf over operations in Op, which identifies the set
of updates needed to “explain” a certain return value, e.g., a write operation
explaining the return value of a read,
– a strict partial happens-before order hb, which includes ro and rf, representing
the causality constraints in an execution, and
– a strict total linearization order lin, which includes hb, used to model conflict
resolution policies based on timestamps.
In this work, we consider replicated data types which satisfy causal consis-
tency [26], i.e., updates which are related by cause and effect relations are
observed by all replicas in the same order. This follows from the fact that the
happens-before order is constrained to be a partial order, and thus transitive
(other forms of weak consistency do not impose this constraint). Some of the
replicated data types we consider in this paper do not use conflict-resolution
policies based on timestamps; in those cases, the linearization order can be ignored.
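The happens-before construction just described can be made concrete with a short sketch (the encoding of operations and relations as sets of pairs is our own, not the paper's formal notation): hb is the transitive closure of ro ∪ rf, and causal consistency requires it to be a strict partial order.

```python
def transitive_closure(rel):
    """Least transitive relation containing rel (a set of ordered pairs)."""
    closure, changed = set(rel), True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def causal_hb(ro, rf):
    """Happens-before of a causally consistent execution: the transitive
    closure of replica order and read-from. Returns None if it has a cycle,
    i.e. if it fails to be a strict partial order."""
    hb = transitive_closure(ro | rf)
    return None if any(a == b for (a, b) in hb) else hb

# wr1 happens-before rd1 via replica order; wr2 via read-from.
hb = causal_hb(ro={("wr1", "rd1")}, rf={("wr2", "rd1")})
assert hb == {("wr1", "rd1"), ("wr2", "rd1")}
# A read-from edge opposing replica order creates a causality cycle.
assert causal_hb(ro={("wr1", "rd1")}, rf={("rd1", "wr1")}) is None
```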
Fig. 2. The axiomatic semantics of replicated data types. Quantified variables are
implicitly distinct, and ∃!o denotes the existence of a unique operation o.
define them are listed in Figs. 2 and 3. These axioms use the function symbols
meth-od, arg-ument, and ret-urn interpreted over operation labels, whose seman-
tics is self-explanatory.
The Add-Wins Set and Remove-Wins Set [34] are two implementations of a repli-
cated set with operations add(x), remove(x), and contains(x) for adding, removing,
and checking membership of an element x. Although the meaning of these meth-
ods is self-evident from their names, the result of conflicting concurrent operations
is not evident. When concurrent add(x) and remove(x) operations are delivered to
a certain replica, the Add-Wins Set chooses to keep the element x in the set, so
every subsequent invocation of contains(x) on this replica returns true, while the
Remove-Wins Set makes the dual choice of removing x from the set.
The formal definition of their semantics uses abstract executions where
the read-from relation associates sets of add(x) and remove(x) operations to
contains(x) operations. Therefore, the predicate ReadOk(o1 , o2 ) is defined by
ReadFrom(ReadOk) ∧ ReadFromMaximal(ReadOk) ∧
ReadAllMaximals(ReadOk) ∧ RetvalSet(contains, true, add)
While a read(x) operation of MVR returns all the values written by concurrent
writes which are maximal among its happens-before predecessors (thereby leaving
the responsibility for resolving conflicts between concurrent writes to the client),
a read(x) operation of LWW returns a single value chosen using a conflict-resolution
policy based on timestamps. Each written value is associated with a timestamp, and a read oper-
ation returns the most recent value w.r.t. the timestamps. This order between
timestamps is modeled using the linearization order of an abstract execution.
Therefore, the predicate ReadOk(o1 , o2 ) is defined by
meth(o1 ) = write ∧ meth(o2 ) = read ∧ arg1 (o1 ) = arg(o2 ) ∧ arg2 (o1 ) ∈ ret(o2 )
(we use arg1 (o1 ) to denote the first argument of a write operation, i.e., the register
name, and arg2 (o1 ) to denote its second argument, i.e., the written value) and
the MVR is defined by the following set of axioms:
ReadFrom(ReadOk) ∧ ReadFromMaximal(ReadOk) ∧
ReadAllMaximals(ReadOk) ∧ RetvalReg
where RetvalReg ensures that a read(x) operation reads from a write(x,v)
operation, for each value v in the set of returned values¹.
LWW is obtained from the definition of MVR by replacing ReadAllMax-
imals with the axiom LinLWW which ensures that every write(x, _) operation
which happens-before a read(x) operation is linearized before the write(x, _)
operation from which the read(x) takes its value (when these two write operations
are different). This definition of LWW is inspired by the “bad-pattern” charac-
terization in [6], corresponding to their causal convergence criterion.
¹ For simplicity, we assume that every history contains a set of write operations writing
the initial values of variables, which precede every other operation in replica order.
330 R. Biswas et al.
– every addAfter(a,b) operation reads-from the addAfter(_, a) operation adding
the character a, except when a = ◦, which denotes the “root” element of the list³,
– every remove(a) operation reads-from the operation adding a, and
– every read operation returning a list containing a reads-from the operation
addAfter(_, a) adding a.
Fig. 4. The encoding of a 1-in-3 SAT problem ∧_{i=1}^{m} (αi ∨ βi ∨ γi) over variables x1, . . . , xn
as a 3-replica history of a flag data type. Besides the flag variable xj for each
propositional variable xj, the encoding adds per-replica variables yj for synchronization
barriers.
Proof. We demonstrate a reduction from the 1-in-3 SAT problem. For a given
problem p = ∧_{i=1}^{m} (αi ∨ βi ∨ γi) over variables x1, . . . , xn, we construct a 3-replica
history hp of the flag data type — either enable- or disable-wins — as illustrated
in Fig. 4. The encoding includes a flag variable xj for each propositional variable
xj , along with a per-replica flag variable yj used to implement synchronization
barriers. Intuitively, executions of hp proceed in m + 1 rounds: the first round
corresponds to the assignment of a truth valuation, while subsequent rounds
check the validity of each clause given the assignment. The reductions to sets
and registers are slight variations on this proof, in which the Read, Enable, and
Disable operations are replaced with Contains, Add, and Remove, or with Reads
and Writes of values 1 and 0, respectively.
It suffices to show that the constructed history hp is admitted if and only if
the given problem p is satisfiable. Since the flag data type does not constrain
the linearization relation of its abstract executions, we regard only the read-
from and happens-before components. It is straightforward to verify that the
happens-before relations of hp ’s abstract executions necessarily order:
xj = true if ∃i. (ri = 0 ∧ αi = xj) ∨ (ri = 1 ∧ βi = xj) ∨ (ri = 2 ∧ γi = xj), and false otherwise,
is a satisfying assignment to p.
Theorem 1 establishes intractability of consistency for the aforementioned
sets, flags, and registers, independently of the number of replicas. In contrast,
our proof of Theorem 2 for counter data types depends on the number of replicas,
since our encoding requires two replicas per propositional variable. Intuitively,
since counter increments and decrements are commutative, the initial round in
the previous encoding would have fixed all counter values to zero. Instead, the
next encoding isolates initial increments and decrements to independent replicas.
The weaker result is indeed tight since checking counter consistency with a fixed
number of replicas is polynomial time, as Sect. 5 demonstrates.
Theorem 2. The admissibility problem for the Counter data type is NP-hard.
Proof. We demonstrate a reduction from the 1-in-3 SAT problem. For a given
problem p = ∧_{i=1}^{m} (αi ∨ βi ∨ γi) over variables x1, . . . , xn, we construct a history
hp of the counter data type over 2n + 3 replicas, as illustrated in Fig. 5.
Besides the differences imposed due to the commutativity of counter incre-
ments and decrements, our reduction follows the same strategy as in the proof of
The consistency checking algorithm for RGA, LWW, and MVR is listed in
Algorithm 1. It computes the three relations rf, hb, and lin of an abstract execu-
tion using the datatype’s axioms. The history is declared consistent iff there exist
satisfying rf and hb relations, and the relations hb and lin computed this way are
acyclic. The acyclicity requirement comes from the definition of abstract execu-
tions where hb and lin are required to be partial/total orders. While an abstract
execution would require that lin is a total order, this algorithm computes a par-
tial linearization order. However, any total order compatible with this partial
linearization would satisfy the axioms of the datatype.
ComputeRF computes the read-from relation rf satisfying the ReadFrom
and Retval axioms. In the case of LWW and MVR, it defines rf as the set
Fig. 5. The encoding of a 1-in-3 SAT problem ∧_{i=1}^{m} (αi ∨ βi ∨ γi) over variables x1, . . . , xn
as the history of a counter over 2n+3 replicas. Besides the counter variables xj encoding
propositional variables xj , the encoding adds a variable y encoding the number of initial
increments and decrements, and a variable z to implement synchronization barriers.
of all pairs formed of write(x,v) and read(x) operations where v belongs to the
return value of the read. By Retval , each read(x) operation must be associated
to at least one write(x, ) operation. Also, the fact that each value is written
at most once implies that this rf relation is uniquely defined, e.g., for LWW,
it is not possible to find two write operations that could be rf related to the
same read operation. In general, if there exists no rf relation satisfying these
axioms, then ComputeRF returns a distinguished value ⊥ to signal a consistency
violation. Note that the computation of the read-from for LWW and MVR is
quadratic time⁴ since the constraints imposed by the axioms relate only to the
operation labels, the methods they invoke or their arguments. The case of RGA
is slightly more involved because the axiom RetvalRGA introduces more read-
from constraints based on the happens-before order which includes ro and the
rf itself. In this case, the computation of rf relies on a fixpoint computation,
which converges in at most quadratic time (the maximal size of rf), described
in Algorithm 2. Essentially, we use the axiom ReadFromRGA to populate the
⁴ Assuming constant-time lookup/insert operations (e.g., using hashmaps), this complexity is linear time.
read-from relation and then, apply the axiom RetvalRGA iteratively, using the
read-from constraints added in previous steps, until the computation converges.
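For LWW and MVR on differentiated histories, the ComputeRF step described above can be sketched directly (the tuple encoding of operations is our own, and ⊥ is modeled as None):

```python
def compute_rf(ops):
    """Read-from for LWW/MVR when each value is written at most once:
    match every value returned by a read(x) to the unique write(x, v).
    ops: list of (op_id, method, args, ret); reads return a set of values.
    Returns None (i.e. ⊥) if some returned value has no matching write."""
    writes = {}                                  # (variable, value) -> writer op id
    for oid, meth, args, ret in ops:
        if meth == "write":
            x, v = args
            writes[(x, v)] = oid                 # unique, since values differ
    rf = set()
    for oid, meth, args, ret in ops:
        if meth == "read":
            x = args[0]
            for v in ret:
                if (x, v) not in writes:
                    return None                  # no rf can satisfy Retval
                rf.add((writes[(x, v)], oid))
    return rf

history = [(1, "write", ("x", 5), None),
           (2, "write", ("x", 7), None),
           (3, "read", ("x",), {5, 7})]          # an MVR read seeing both writes
assert compute_rf(history) == {(1, 3), (2, 3)}
assert compute_rf([(1, "read", ("x",), {9})]) is None
```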
After computing the read-from relation, our algorithm defines the happens-
before relation hb as the transitive closure of ro union rf. This is sound because
none of the axioms of these datatypes enforce new happens-before constraints,
which are not already captured by ro and rf. Then, it checks whether the hb
defined this way is acyclic and satisfies the datatype’s axioms that constrain hb,
i.e., ReadFromMaximal and ReadAllMaximals (when they are present).
Finally, in the case of LWW and RGA, the algorithm computes a (partial)
linearization order that satisfies the corresponding Lin axioms. Starting from
an initial linearization order which is exactly the happens-before, it computes
new constraints by instantiating the universally quantified axioms LinLWW
and LinRGA. Since these axioms are not “recursive”, i.e., they don’t enforce
linearization order constraints based on other linearization order constraints,
a standard instantiation of these axioms is enough to compute a partial lin-
earization order such that any extension to a total order satisfies the datatype’s
axioms.
Corollary 1. The admissibility problem is polynomial time for RGA, and for
LWW and MVR on differentiated histories.
Input: History h = (Op, ro), prefix map m, and set seen of invalid prefix maps
Output: true iff there exists read-from and happens-before relations rf and hb
such that m ⊆ hb, and h, rf, hb satisfies the counter axioms.
1 if m is complete then return true;
2 foreach replica i do
3 foreach replica j = i do
4 m ← m[i ← m(i) ∪ m(j)];
5 if m ∈ seen and checkCounter(h, m , seen) then
6 return true;
7 seen ← seen ∪ {m };
8 if ∃o1 . ro1 (lasti (m), o1 ) then
9 if meth(o1 ) = read ∧ arg(o1 ) = x ∧
ret(o1 ) ≠ |{o ∈ m(i) | o = inc(x)}| − |{o ∈ m(i) | o = dec(x)}| then
10 return false;
11 m ← m[i ← m(i) ∪ {o1 }];
12 if m ∈ seen and checkCounter(h, m , seen) then
13 return true;
14 seen ← seen ∪ {m };
15 return false;
Algorithm 3. The procedure checkCounter, where ro1 denotes immediate
ro-successor, and f [a ← b] updates function f with mapping a → b.
When the number of replicas is fixed, the number of prefix maps becomes
polynomial in the size of the history. This follows from the fact that prefixes are
uniquely defined by their ro-maximal operations, whose number is fixed.
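The read check performed when extending a replica's prefix (line 9 of Algorithm 3) can be sketched as follows; the (method, argument) encoding of counter operations is our own simplification.

```python
def counter_read_ok(prefix_ops, x, returned):
    """Line 9 of Algorithm 3, informally: a read(x) extending replica i's
    prefix is valid iff it returns #inc(x) - #dec(x) over the visible prefix.
    prefix_ops: operations in m(i), each a (method, argument) pair."""
    incs = sum(1 for (meth, arg) in prefix_ops if meth == "inc" and arg == x)
    decs = sum(1 for (meth, arg) in prefix_ops if meth == "dec" and arg == x)
    return returned == incs - decs

prefix = [("inc", "x"), ("inc", "x"), ("dec", "x"), ("inc", "y")]
assert counter_read_ok(prefix, "x", 1) is True
assert counter_read_ok(prefix, "x", 2) is False   # would trigger "return false"
```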
While Theorem 1 shows that the admissibility problem is NP-complete for repli-
cated sets and flags even if the number of replicas is fixed, we show that this
problem becomes polynomial time when, additionally, the number of values added
to the set, or the number of flags, is also fixed. Note that this does not limit the
number of operations in the input history, which can still be arbitrarily large. In
the following, we focus on the Add-Wins Set, the other cases being very similar.
We propose an algorithm for checking consistency which is actually an exten-
sion of the one presented in Sect. 5 for replicated counters. The additional com-
plexity in checking consistency for the Add-Wins Set comes from the validity
of contains(x) return values which requires identifying the maximal predecessors
in the happens-before relation that add or remove x (which are not necessarily
the maximal hb-predecessors altogether). In the case of counters, it was enough
just to count happens-before predecessors. Therefore, we extend the algorithm
for replicated counters such that along with the prefix map, we also keep track
of the hb-maximal add(x) and remove(x) operations for each element x and
each replica i. When extending a prefix map with a contains operation, these
hb-maximal operations (which define a witness for the read-from relation) are
enough to verify the RetValSet axiom. Extending the prefix of a replica with
an add or remove operation (issued on the same replica), or by merging the prefix
of another replica, may require an update of these hb-maximal predecessors.
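Under add-wins semantics, the value a contains(x) read must return can be computed from the visible updates and their happens-before relation. A minimal sketch under our own encoding (not the paper's algorithm):

```python
def add_wins_contains(updates, hb, x):
    """Add-Wins Set: contains(x) is true iff some visible add(x) does not
    happen-before any visible remove(x) — i.e. no remove has observed it.
    updates: set of (op_id, method, element); hb: set of (op_id, op_id) pairs."""
    adds = {i for (i, meth, e) in updates if meth == "add" and e == x}
    removes = {i for (i, meth, e) in updates if meth == "remove" and e == x}
    return any(all((a, r) not in hb for r in removes) for a in adds)

ops = {(1, "add", "x"), (2, "remove", "x")}
assert add_wins_contains(ops, hb={(1, 2)}, x="x") is False  # remove saw the add
assert add_wins_contains(ops, hb=set(), x="x") is True      # concurrent: add wins
```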
When the number of replicas and elements are fixed, the number of read-
from maps is polynomial in the size of the history — recall that the number
of operations associated by a read-from map to a replica and set element is
bounded by the number of replicas. Combined with the number of prefix maps
being polynomial when the number of replicas is fixed, we obtain the following
result.
7 Related Work
Many have considered consistency models applicable to CRDTs, including causal
consistency [26], sequential consistency [27], linearizability [24], session consis-
tency [35], eventual consistency [36], and happens-before consistency [29]. Bur-
ckhardt et al. [8,11] propose a unifying framework to formalize these models.
Many have also studied the complexity of verifying data-type agnostic notions
of consistency, including serializability, sequential consistency and linearizabil-
ity [1,2,4,18,20,22,30], as well as causal consistency [6]. Our definition of the
replicated LWW register corresponds to the notion of causal convergence in [6].
That work studies the complexity of the admissibility problem for the repli-
cated LWW register. It shows that this problem is NP-complete in general and
polynomial time when each value is written only once. Our NP-completeness
result is stronger since it assumes a fixed number of replicas, and our algo-
rithm for the case of unique values is more general and can be applied uni-
formly to MVR and RGA. While Bouajjani et al. [5,14] consider the com-
plexity for individual linearizable collection types, we are the first to establish
(in)tractability of individual replicated data types. Others have developed effec-
tive consistency checking algorithms for sequential consistency [3,9,23,31], seri-
alizability [12,17,18,21], linearizability [10,16,28,37], and even weaker notions
like eventual consistency [7] and sequential happens-before consistency [13,15].
In contrast, we are the first to establish precise polynomial-time algorithms for
runtime verification of replicated data types.
8 Conclusion
By developing novel logical characterizations of replicated data types, reduc-
tions from propositional satisfiability checking, and tractable algorithms, we have
established a frontier of tractability for checking consistency of replicated data
types. As far as we are aware, our results are the first to characterize the asymp-
totic complexity of consistency checking for CRDTs.
References
1. Alur, R., McMillan, K.L., Peled, D.A.: Model-checking of correctness conditions
for concurrent objects. Inf. Comput. 160(1–2), 167–188 (2000). https://doi.org/
10.1006/inco.1999.2847
2. Bingham, J.D., Condon, A., Hu, A.J.: Toward a decidable notion of sequential
consistency. In: Rosenberg, A.L., auf der Heide, F.M. (eds.) SPAA 2003: Proceed-
ings of the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and
Architectures, San Diego, California, USA, (part of FCRC 2003), 7–9 June 2003,
pp. 304–313. ACM (2003). https://doi.org/10.1145/777412.777467
3. Bingham, J., Condon, A., Hu, A.J., Qadeer, S., Zhang, Z.: Automatic verification
of sequential consistency for unbounded addresses and data values. In: Alur, R.,
Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 427–439. Springer, Heidelberg
(2004). https://doi.org/10.1007/978-3-540-27813-9 33
On the Complexity of Checking Consistency for Replicated Data Types 341
4. Bouajjani, A., Emmi, M., Enea, C., Hamza, J.: Verifying concurrent programs
against sequential specifications. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013.
LNCS, vol. 7792, pp. 290–309. Springer, Heidelberg (2013). https://doi.org/10.
1007/978-3-642-37036-6 17
5. Bouajjani, A., Emmi, M., Enea, C., Hamza, J.: On reducing linearizability to state
reachability. Inf. Comput. 261(Part), 383–400 (2018). https://doi.org/10.1016/j.
ic.2018.02.014
6. Bouajjani, A., Enea, C., Guerraoui, R., Hamza, J.: On verifying causal consistency.
In: Castagna, G., Gordon, A.D. (eds.) Proceedings of the 44th ACM SIGPLAN
Symposium on Principles of Programming Languages, POPL 2017, Paris, France,
18–20 January 2017, pp. 626–638. ACM (2017). http://dl.acm.org/citation.cfm?
id=3009888
7. Bouajjani, A., Enea, C., Hamza, J.: Verifying eventual consistency of optimistic
replication systems. In: Jagannathan, S., Sewell, P. (eds.) The 41st Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL
2014, San Diego, CA, USA, 20–21 January 2014, pp. 285–296. ACM (2014).
https://doi.org/10.1145/2535838.2535877
8. Burckhardt, S.: Principles of eventual consistency. Found. Trends Program. Lang.
1(1–2), 1–150 (2014). https://doi.org/10.1561/2500000011
9. Burckhardt, S., Alur, R., Martin, M.M.K.: Checkfence: checking consistency of
concurrent data types on relaxed memory models. In: Ferrante, J., McKinley, K.S.
(eds.) Proceedings of the ACM SIGPLAN 2007 Conference on Programming Lan-
guage Design and Implementation, San Diego, California, USA, 10–13 June 2007,
pp. 12–21. ACM (2007). https://doi.org/10.1145/1250734.1250737
10. Burckhardt, S., Dern, C., Musuvathi, M., Tan, R.: Line-up: a complete and auto-
matic linearizability checker. In: Zorn, B.G., Aiken, A. (eds.) Proceedings of the
2010 ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation, PLDI 2010, Toronto, Ontario, Canada, 5–10 June 2010, pp. 330–340.
ACM (2010). https://doi.org/10.1145/1806596.1806634
11. Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types: spec-
ification, verification, optimality. In: Jagannathan, S., Sewell, P. (eds.) The 41st
Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-
guages, POPL 2014, San Diego, CA, USA, 20–21 January 2014, pp. 271–284. ACM
(2014). https://doi.org/10.1145/2535838.2535848
12. Cohen, A., O’Leary, J.W., Pnueli, A., Tuttle, M.R., Zuck, L.D.: Verifying correct-
ness of transactional memories. In: Proceedings of the 7th International Confer-
ence on Formal Methods in Computer-Aided Design, FMCAD 2007, Austin, Texas,
USA, 11–14 November 2007, pp. 37–44. IEEE Computer Society (2007). https://
doi.org/10.1109/FAMCAD.2007.40
13. Emmi, M., Enea, C.: Monitoring weak consistency. In: Chockler, H., Weissenbacher,
G. (eds.) CAV 2018, Part I. LNCS, vol. 10981, pp. 487–506. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-96145-3 26
14. Emmi, M., Enea, C.: Sound, complete, and tractable linearizability monitoring for
concurrent collections. PACMPL 2(POPL), 25:1–25:27 (2018). https://doi.org/10.
1145/3158113
15. Emmi, M., Enea, C.: Weak-consistency specification via visibility relaxation.
PACMPL 3(POPL), 60:1–60:28 (2019). https://dl.acm.org/citation.cfm?id=
3290373
342 R. Biswas et al.
16. Emmi, M., Enea, C., Hamza, J.: Monitoring refinement via symbolic reasoning. In:
Grove, D., Blackburn, S. (eds.) Proceedings of the 36th ACM SIGPLAN Confer-
ence on Programming Language Design and Implementation, Portland, OR, USA,
15–17 June 2015, pp. 260–269. ACM (2015). https://doi.org/10.1145/2737924.
2737983
17. Emmi, M., Majumdar, R., Manevich, R.: Parameterized verification of transac-
tional memories. In: Zorn, B.G., Aiken, A. (eds.) Proceedings of the 2010 ACM
SIGPLAN Conference on Programming Language Design and Implementation,
PLDI 2010, Toronto, Ontario, Canada, 5–10 June 2010, pp. 134–145. ACM (2010).
https://doi.org/10.1145/1806596.1806613
18. Farzan, A., Madhusudan, P.: Monitoring atomicity in concurrent programs. In:
Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 52–65. Springer, Hei-
delberg (2008). https://doi.org/10.1007/978-3-540-70545-1 8
19. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman, New York (1979)
20. Gibbons, P.B., Korach, E.: Testing shared memories. SIAM J. Comput. 26(4),
1208–1244 (1997). https://doi.org/10.1137/S0097539794279614
21. Guerraoui, R., Henzinger, T.A., Jobstmann, B., Singh, V.: Model checking transac-
tional memories. In: Gupta, R., Amarasinghe, S.P. (eds.) Proceedings of the ACM
SIGPLAN 2008 Conference on Programming Language Design and Implementa-
tion, Tucson, AZ, USA, 7–13 June 2008, pp. 372–382. ACM (2008). https://doi.
org/10.1145/1375581.1375626
22. Hamza, J.: On the complexity of linearizability. In: Bouajjani, A., Fauconnier, H.
(eds.) NETYS 2015. LNCS, vol. 9466, pp. 308–321. Springer, Cham (2015). https://
doi.org/10.1007/978-3-319-26850-7 21
23. Henzinger, T.A., Qadeer, S., Rajamani, S.K.: Verifying sequential consistency on
shared-memory multiprocessor systems. In: Halbwachs, N., Peled, D. (eds.) CAV
1999. LNCS, vol. 1633, pp. 301–315. Springer, Heidelberg (1999). https://doi.org/
10.1007/3-540-48683-6 27
24. Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent
objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990). https://doi.
org/10.1145/78969.78972
25. Kingsbury, K.: Jepsen: Distributed systems safety research (2016). https://jepsen.io
26. Lamport, L.: Time, clocks, and the ordering of events in a distributed system.
Commun. ACM 21(7), 558–565 (1978). https://doi.org/10.1145/359545.359563
27. Lamport, L.: How to make a multiprocessor computer that correctly executes mul-
tiprocess programs. IEEE Trans. Comput. 28(9), 690–691 (1979). https://doi.org/
10.1109/TC.1979.1675439
28. Lowe, G.: Testing for linearizability. Concurr. Comput. Pract. Exp. 29(4) (2017).
https://doi.org/10.1002/cpe.3928
29. Manson, J., Pugh, W., Adve, S.V.: The java memory model. In: Palsberg, J.,
Abadi, M. (eds.) Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages, POPL 2005, Long Beach, California,
USA, 12–14 January 2005, pp. 378–391. ACM (2005). https://doi.org/10.1145/
1040305.1040336
30. Papadimitriou, C.H.: The serializability of concurrent database updates. J. ACM
26(4), 631–653 (1979). https://doi.org/10.1145/322154.322158
31. Qadeer, S.: Verifying sequential consistency on shared-memory multiprocessors by
model checking. IEEE Trans. Parallel Distrib. Syst. 14(8), 730–741 (2003). https://
doi.org/10.1109/TPDS.2003.1225053
32. Roh, H., Jeon, M., Kim, J., Lee, J.: Replicated abstract data types: building blocks
for collaborative applications. J. Parallel Distrib. Comput. 71(3), 354–368 (2011).
https://doi.org/10.1016/j.jpdc.2010.12.006
33. Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-free replicated data
types. In: Défago, X., Petit, F., Villain, V. (eds.) SSS 2011. LNCS, vol. 6976, pp.
386–400. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24550-
3 29
34. Shapiro, M., Preguiça, N.M., Baquero, C., Zawirski, M.: Convergent and
commutative replicated data types. Bull. EATCS 104, 67–88 (2011).
http://eatcs.org/beatcs/index.php/beatcs/article/view/120
35. Terry, D.B., Demers, A.J., Petersen, K., Spreitzer, M., Theimer, M., Welch, B.B.:
Session guarantees for weakly consistent replicated data. In: Proceedings of the
Third International Conference on Parallel and Distributed Information Systems
(PDIS 1994), Austin, Texas, USA, 28–30 September 1994, pp. 140–149. IEEE Com-
puter Society (1994). https://doi.org/10.1109/PDIS.1994.331722
36. Terry, D.B., Theimer, M., Petersen, K., Demers, A.J., Spreitzer, M., Hauser, C.:
Managing update conflicts in bayou, a weakly connected replicated storage system.
In: Jones, M.B. (ed.) Proceedings of the Fifteenth ACM Symposium on Operat-
ing System Principles, SOSP 1995, Copper Mountain Resort, Colorado, USA, 3–6
December 1995, pp. 172–183. ACM (1995). https://doi.org/10.1145/224056.224070
37. Wing, J.M., Gong, C.: Testing and verifying concurrent objects. J. Parallel Distrib.
Comput. 17(1–2), 164–182 (1993). https://doi.org/10.1006/jpdc.1993.1015
38. Wolper, P.: Expressing interesting properties of programs in propositional temporal
logic. In: Conference Record of the Thirteenth Annual ACM Symposium on Prin-
ciples of Programming Languages, St. Petersburg Beach, Florida, USA, January
1986, pp. 184–193. ACM Press (1986). https://doi.org/10.1145/512644.512661
Communication-Closed Asynchronous
Protocols
1 Introduction
Fault tolerance protocols provide dependable services on top of unreliable com-
puters and networks. One distinguishes asynchronous vs. synchronous pro-
tocols based on the semantics of parallel composition. Asynchronous protocols
are crucial parts of many distributed systems because they perform better than
synchronous ones. However, their correctness is very hard to establish, due to
the challenges of concurrency, faults, buffered message queues, and message loss
and reordering in the network [5,19,21,26,31,35,37,42]. In contrast, reasoning
about synchronous round-based
semantics is simpler, as one only has to consider specific global states at round
boundaries [1,8,10,11,13,17,29,32,40].
The question we address is how to connect both worlds, in order to exploit
the advantage of verification in synchronous semantics when reasoning about
asynchronous protocols. We consider asynchronous protocols that work in unre-
liable networks, which may lose and reorder messages, and where processes may
crash. We focus on a class of protocols that solve state machine replication.
Due to the absence of a global clock, fault tolerance protocols implement an
abstract notion of time to coordinate. The local state of a process maintains the
Supported by: Austrian Science Fund (FWF) via NFN RiSE (S11405) and project
PRAVDA (P27722); WWTF grant APALACHE (ICT15-103); French National
Research Agency ANR project SAFTA (12744-ANR-17-CE25-0008-01).
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 344–363, 2019.
https://doi.org/10.1007/978-3-030-25543-5_20
value of the abstract time (potentially implicit), and a process timestamps the
messages it sends accordingly. Synchronous algorithms do not need to imple-
ment an abstract notion of time: it is embedded in the definition of any syn-
chronous computational model [9,15,18,28], and it is called the round number.
The key insight of our results is the existence of a correspondence between val-
ues of the abstract clock in the asynchronous systems and round numbers in
the synchronous ones. Using this correspondence, we make explicit the “hidden”
round-based synchronous structure of an asynchronous algorithm.
[Figures omitted: two pairs of execution diagrams for processes P1–P3 of the leader election protocol, each contrasting (a) an asynchronous execution with (b) an equivalent synchronous one: first for NewBallot/AckBallot messages in ballots 1–2 with output out(2,p2), then for a jump over ballots 1–19 to ballot 20 with output out(20,p2).]
Fig. 3. Control flow graph of asynchronous leader election. (Color figure online)
P2, P3’s NewBallot message from ballot 1. However, P2 ignores it, because of
the receive statement in line 14 that only accepts messages for greater or equal
(ballot, label) pairs. The message from ballot 1 arrived too “late” because P2
is already in ballot 2. Thus, the messages from ballot 1 have the same effect as if
they were dropped, as in Fig. 1(b). The executions are equivalent from the local
perspective of the processes: By applying a “rubber band transformation” [30],
one can reorder transitions, while maintaining the local control flow and the
send/receive causality.
Another case of equivalent executions is given in Fig. 2. While P1 and P2
made progress, P3 was disconnected. In Fig. 2(a), while P3 is waiting for ballot 1
messages, the network delivers a message for ballot 20. P3 receives this message
in line 29 and updates ballot in line 35. P3 thus “jumps forward in time”,
acknowledging P2’s leadership in ballot 20. In Fig. 2(b), P3’s timeout expires in
all ballots from 1 to 19, without P3 receiving any messages. Thus, it does not
change its local state (except the ballot number) in these ballots. For P3, these
two executions are stutter equivalent. Reducing verification to the verification
of executions like the ones on the right, i.e., synchronous executions, reduces
the number of interleavings and drastically simplifies verification. In the following
we discuss conditions on the code that allow such a reduction.
Communication Closure. In our example, the variables ballot and label encode
abstract time: Let b and ℓ be their assigned values. Then abstract time ranges
over T = {(b, ℓ) : b ∈ ℕ, ℓ ∈ {NewBallot, AckBallot}}. We fix NewBallot to
be less than AckBallot, and consider the lexicographical order over T. The
sequence of (b, ℓ) induced by an execution at a process is monotonically
increasing; thus (b, ℓ) encodes a notion of time. A protocol is communication-closed if
(i) each process sends only messages timestamped with the current time, and (ii)
each process receives only messages timestamped with the current or a higher
time value. For such protocols we show in Sect. 5 that for each asynchronous exe-
cution, there is an equivalent (processes go through the same sequence of local
states) synchronous one. We use ideas from [17], but we allow reacting to future
messages, which is a more permissive form of communication closure. This is
essential for jumping forward, and thus for liveness in fault tolerance protocols.
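As a concrete illustration, the lexicographic order on T and the receive condition (ii) can be sketched as follows (a minimal sketch; the type and function names are ours, not taken from the protocol code):

```c
#include <stdbool.h>

/* Step labels, with NewBallot fixed to be less than AckBallot. */
enum label { NewBallot = 0, AckBallot = 1 };

/* An abstract time value (b, l) from T = N x {NewBallot, AckBallot}. */
struct tag { int ballot; enum label lab; };

/* Lexicographic order over T: ballots are compared first, then labels. */
bool tag_le(struct tag x, struct tag y) {
    if (x.ballot != y.ballot) return x.ballot < y.ballot;
    return x.lab <= y.lab;
}

/* Condition (ii) of communication closure: a process at local time `now`
 * only receives messages stamped with the current or a higher time. */
bool may_receive(struct tag now, struct tag stamp) {
    return tag_le(now, stamp);
}
```

For instance, a process whose local time is (2, NewBallot) accepts a (2, AckBallot) or a (20, NewBallot) message, but discards a late (1, AckBallot) one, matching the behavior of P2 described above.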
The challenge is to check communication closure at the code level. For this,
we rely on user-provided “tag” annotations that specify the variables and the
message fields representing local time and timestamps. A system of assertions
formalizes that the user-provided annotations encode time and that the protocol
is communication-closed w.r.t. this definition of time. In the example, the user
provides (ballot, label) for local time and msg->bal and msg->lab for times-
tamps. In Fig. 3, we give example assertions that we add for the send and receive
conditions (i) and (ii). These assertions only consider the local state, i.e., we do
not need to capture the states of other processes or the message pool. We check
the assertions with the static verifier Verifast [22].
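Under these annotations, the assertions for the send condition (i) and the receive condition (ii) can be sketched as follows (our illustration of the idea, not the tool's actual annotation language; the globals mirror the annotated variables ballot/label and the message fields bal/lab):

```c
#include <assert.h>
#include <stddef.h>

enum label { NewBallot, AckBallot };

/* Message timestamp fields, per the user-provided tag annotation. */
struct msg { int bal; enum label lab; };

/* Local time of the process: the annotated variables (ballot, label). */
int ballot = 2;
enum label label = NewBallot;

/* Condition (i): every sent message carries the current local time. */
void assert_send(const struct msg *m) {
    assert(m->bal == ballot && m->lab == label);
}

/* Condition (ii): every received message carries the current or a
 * higher time (lexicographic: ballot first, then label). */
void assert_recv(const struct msg *m) {
    if (m == NULL) return;          /* receive may return NULL */
    assert(m->bal > ballot || (m->bal == ballot && m->lab >= label));
}
```

Crucially, both assertions mention only the local state of a single process, which is what makes them checkable without reasoning about other processes or the message pool.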
Fig. 4. Control flow graph of synchronous leader election. (Color figure online)
2 Asynchronous Protocols
All processes execute the same code, written in the core language in Fig. 5. The
communication between processes is done via typed messages. Message payloads,
denoted M, are wrappers of primitive or composite type. We denote by M the set
of message types. Wrappers are used to distinguish payload types. Send instruc-
tions take as input an object of some payload type and the receiver's identity,
or a designated value corresponding to a send to all. Receive statements are
non-blocking, and return an object of payload type or NULL. Receive statements
are parameterized by conditions (i.e., function pointers) on the values in the
received messages (e.g., timestamps). At most one message is received at a time. If no message has
been delivered or satisfies the condition, receive returns NULL. In Fig. 3, we give
the definition of the function eq, used to filter messages acknowledging the lead-
ership of a process. The followers also use geq, which checks whether the received
message is timestamped with a value higher than or equal to the local time. We assume that
each loop contains at least one send or receive statement. The iterative sequen-
tial computations are done in local functions, i.e., f(e⃗). The instructions in()
and out() are used to communicate with an external environment.
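A minimal sketch of such a non-blocking, condition-filtered receive is given below; the filters eq and geq follow their description above, while the array-based mailbox and the recv signature are our simplifications:

```c
#include <stdbool.h>
#include <stddef.h>

enum label { NewBallot, AckBallot };
struct msg { int bal; enum label lab; };

/* Local time of the receiving process (maintained by the protocol). */
int ballot = 2;
enum label label = NewBallot;

/* Leader's filter: accept only messages stamped with the current time. */
bool eq(const struct msg *m) {
    return m->bal == ballot && m->lab == label;
}

/* Followers' filter: accept the current or any higher timestamp. */
bool geq(const struct msg *m) {
    return m->bal > ballot || (m->bal == ballot && m->lab >= label);
}

/* Non-blocking receive: return one delivered message satisfying cond,
 * or NULL if nothing has been delivered or nothing satisfies cond. */
struct msg *recv(struct msg *delivered, size_t n,
                 bool (*cond)(const struct msg *)) {
    for (size_t i = 0; i < n; i++)
        if (cond(&delivered[i]))
            return &delivered[i];
    return NULL;
}
```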
The semantics of a protocol P is the asynchronous parallel composition of n
copies of the same code, one copy per process, where n is a parameter. Formally,
the state of a protocol P is a tuple ⟨s, msg⟩ where: s ∈ [P → (Vars ∪ {pc}) → D]
is a valuation in some data domain D of the variables in P, where pc represents
the current control location, ranging over the set Loc of all protocol locations,
and msg ⊆ ⋃_{M∈M}(P × D(M) × P) is the multiset of messages in transit (the
network may lose and reorder messages). Given a process p ∈ P, s(p) is the local
state of p, which is a valuation of p's local variables, i.e., s(p) ∈ [Varsp ∪ {pcp} → D].
The state of a crashed process is a wildcard state that matches any state. The
messages sent by a process are added to the global pool of messages msg, and
350 A. Damian et al.
a receive statement removes a message from the pool. The interface operations
in and out do not modify the local state of a process. An execution is an
infinite sequence s0 A0 s1 A1 . . . such that ∀i ≥ 0, si is a protocol state and
Ai ∈ A is a local statement whose execution creates a transition of the form
⟨s, msg⟩ −(I,O)→ ⟨s′, msg′⟩, where {I, O} are the observable events generated
by Ai (if any). We denote by [[P]] the set of executions of the protocol P.
For every 1 ≤ i ≤ m, v2i−1 is called a phase tag and v2i is called a step tag.
Given an execution π ∈ [[P]], a transition s A s′ in π is tagged by [[tagm]]m if
A is send(m) or m = recv(∗cond), and it is tagged by [[tags]]s′ otherwise.
For Fig. 3, SyncV = (v1 , v2 ), and tags matches v1 and v2 with ballot
and label, resp., at all control locations, i.e., a process is in step NewBallot
of phase 3 when ballot = 3 and label = NewBallot. For the type msg,
tagm matches the fields bal and lab with v1 and v2, resp., i.e., a message
(3, NewBallot, 5) is a phase 3, step NewBallot message. To capture that mes-
sages of type A are sent locally before messages of type B, the tagging function
tagm(B) should be defined on the same synchronization variables as tagm(A).
Condition (I.) states that SyncV is not decreased by any local statement (it is
a notion of time). Further, one synchronization pair is modified at a time, except
a reset (i.e., a pair is set to its minimal value) when the value of a preceding
pair is updated. Checking this translates into checking a transition invariant,
stating that the value of the synchronization tuple SyncV is increased by any
assignment. To state this invariant we introduce “old synchronization variables”
that maintain the value of the synchronization variables before the update.
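For the pair (ballot, label) of our example, the resulting transition invariant can be sketched as follows (the old_* variables play the role of the "old synchronization variables"; their concrete names are ours):

```c
#include <stdbool.h>

enum label { NewBallot = 0, AckBallot = 1 };

/* Synchronization variables and their saved pre-update values. */
int ballot, old_ballot;
enum label label, old_label;

/* Transition invariant for Condition (I.): an assignment never decreases
 * the synchronization tuple in the lexicographic order; resetting label
 * to its minimal value NewBallot is allowed when ballot grows. */
bool sync_nondecreasing(void) {
    return ballot > old_ballot ||
           (ballot == old_ballot && label >= old_label);
}
```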
Condition (II.) states that any message sent is tagged with a timestamp that
equals the current local time. Checking it reduces to an assert statement that
expresses that for every v ∈ SyncV, tagm(M)(v) = tags(pc)(v), where M is the
type of the message m which is sent, and pc is the program location of the send.
Condition (III.) states that any message received is tagged with a timestamp
greater than or equal to the current time of the process. To check it, we need
to consider the implementation of the functions passed as argument to a recv
statement. These functions (e.g., eq and geq in Fig. 3) implement the filtering of
the messages delivered by the network. We inline their code and prove Condition
(III.) by comparing the tagged fields of message variables with the phase and
The reduction preserves so-called local properties [7], among which are con-
sensus and state machine replication.
Proof Sketch. There are two cases to consider. Case (1): every receive transition
s −(m=recv(∗cond))→ sr in ae satisfies [[tags]]sr = [[tagm]]m, i.e., all messages
received are timestamped with the current local tag of the receiver. We
use commutativity arguments to reorder transitions so that we obtain an indis-
tinguishable asynchronous execution in which the transition tags are globally
non-decreasing: The interesting case is if a send comes before a lower-tagged
receive in ae. Then the tags of the two transitions imply that the transitions
concern different messages so that swapping them cannot violate send/receive
causality.
We exploit that in the protocols we consider, no correct process locally keeps
the tags unchanged forever (e.g., stays in a ballot forever) to arrive at an execu-
tion where the subsequence of transitions with the same tag is finite. Still, the
resulting execution is not an mHO execution; e.g., for the same tag a receive
may happen before a send on a different process. Condition (V.) ensures that
mHO send-receive-update order is respected locally at each process. From this,
together with the observation that sends are left movers, and updates are right
movers, we obtain a global send-receive-update order which implies that the
resulting execution is an mHO execution.
Case (2): there is a transition s −(m=recv(∗cond))→ sr in ae such that
[[tags]]sr ≺ [[tagm]]m, that is, a process receives a message with tag k′, higher
than its state tag k. In mHO, a process only receives for its current round. To
bring the asynchronous execution into such a form, we use Condition (IV.) and
mHO semantics, where each process goes through all rounds. First, Condition
(IV.) ensures that the process must update the tag variables to k′ at some point
t after receiving it, if it wants to use the content of the message. It ensures
that the process stutters during the time instances between k and k′, w.r.t. the
values of the variables which are not of message type. That is, for the
intermediate values of abstract time between k and k′, no messages are sent or
received, and no computation is performed. We split ae at point t and add empty
send instructions, receive instructions, and instructions that increment the
synchronization variables, until the tag reaches k′. If we do this for each jump
in ae, we arrive at an indistinguishable asynchronous execution that falls into
Case (1).
Case 1: The CFG is like in Fig. 8(a), i.e., it consists of one loop, where the
phase tag ph is incremented once at the beginning of each loop iteration, and for
every value of the step tag st there is exactly one assignment in the loop body
(the same on all paths). In this case, the phase tag takes the same values as the
loop iteration counter (maybe shifted with some initial value). Therefore, the
loop body defines the code of an mHO-phase. It is easy to structure it into two
mHO-rounds: the code of round A is the part of the CFG from the beginning of
the loop’s body up to the second assignment of the st variable, and round B is
the rest of the code up to the end of the loop body.
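Schematically, a Case 1 loop and its decomposition into rounds A and B look as follows (the round_actions stub, which just records the (ph, st) sequence, stands in for the actual send/receive/update code and is ours):

```c
#include <assert.h>

/* Stub standing in for a round's send/receive/update code; it records
 * the (ph, st) sequence so the round structure can be observed. */
enum { MAXLOG = 64 };
int log_ph[MAXLOG], log_st[MAXLOG], logged;

void round_actions(int ph, int st) {
    log_ph[logged] = ph;
    log_st[logged] = st;
    logged++;
}

/* Case 1: ph is incremented once per iteration and each value of the
 * step tag st (0 = round A, 1 = round B) is assigned exactly once, so
 * the body splits into two mHO rounds per phase. */
void phase_loop(int phases) {
    int ph = 0, st;
    while (ph < phases) {
        ph++;       /* ph coincides with the loop iteration counter */
        st = 0;     /* round A: loop entry up to the 2nd st assignment */
        round_actions(ph, st);
        st = 1;     /* round B: the rest of the loop body */
        round_actions(ph, st);
    }
}
```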
Case 2: The CFG is like in Fig. 8(b). It differs from Case 1 in that the same
value is assigned to st in different branches. Each of these assignments marks the
beginning of an mHO round B, which thus has multiple entry points. In mHO, a
round only has one entry point. To simulate the multiple entry points in mHO,
we store in auxiliary variables the values of the conditions along the paths that
led to the entry point. In the figure, the code of round A is given by the red box,
and the code of round B by the condition in the first blue box, expressed on the
auxiliary variable, followed by the respective branches in the blue box.
In our example in Fig. 3, the assignment label = AckBallot appears in the
leader and the follower branch. Followers send and receive AckBallot messages
only if they have received a NewBallot. The rewrite introduces old mbox1 in the
mHO protocol in Fig. 4 to store this information. Also, we eliminate the variables
ballot and label; they are subsumed by the phase and round number of mHO.
Case 3: Let us assume that the CFG is like in Fig. 8(c). It differs from Case 1
because the phase tag ph is assigned twice. We rewrite it into asynchronous code
that falls into Case 1 or 2. The resulting CFG is sketched in Fig. 8(d), with only
one assignment to ph at the beginning of the loop.
If the second assignment changes the value of ph, then there is a jump. In
case of a jump, the beginning of a new phase does not coincide with the first
instruction of the loop. Thus there might be multiple entry points for a phase. We
introduce (non-deterministic) branching in the control flow to capture different
entry points: In case there is no jump, the green edge followed by the purple
edge is executed within the same phase. In case of a jump, the rewritten code
allows the green and the purple paths to be executed in different phases; first
the green, and then the purple in a later phase. We add empty loops to simulate
the phases that are jumped over. As a pure non-deterministic choice at the top
of the loop would be too imprecise, we use the variable jump to make sure that
the purple edge is executed only once, prior to the green edge. In case of
multiple assignments,
we perform this transformation iteratively for each assignment.
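One possible reading of this transformation, with the green and purple paths reduced to counters and a hypothetical jump trigger, is sketched below (this is our schematic reconstruction, not the tool's output):

```c
#include <stdbool.h>

/* Hypothetical trigger for the second assignment to ph in the original
 * loop, e.g., a message from a future phase arrives in phase 1. */
bool jump_triggered(int ph) { return ph == 1; }

int executed_green, executed_purple;

/* Rewritten Case 3 loop: ph is assigned only once per iteration; the
 * jump flag defers the purple path of an interrupted phase so that it
 * runs exactly once, prior to the green edge of a later phase. */
void rewritten_loop(int phases) {
    bool jump = false;
    for (int ph = 1; ph <= phases; ph++) {
        if (jump) {
            executed_purple++;   /* pending purple path runs once */
            jump = false;
        }
        executed_green++;        /* green edge: start of the phase */
        if (jump_triggered(ph))
            jump = true;         /* purple deferred to a later phase */
        else
            executed_purple++;   /* purple edge: rest of the phase */
    }
}
```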
The protocol in Fig. 4 is obtained using two optimizations of the previous
construction: First we do not need empty loops. They are subsumed by the
mHO semantics as all local state changes are caused by some message reception.
Thus, an empty loop is simulated by the execution of a phase with empty HO
sets. Second, instead of adding jump variables, we reuse the non-deterministic
value of mbox. This is possible as the jump is preconditioned by a cardinality
constraint on the mbox, and the green edge is empty (assignments to ballot
and label correspond to ph++ and reception loops have been reduced to havoc
statements).
Nesting. Cases 1–3 capture loops without nesting. Nested loops are rewritten
into inter-procedural mHO protocols, using the structure of the tag annotations
from Sect. 4. Each loop is rewritten into one protocol, starting with the most
inner loop using the procedure above. For each outer loop, it first replaces the
nested loop with a call to the computed mHO protocol, and then applies the same
rewriting procedure. Interpreting each loop as a protocol is pessimistic, and our
rewriting may generate deeper nesting than necessary. Inner loops appearing on
different branches may belong to the same sub-protocol, so that these different
loops exchange messages. If tags associates different synchronization variables
to different loops then the rewriting builds one (sub-)protocol for each loop.
Otherwise, the rewriting merges the loops into one mHO protocol. To soundly
merge several loops into the same mHO protocol, the rewrite algorithm identifies
the context in which the inner loop is executed.
7 Experimental Results
Fig. 9. Benchmarks. The superscript * identifies protocols that jump over phases. The
superscript V marks protocols whose synchronous counterpart we verified.
Benchmarks. Our tool has rewritten several challenging benchmarks: the algo-
rithm from [6, Fig. 6] solves consensus using a failure detector. The algorithm
jumps to a specific decision round if a special decision message is received. Multi-
Paxos is the Paxos algorithm from [25] over sequences, without fast paths, where
the classic path is repeated as long as the leader is stable. Roughly, it does a
leader election similar to our running example (NewBallot is Phase1a), except
that the last all-to-all round is replaced by one back-and-forth communication
between the leader and its quorum: the leader receives n/2 acknowledgments that
contain also the log of its followers (Phase1b). The leader computes the maximal
log and sends it to all (Phase1aStart). In a subprotocol, a stable leader accepts
client requests, and broadcasts them one by one to its followers. The broadcast
is implemented by three rounds, Phase2aClassic, Phase2bClassic, Learn, and is
repeated as long as the leader is stable. ViewChange is a leader election algo-
rithm similar to the one in ViewStamped [34]. Normal-Op is the subprotocol
used in ViewStamped to implement the broadcasting of new commands by a
stable leader. The last column of Fig. 9 gives the size of the mHO protocol com-
puted by the rewriting. The implementation uses pycparser [3] to obtain the
abstract syntax tree of the input protocol.
References
1. Aminof, B., Rubin, S., Stoilkovska, I., Widder, J., Zuleger, F.: Parameterized model
checking of synchronous distributed algorithms by abstraction. In: Dillig, I., Pals-
berg, J. (eds.) VMCAI 2018. LNCS, vol. 10747, pp. 1–24. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-73721-8 1
2. Bakst, A., von Gleissenthall, K., Kici, R.G., Jhala, R.: Verifying distributed pro-
grams via canonical sequentialization. PACMPL 1(OOPSLA), 110:1–110:27 (2017)
3. Bendersky, E.: pycparser. https://github.com/eliben/pycparser. Accessed 7 Nov
2018
4. Bouajjani, A., Enea, C., Ji, K., Qadeer, S.: On the completeness of verifying mes-
sage passing programs under bounded asynchrony. In: Chockler, H., Weissenbacher,
G. (eds.) CAV 2018, Part II. LNCS, vol. 10982, pp. 372–391. Springer, Cham
(2018). https://doi.org/10.1007/978-3-319-96142-2 23
5. Chandra, T.D., Griesemer, R., Redstone, J.: Paxos made live: an engineering per-
spective. In: PODC, pp. 398–407 (2007)
6. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed sys-
tems. J. ACM 43(2), 225–267 (1996)
7. Chaouch-Saad, M., Charron-Bost, B., Merz, S.: A reduction theorem for the veri-
fication of round-based distributed algorithms. In: Bournez, O., Potapov, I. (eds.)
RP 2009. LNCS, vol. 5797, pp. 93–106. Springer, Heidelberg (2009). https://doi.
org/10.1007/978-3-642-04420-5 10
8. Charron-Bost, B., Debrat, H., Merz, S.: Formal verification of consensus algorithms
tolerating malicious faults. In: Défago, X., Petit, F., Villain, V. (eds.) SSS 2011.
LNCS, vol. 6976, pp. 120–134. Springer, Heidelberg (2011). https://doi.org/10.
1007/978-3-642-24550-3 11
9. Charron-Bost, B., Schiper, A.: The heard-of model: computing in distributed sys-
tems with benign faults. Distrib. Comput. 22(1), 49–71 (2009)
10. Chou, C., Gafni, E.: Understanding and verifying distributed algorithms using
stratified decomposition. In: PODC, pp. 44–65 (1988)
11. Debrat, H., Merz, S.: Verifying fault-tolerant distributed algorithms in the heard-of
model. In: Archive of Formal Proofs 2012 (2012)
12. Desai, A., Garg, P., Madhusudan, P.: Natural proofs for asynchronous programs
using almost-synchronous reductions. In: Proceedings of the 2014 ACM Interna-
tional Conference on Object Oriented Programming Systems Languages & Appli-
cations, OOPSLA 2014, Part of SPLASH 2014, Portland, OR, USA, 20–24 October
2014, pp. 709–725 (2014)
13. Drăgoi, C., Henzinger, T.A., Veith, H., Widder, J., Zufferey, D.: A logic-based
framework for verifying consensus algorithms. In: McMillan, K.L., Rival, X. (eds.)
VMCAI 2014. LNCS, vol. 8318, pp. 161–181. Springer, Heidelberg (2014). https://
doi.org/10.1007/978-3-642-54013-4 10
14. Drăgoi, C., Henzinger, T.A., Zufferey, D.: PSync: a partially synchronous language
for fault-tolerant distributed algorithms. In: POPL, pp. 400–415 (2016)
15. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial syn-
chrony. JACM 35(2), 288–323 (1988)
16. Elmas, T., Qadeer, S., Tasiran, S.: A calculus of atomic actions. In: Proceedings
of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL 2009, Savannah, GA, USA, 21–23 January 2009, pp. 2–15 (2009)
17. Elrad, T., Francez, N.: Decomposition of distributed programs into communication-
closed layers. Sci. Comput. Program. 2(3), 155–173 (1982)
18. Gafni, E.: Round-by-round fault detectors: unifying synchrony and asynchrony
(extended abstract). In: PODC, pp. 143–152 (1998)
19. Garcı́a-Pérez, Á., Gotsman, A., Meshman, Y., Sergey, I.: Paxos consensus, decon-
structed and abstracted. In: Ahmed, A. (ed.) ESOP 2018. LNCS, vol. 10801, pp.
912–939. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89884-1 32
20. von Gleissenthall, K., Gökhan Kici, R., Bakst, A., Stefan, D., Jhala, R.: Pre-
tend synchrony: synchronous verification of asynchronous distributed programs.
PACMPL 3(POPL), 59:1–59:30 (2019)
21. Hawblitzel, C., et al.: IronFleet: proving safety and liveness of practical distributed
systems. Commun. ACM 60(7), 83–92 (2017)
362 A. Damian et al.
22. Jacobs, B., Smans, J., Piessens, F.: A quick tour of the verifast program verifier. In:
Ueda, K. (ed.) APLAS 2010. LNCS, vol. 6461, pp. 304–311. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-17164-2 21
23. Konnov, I.V., Lazic, M., Veith, H., Widder, J.: A short counterexample property for
safety and liveness verification of fault-tolerant distributed algorithms. In: POPL,
pp. 719–734 (2017)
24. Kragl, B., Qadeer, S., Henzinger, T.A.: Synchronizing the asynchronous. In: CON-
CUR, pp. 21:1–21:17 (2018)
25. Lamport, L.: Generalized consensus and paxos. Technical report, March 2005.
https://www.microsoft.com/en-us/research/publication/generalized-consensus-
and-paxos/
26. Lesani, M., Bell, C.J., Chlipala, A.: Chapar: certified causally consistent distributed
key-value stores. In: POPL, pp. 357–370 (2016)
27. Lipton, R.J.: Reduction: a method of proving properties of parallel programs. Com-
mun. ACM 18(12), 717–721 (1975)
28. Lynch, N.: Distributed Algorithms. Morgan Kaufman, San Francisco (1996)
29. Marić, O., Sprenger, C., Basin, D.: Cutoff bounds for consensus algorithms. In:
Majumdar, R., Kunčak, V. (eds.) CAV 2017, Part II. LNCS, vol. 10427, pp. 217–
237. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9 12
30. Mattern, F.: On the relativistic structure of logical time in distributed systems. In:
Parallel and Distributed Algorithms, pp. 215–226 (1989)
31. Moraru, I., Andersen, D.G., Kaminsky, M.: There is more consensus in Egalitarian
parliaments. In: SOSP, pp. 358–372 (2013)
32. Moses, Y., Rajsbaum, S.: A layered analysis of consensus. SIAM J. Comput. 31(4),
989–1021 (2002)
33. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3 24
34. Oki, B.M., Liskov, B.: Viewstamped replication: a general primary copy. In: PODC,
pp. 8–17 (1988)
35. Ongaro, D., Ousterhout, J.K.: In search of an understandable consensus algorithm.
In: 2014 USENIX Annual Technical Conference, USENIX ATC 2014, pp. 305–319
(2014)
36. Padon, O., McMillan, K.L., Panda, A., Sagiv, M., Shoham, S.: Ivy: safety verifica-
tion by interactive generalization. In: PLDI, pp. 614–630 (2016)
37. Rahli, V., Guaspari, D., Bickford, M., Constable, R.L.: Formal specification, veri-
fication, and implementation of fault-tolerant systems using EventML. ECEASST
72 (2015)
38. Sergey, I., Wilcox, J.R., Tatlock, Z.: Programming and proving with distributed
protocols. PACMPL 2(POPL), 28:1–28:30 (2018)
39. Stoilkovska, I., Konnov, I., Widder, J., Zuleger, F.: Verifying safety of synchronous
fault-tolerant algorithms by bounded model checking. In: Vojnar, T., Zhang, L.
(eds.) TACAS 2019, Part II. LNCS, vol. 11428, pp. 357–374. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-17465-1 20
40. Tsuchiya, T., Schiper, A.: Verification of consensus algorithms using satisfiability
solving. Distrib. Comput. 23(5–6), 341–358 (2011)
41. Wilcox, J.R., et al.: Verdi: a framework for implementing and formally verifying
distributed systems. In: PLDI, pp. 357–368 (2015)
42. Woos, D., Wilcox, J.R., Anton, S., Tatlock, Z., Ernst, M.D., Anderson, T.E.: Plan-
ning for change in a formal verification of the RAFT consensus protocol. In: CPP,
pp. 154–165 (2016)
Verification and Invariants
Interpolating Strong Induction
1 Introduction
The principle of strong induction, also known as k-induction, is a generalization
of (simple) induction that extends the base and inductive cases to k steps of a
transition system [27]. A safety property P is k-inductive in a transition system
T iff (a) P is true in the first (k − 1) steps of T, and (b) if P is assumed to hold
for (k − 1) consecutive steps, then P holds in k steps of T. Simple induction
is equivalent to 1-induction. Unlike induction, strong induction is complete for
safety properties: a property P is safe in a transition system T iff there exists a
natural number k such that P is k-inductive in T (assuming the usual restriction
to simple paths). This makes k-induction a powerful method for unbounded SAT-
based Model Checking (SMC).
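The definition above can be replayed on a tiny explicit-state system. The sketch below enumerates paths instead of calling a SAT solver, and checks the base case over the first k states (conventions for the base case vary by one step across presentations); the system and function names are ours, for illustration only:

```python
def k_inductive(init, trans, prop, k, states):
    """Explicit-state sketch of k-induction (not the SAT encoding).

    Base: prop holds on all states reachable in fewer than k steps.
    Step: on every path of k+1 states (reachable or not), if the
          first k satisfy prop, so does the last.
    """
    # Base case: walk the first k frontiers from the initial states.
    frontier = set(init)
    for _ in range(k):
        if not all(prop(s) for s in frontier):
            return False
        frontier = {t for s in frontier for t in trans(s)}

    # Inductive step: enumerate arbitrary paths, not only reachable ones.
    def paths(length):
        if length == 1:
            for s in states:
                yield [s]
        else:
            for p in paths(length - 1):
                for t in trans(p[-1]):
                    yield p + [t]

    return all(prop(p[-1]) for p in paths(k + 1)
               if all(prop(s) for s in p[:-1]))

# Toy system: reachable cycle 0 -> 1 -> 0, plus an unreachable
# lasso 2 -> 3 -> 3 that defeats plain (1-)induction.
tr = {0: [1], 1: [0], 2: [3], 3: [3]}
safe = lambda s: s != 3
print(k_inductive([0], tr.get, safe, 1, tr))  # False: 2 |= P but 2 -> 3
print(k_inductive([0], tr.get, safe, 2, tr))  # True: P is 2-inductive
```

The unreachable lasso illustrates why completeness needs the restriction to simple paths: state 2 satisfies P but steps to a violation, so P is not 1-inductive, yet no two consecutive P-states ever lead out of P.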
Unlike other SMC techniques, strong induction reduces model checking to
pure SAT, requiring no additional solver features such as solving with
assumptions [12], interpolation [24], resolution proofs [17], Maximal
Unsatisfiable Subsets (MUS) [2], etc. It easily integrates with existing SAT-solvers
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 367–385, 2019.
https://doi.org/10.1007/978-3-030-25543-5_21
368 H. G. Vediramana Krishnan et al.
Related Work. kAvy builds on top of the ideas of IC3 [7] and Pdr [13]. The
use of interpolation for generating an inductive trace is inspired by Avy [29].
While our algorithm is conceptually similar to Avy, its proof of correctness is
non-trivial and differs significantly from that of Avy. We are not aware of
any other work that combines interpolation with strong induction.
There are two prior attempts at enhancing Pdr-style algorithms with
k-induction. Pd-Kind [19] is an SMT-based Model Checking algorithm for
infinite-state systems inspired by IC3/Pdr. It infers k-inductive invariants
driven by the property, whereas kAvy infers 1-inductive invariants driven by k-induction.
Pd-Kind uses recursive blocking with interpolation and model-based projection
to block bad states, and k-induction to propagate (push) lemmas to the next level.
While the algorithm is very interesting, it is hard to adapt to the SAT-based
setting (i.e., SMC), and impossible to compare on HWMCC instances directly.
The closest related work is KIC3 [16]. It modifies the counterexample queue
management strategy in IC3 to utilize k-induction during blocking. The main
limitation is that the value for k must be chosen statically (k = 5 is reported for
the evaluation). kAvy also utilizes k-induction during blocking but computes
the value for k dynamically. Unfortunately, the implementation is not available
publicly and we could not compare with it directly.
2 Background
In this section, we present the notation and background required for the
description of our algorithm.
Example 1. For Fig. 1, F = [c = 0, c < 66] is a safe trace of size 1. The formula
(c < 66) ∧ Tr ∧ ¬(c < 66) is satisfiable. Therefore, there does not exist an
extension trace at level 1. Since (c = 0) ∧ Tr ∧ (c < 66) ∧ Tr ∧ (c ≥ 66) is
unsatisfiable, the trace is extendable at level 0. For example, a valid extension
trace at level 0 is G = [c = 0, c < 2, c < 66].
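Fig. 1 itself is not reproduced in this excerpt, so the following replay of Example 1's two satisfiability checks uses a hypothetical counter consistent with the example: c starts at 0, wraps from 64 back to 0, and the (unreachable) state 65 steps to the bad value 66. The transition relation below is our assumption, not the paper's figure:

```python
# Hypothetical stand-in for Fig. 1 (assumption, not the paper's figure):
# the counter wraps from 64 to 0, so 65 is unreachable but steps to 66.
STATES = range(67)
step = lambda c: 0 if c == 64 else c + 1
f0 = lambda c: c == 0        # frame F0
f1 = lambda c: c < 66        # frame F1
bad = lambda c: c >= 66

# (c < 66) ∧ Tr ∧ ¬(c < 66): F1-states with a one-step escape.
escape = [c for c in STATES if f1(c) and not f1(step(c))]
print(escape)  # [65] -> satisfiable, so no extension trace at level 1

# (c = 0) ∧ Tr ∧ (c < 66) ∧ Tr ∧ (c >= 66): two-step paths from F0 to Bad.
two_step = [c for c in STATES
            if f0(c) and f1(step(c)) and bad(step(step(c)))]
print(two_step)  # [] -> unsatisfiable, so the trace is extendable at level 0
```

Under this assumed system the one-step escape from F1 exists only at the unreachable state 65, which is exactly what makes the property 2-inductive but not 1-inductive.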
Both Pdr and Avy iteratively extend a safe trace either until the extension
is closed or a counterexample is found. However, they differ in how exactly the
trace is extended. In the rest of this section, we present Avy and Pdr through
the lens of extension level. The goal of this presentation is to make the paper self-
contained. We omit many important optimization details, and refer the reader
to the original papers [7,13,29].
Pdr maintains a monotone, clausal trace F with Init as the first frame (F0 ).
The trace F is extended by recursively computing and blocking (if possible)
states that can reach Bad (called bad states). A bad state is blocked at the largest
level possible. Algorithm 1 shows PdrBlock, the backward search procedure
that identifies and blocks bad states. PdrBlock maintains a queue of states
and the levels at which they have to be blocked. The smallest level at which
blocking occurs is tracked in order to show the construction of the extension
trace. For each state s in the queue, it is checked whether s can be blocked by
the previous frame Fd−1 (line 5). If not, a predecessor state t of s that satisfies
Fd−1 is computed and added to the queue (line 7). If a predecessor state is found
at level 0, the trace is not extendable and an empty trace is returned. If the state
s is blocked at level d, PdrIndGen is called to generate a clause that blocks
s and possibly other states. The clause is then added to all frames at levels less
than or equal to d. PdrIndGen is a crucial optimization in Pdr; however, we
do not describe it here for the sake of simplicity. The procedure terminates when
there are no more states to be blocked (or a counterexample is found at line 4).
By construction, the output trace G is an extension trace of F at the extension
level w. Once Pdr extends its trace, PdrPush is called to check if the clauses
it learnt are also true at higher levels. Pdr terminates when the trace is closed.
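The backward search described above can be sketched over explicit states. A real implementation works with cubes and clauses and generalizes each blocked state via PdrIndGen; the function and variable names below are ours:

```python
def pdr_block(frames, bad_states, pre):
    """Explicit-state sketch of PdrBlock: frames[d] is the set of states
    allowed at level d; pre(s) returns the predecessors of s.  Returns
    the strengthened frames, or None if a bad state is backward-reachable
    to level 0 (a counterexample)."""
    N = len(frames) - 1
    queue = [(s, N) for s in bad_states if s in frames[N]]
    while queue:
        s, d = queue.pop()
        if d == 0:
            return None  # predecessor found at level 0: counterexample
        preds = [t for t in pre(s) if t in frames[d - 1]]
        if preds:
            # s is not yet blocked by F_{d-1}: first block a predecessor
            queue.append((s, d))
            queue.append((preds[0], d - 1))
        else:
            # s is unreachable from F_{d-1}: remove it from frames <= d
            # (PdrIndGen would generalize this to a blocking clause)
            for lvl in range(1, d + 1):
                frames[lvl].discard(s)
    return frames

# States 0..3; 2 -> 3 is a bad transition, but 2 is unreachable.
pre = {0: [], 1: [0, 1], 2: [], 3: [2]}.get
frames = [{0}, {0, 1, 2}, {0, 1, 2, 3}]
result = pdr_block(frames, {3}, pre)
print(result)  # [{0}, {0, 1}, {0, 1, 2}]
```

In the example, the bad state 3 cannot be blocked by F1 until its predecessor 2 is blocked at level 1, mirroring the recursive queue discipline described in the text.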
4 Interpolating k-Induction
In this section, we present kAvy, an SMC algorithm that uses the principle
of strong induction to extend an inductive trace. The section is structured as
follows. First, we introduce the concept of extending a trace using relative k-
induction. Second, we present kAvy and describe the details of how k-induction
is used to compute an extended trace. Third, we describe two techniques for com-
puting maximal parameters to apply strong induction. Unless stated otherwise,
we assume that all traces are monotone.
A safe trace F , with |F | = N , is strongly extendable with respect to (i, k),
where 1 ≤ k ≤ i + 1 ≤ N + 1, iff there exists a safe inductive trace G stronger
than F such that |G| > N and Tr [Fi ]k ⇒ Gi+1 . We refer to the pair (i, k) as a
strong extension level (SEL), and to the trace G as an (i, k)-extension trace, or
simply a strong extension trace (SET) when (i, k) is not important. Note that
for k = 1, G is just an extension trace.
Example 2. For Fig. 1, the trace F = [c = 0, c < 66] is strongly extendable at
level 1. A valid (1, 2)-extension trace is G = [c = 0, (c ≠ 65) ∧ (c < 66), c < 66].
Note that (c < 66) is 2-inductive relative to F1 , i.e. Tr [F1 ]2 ⇒ (c < 66).
We write K(F ) for the set of all SELs of F . We define an order on SELs by:
(i1 , k1 ) ≺ (i2 , k2 ) iff (i) i1 < i2 ; or (ii) i1 = i2 ∧ k1 > k2 . The maximal SEL is
max(K(F )).
Note that the existence of a SEL (i, k) means that an unrolling of the i-suffix
with Fi repeated k times does not contain any bad states. We use Tr⟨F^i⟩_k to
denote this characteristic formula for SEL (i, k):

                ⎧ Tr[Fi]^{i+1}_{i+1−k} ∧ Tr[F^{i+1}]   if 0 ≤ i < N
    Tr⟨F^i⟩_k = ⎨                                                        (6)
                ⎩ Tr[FN]^{N+1}_{N+1−k}                 if i = N

Proposition 2. Let F be a safe trace, where |F | = N . Then, (i, k), 1 ≤ k ≤
i + 1 ≤ N + 1, is an SEL of F iff the formula Tr⟨F^i⟩_k ∧ Bad (v̄N+1 ) is unsatisfiable.
The level i in the maximal SEL (i, k) of a given trace F is greater than or equal
to the maximal extension level of F :
Lemma 1. Let (i, k) = max(K(F )), then i ≥ max(W(F )).
Hence, extensions based on maximal SEL are constructed from frames at higher
level compared to extensions based on maximal extension level.
Example 3. For Fig. 1, the trace [c = 0, c < 66] has a maximum extension level
of 0. Since (c < 66) is 2-inductive, the trace is strongly extendable at level 1 (as
was seen in Example 2).
kAvy Algorithm. kAvy is shown in Algorithm 3. It starts with an inductive trace
F = [Init] and iteratively extends F using SELs. A counterexample is returned
if the trace cannot be extended (line 4). Otherwise, kAvy computes the largest
extension level (line 5) (described in Sect. 4.2). Then, it constructs a strong
extension trace using kAvyExtend (line 6) (described in Sect. 4.1). Finally,
PdrPush is called to check whether the trace is closed. Note that F is a mono-
tone, clausal, safe inductive trace throughout the algorithm.
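The loop just described can be summarized as follows; the callback names are ours, standing in for the sub-procedures the algorithm calls (this is a control-flow sketch, not the paper's implementation):

```python
def kavy(init_frame, extendable, max_sel, kavy_extend, pdr_push):
    """Control-flow sketch of kAvy: extend the trace via strong
    extension levels until it is closed or a counterexample appears."""
    trace = [init_frame]
    while True:
        if not extendable(trace):
            return ("cex", trace)            # line 4: counterexample
        i, k = max_sel(trace)                # line 5: maximal SEL (Sect. 4.2)
        trace = kavy_extend(trace, i, k)     # line 6: strong extension trace
        trace, closed = pdr_push(trace)      # push clauses, test for a fixpoint
        if closed:
            return ("safe", trace)

# Stub callbacks, just to exercise the control flow:
result = kavy("Init",
              extendable=lambda t: True,
              max_sel=lambda t: (len(t) - 1, 1),
              kavy_extend=lambda t, i, k: t + ["F%d" % len(t)],
              pdr_push=lambda t: (t, len(t) > 3))
print(result)  # ('safe', ['Init', 'F1', 'F2', 'F3'])
```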
Note that in (♥), both i and k are fixed—they are the (i, k)-extension level.
Furthermore, in the top row Fi is fixed as well.
The conjunction of the first k interpolants in I is k-inductive relative to the
frame Fi :

Lemma 2. The formula Fi+1 ∧ ⋀^{i+1}_{m=i−k+2} Im is k-inductive relative to Fi .
Proof. Since Fi and Fi+1 are consecutive frames of a trace, Fi ∧ Tr ⇒ Fi+1 . Thus,
∀i−k+2 ≤ j ≤ i · Tr[Fi]^j_{i−k+2} ⇒ Fi+1 (v̄j+1 ). Moreover, by (♥), Fi ∧ Tr ⇒ Ii−k+2
and ∀i−k+2 ≤ j ≤ i+1 · (Fi ∧ Ij ) ∧ Tr ⇒ Ij+1 . Equivalently, ∀i−k+2 ≤ j ≤ i+1 ·
Tr[Fi]^j_{i−k+2} ⇒ Ij+1 (v̄j+1 ). By induction over the difference between (i + 1)
and (i − k + 2), we show that Tr[Fi]^{i+1}_{i−k+2} ⇒ (Fi+1 ∧ ⋀^{i+1}_{m=i−k+2} Im )(v̄i+1 ),
which concludes the proof.
We use Lemma 2 to define a strong extension trace G:
Lemma 3. Let G = [G0 , . . . , GN+1 ] be an inductive trace defined as follows:

         ⎧ Fj                           if 0 ≤ j < i − k + 2
         ⎪ Fj ∧ ⋀^j_{m=i−k+2} Im        if i − k + 2 ≤ j < i + 2
    Gj = ⎨
         ⎪ (Fj ∧ Ij )                   if i + 2 ≤ j < N + 1
         ⎩ IN+1                         if j = N + 1
By (♥), ∀i < j ≤ N · (Fj ∧ Ij ) ∧ Tr ⇒ Ij+1 . Again, since F is a trace, we
conclude that ∀i < j < N · (Fj ∧ Ij ) ∧ Tr ⇒ (Fj+1 ∧ Ij+1 ). Combining the above,
Gj ∧ Tr ⇒ Gj+1 for 0 ≤ j ≤ N . Since F is safe and IN+1 ⇒ ¬Bad , G is
safe and stronger than F .
Correctness of Phase 1 (line 5) follows from the loop invariant Inv2 . It holds
on loop entry since Gi ∧ Tr ⇒ Ii−k+2 (since Gi = Fi and (♥)) and Gi ∧ Tr ⇒
Gi+1 (since G is initially a trace). Let Gi and G∗i be the ith frame before and
after execution of iteration j of the loop, respectively. PdrBlock blocks ¬Pj
at iteration j of the loop. Assume that Inv2 holds at the beginning of the loop.
Then, G∗i ⇒ Gi ∧ Pj since PdrBlock strengthens Gi . Since Gj ⇒ Gi and
Gi ⇒ Gi+1 , this simplifies to G∗i ⇒ Gj ∨ (Gi ∧ Ij+1 ). Finally, since G is a trace,
Inv2 holds at the end of the iteration.
Inv2 ensures that the trace given to PdrBlock at line 5 can be made safe
relative to Pj . From the post-condition of PdrBlock, it follows that at iteration
j, Gi+1 is strengthened to G∗i+1 such that G∗i+1 ⇒ Pj and G remains a monotone
clausal trace. At the end of Phase 1, [G0 , . . . , Gi+1 ] is a clausal monotone trace.
Interestingly, the calls to PdrBlock in this phase do not satisfy an expected
pre-condition: the frame Gi in [Init, Gi , Gi+1 ] might not be safe for property Pj .
However, we can see that Init ⇒ Pj and from Inv2 , it is clear that Pj is inductive
relative to Gi . This is a sufficient precondition for PdrBlock.
Phase 2. This phase strengthens Gi+1 using the interpolant Ii+1 . After Phase 2,
Gi+1 is k-inductive relative to Fi .
Algorithm 5. A top-down algorithm for the maximal SEL.
  Input: A transition system T = (Init, Tr , Bad )
  Input: An extendable monotone clausal safe trace F of size N
  Output: max(K(F ))
  1  i ← N
  2  while i > 0 do
  3      if ¬isSat(Tr⟨F^i⟩_{i+1} ∧ Bad (v̄N+1 )) then break
  4      i ← (i − 1)
  5  k ← 1
  6  while k < i + 1 do
  7      if ¬isSat(Tr⟨F^i⟩_k ∧ Bad (v̄N+1 )) then break
  8      k ← (k + 1)
  9  return (i, k)

Algorithm 6. A bottom-up algorithm for the maximal SEL.
  Input: A transition system T = (Init, Tr , Bad )
  Input: An extendable monotone clausal safe trace F of size N
  Output: max(K(F ))
  1  j ← N
  2  while j > 0 do
  3      if ¬isSat(Tr⟨F^j⟩_1 ∧ Bad (v̄N+1 )) then break
  4      j ← (j − 1)
  5  (i, k) ← (j, 1) ; j ← (j + 1) ; ℓ ← 2
  6  while ℓ ≤ (j + 1) ∧ j ≤ N do
  7      if isSat(Tr⟨F^j⟩_ℓ ∧ Bad (v̄N+1 )) then ℓ ← (ℓ + 1)
  8      else
  9          (i, k) ← (j, ℓ)
 10          j ← (j + 1)
 11 return (i, k)
Note that k depends on i. For a SEL (i, k) ∈ K(F ), we refer to the formula
Tr[F^i] as a suffix and to the number k as the depth of induction. Thus, the search
can be split into two phases: (a) find the smallest suffix while using the maximal
depth of induction allowed (for that suffix), and (b) minimize the depth of
induction k for the value of i found in step (a). This is captured in Algorithm 5.
The algorithm requires at most (N + 1) SAT queries. One downside, however, is
that the formulas constructed in the first phase (line 3) are large because the
depth of induction is the maximum possible.
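Algorithm 5's two-phase search can be sketched with the SAT query abstracted behind a callback; is_sat(i, k) stands for the query isSat(Tr⟨F^i⟩_k ∧ Bad(v̄N+1)), and the function name is ours:

```python
def max_sel_top_down(N, is_sat):
    """Sketch of Algorithm 5: compute max(K(F)) for a trace of size N.
    is_sat(i, k) abstracts the query isSat(Tr<F^i>_k ∧ Bad(v_{N+1}))."""
    # Phase (a): largest i whose suffix is safe under the maximal
    # allowed depth of induction (k = i + 1).
    i = N
    while i > 0:
        if not is_sat(i, i + 1):
            break
        i -= 1
    # Phase (b): minimize the depth of induction k for that i.
    k = 1
    while k < i + 1:
        if not is_sat(i, k):
            break
        k += 1
    return (i, k)

# Stub oracle: queries are unsatisfiable exactly when i <= 2 and k >= 2.
oracle = lambda i, k: not (i <= 2 and k >= 2)
print(max_sel_top_down(4, oracle))  # (2, 2)
```

With the stub oracle, phase (a) descends from i = 4 to the first level (i = 2) whose suffix is safe at maximal depth, and phase (b) then finds the smallest sufficient depth k = 2, matching the ≺-maximal SEL.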
Fig. 2. Runtime comparison on SAFE HWMCC instances (a) and shift instances (b).
5 Evaluation
We implemented kAvy on top of the Avy Model Checker.¹ For line 5 of
Algorithm 3 we used Algorithm 5. We evaluated kAvy’s performance against a version
of Avy [29] from the Hardware Model Checking Competition 2017 [5], and the
Pdr engine of ABC [13]. We have used the benchmarks from HWMCC’14, ’15,
and ’17. Benchmarks that are not solved by any of the solvers are excluded from
the presentation. The experiments were conducted on a cluster running Intel E5-
2683 V4 CPUs at 2.1 GHz with 8 GB RAM limit and 30 min time limit.
The results are summarized in Table 1. The HWMCC has a wide variety of
benchmarks. We aggregate the results based on the competition, and also on
benchmark origin (based on the name). Some named categories (e.g., intel ) include
benchmarks that have not been included in any competition. The first column in
Table 1 indicates the category. Total is the number of all available benchmarks,
ignoring duplicates. That is, if a benchmark appeared in multiple categories,
it is counted only once. Numbers in brackets indicate the number of instances
that are solved uniquely by the solver. For example, kAvy solves 14 instances
in oc8051 that are not solved by any other solver. The VBS column indicates
the Virtual Best Solver—the result of running all three solvers in parallel
and stopping as soon as one solver terminates successfully.
Overall, kAvy solves more safe instances than both Avy and Pdr, while
taking less time than Avy (we report time for solved instances, ignoring time-
outs). The VBS column shows that kAvy is a promising new strategy, signifi-
cantly improving overall performance. In the rest of this section, we analyze the
¹ All code, benchmarks, and results are available at https://arieg.bitbucket.io/avy/.
Table 1. Summary of instances solved by each tool. Timeouts were ignored when
computing the time column.
results in more detail, provide detailed run-time comparison between the tools,
and isolate the effect of the new k-inductive strategy.
To compare the running time, we present scatter plots comparing kAvy
and Avy (Fig. 3a), and kAvy and Pdr (Fig. 3b). In both figures, kAvy is at
the bottom. Points above the diagonal are better for kAvy. Compared to Avy,
whenever an instance is solved by both solvers, kAvy is often faster, sometimes
by orders of magnitude. Compared to Pdr, kAvy and Pdr perform well on
very different instances. This is similar to the observation made by the authors
of the original paper that presented Avy [29]. Another indicator of performance
is the depth of convergence. This is summarized in Fig. 3d and e. kAvy often
converges much sooner than Avy. The comparison with Pdr is less clear, which
is consistent with the difference in performance between the two. To get the
whole picture, Fig. 2a presents a cactus plot that compares the running times of
the algorithms on all these benchmarks.
To isolate the effects of k-induction, we compare kAvy to a version of kAvy
with k-induction disabled, which we call vanilla. Conceptually, vanilla is
similar to Avy since it extends the trace using a 1-inductive extension trace,
but its implementation is based on kAvy. The results for the running time and
the depth of convergence are shown in Fig. 3c and f, respectively. The results
are very clear—using strong extension traces significantly improves performance
and has a non-negligible effect on the depth of convergence.
Finally, we discovered one family of benchmarks, called shift, on which kAvy
performs orders of magnitude better than all other techniques. The benchmarks
come from encoding bit-vector decision problems into circuits [21,30]. The shift
family corresponds to deciding satisfiability of (x + y) = (x << 1) for two
Fig. 3. Comparing running time ((a), (b), (c)) and depth of convergence ((d), (e), (f))
of Avy, Pdr and vanilla with kAvy. kAvy is shown on the x-axis. Points above the
diagonal are better for kAvy. Only those instances that have been solved by both
solvers are shown in each plot.
6 Conclusion
² We used the k-induction engine ind in Abc [8].
Acknowledgements. We thank the anonymous reviewers and Oded Padon for their
thorough review and insightful comments. This research was enabled in part by sup-
port provided by Compute Ontario (https://computeontario.ca/), Compute Canada
(https://www.computecanada.ca/), and grants from the Natural Sciences and
Engineering Research Council of Canada.
References
1. Audemard, G., Lagniez, J.-M., Szczepanski, N., Tabary, S.: An adaptive parallel
SAT solver. In: Rueher, M. (ed.) CP 2016. LNCS, vol. 9892, pp. 30–48. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-44953-1 3
2. Belov, A., Marques-Silva, J.: MUSer2: an efficient MUS extractor. JSAT 8(3/4),
123–128 (2012)
3. Berryhill, R., Ivrii, A., Veira, N., Veneris, A.G.: Learning support sets in IC3 and
Quip: the good, the bad, and the ugly. In: 2017 Formal Methods in Computer Aided
Design, FMCAD 2017, Vienna, Austria, 2–6 October 2017, pp. 140–147 (2017)
4. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without
BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193–207.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0 14
5. Biere, A., van Dijk, T., Heljanko, K.: Hardware model checking competition 2017.
In: Stewart, D., Weissenbacher, G. (eds.) 2017 Formal Methods in Computer Aided
Design, FMCAD 2017, Vienna, Austria, 2–6 October 2017, p. 9. IEEE (2017)
6. Bjørner, N., Gurfinkel, A., McMillan, K., Rybalchenko, A.: Horn clause solvers for
program verification. In: Beklemishev, L.D., Blass, A., Dershowitz, N., Finkbeiner,
B., Schulte, W. (eds.) Fields of Logic and Computation II. LNCS, vol. 9300, pp.
24–51. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23534-9 2
7. Bradley, A.R.: SAT-based model checking without unrolling. In: Jhala, R.,
Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 70–87. Springer, Heidel-
berg (2011). https://doi.org/10.1007/978-3-642-18275-4 7
8. Brayton, R., Mishchenko, A.: ABC: an academic industrial-strength verification
tool. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp.
24–40. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6 5
9. Champion, A., Mebsout, A., Sticksel, C., Tinelli, C.: The Kind 2 model checker.
In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 510–517.
Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41540-6 29
10. Craig, W.: Three uses of the Herbrand-Gentzen theorem in relating model theory
and proof theory. J. Symb. Log. 22(3), 269–285 (1957)
11. de Moura, L., et al.: SAL 2. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS,
vol. 3114, pp. 496–500. Springer, Heidelberg (2004). https://doi.org/10.1007/978-
3-540-27813-9 45
12. Eén, N., Mishchenko, A., Amla, N.: A single-instance incremental SAT formulation
of proof- and counterexample-based abstraction. In: Proceedings of 10th Interna-
tional Conference on Formal Methods in Computer-Aided Design, FMCAD 2010,
Lugano, Switzerland, 20–23 October, pp. 181–188 (2010)
13. Eén, N., Mishchenko, A., Brayton, R.K.: Efficient implementation of prop-
erty directed reachability. In: International Conference on Formal Methods in
Computer-Aided Design, FMCAD 2011, Austin, TX, USA, October 30–02 Novem-
ber 2011, pp. 125–134 (2011)
14. Garoche, P.-L., Kahsai, T., Tinelli, C.: Incremental invariant generation using logic-
based automatic abstract transformers. In: Brat, G., Rungta, N., Venet, A. (eds.)
NFM 2013. LNCS, vol. 7871, pp. 139–154. Springer, Heidelberg (2013). https://
doi.org/10.1007/978-3-642-38088-4 10
15. Gurfinkel, A., Ivrii, A.: Pushing to the top. In: Formal Methods in Computer-
Aided Design, FMCAD 2015, Austin, Texas, USA, 27–30 September 2015, pp.
65–72 (2015)
16. Gurfinkel, A., Ivrii, A.: K-induction without unrolling. In: 2017 Formal Methods
in Computer Aided Design, FMCAD 2017, Vienna, Austria, 2–6 October 2017, pp.
148–155 (2017)
17. Heule, M., Hunt Jr., W.A., Wetzler, N.: Trimming while checking clausal proofs. In:
Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA,
20–23 October 2013, pp. 181–188 (2013)
18. Järvisalo, M., Heule, M.J.H., Biere, A.: Inprocessing rules. In: Gramlich, B., Miller,
D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 355–370. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-31365-3 28
19. Jovanovic, D., Dutertre, B.: Property-directed k-induction. In: 2016 Formal Meth-
ods in Computer-Aided Design, FMCAD 2016, Mountain View, CA, USA, 3–6
October 2016, pp. 85–92 (2016)
20. Kahsai, T., Ge, Y., Tinelli, C.: Instantiation-based invariant discovery. In: Bobaru,
M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp.
192–206. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-
5 15
21. Kovásznai, G., Fröhlich, A., Biere, A.: Complexity of fixed-size bit-vector logics.
Theory Comput. Syst. 59(2), 323–376 (2016)
22. Liang, J.H., Ganesh, V., Poupart, P., Czarnecki, K.: Learning rate based branching
heuristic for SAT solvers. In: Creignou, N., Le Berre, D. (eds.) SAT 2016. LNCS,
vol. 9710, pp. 123–140. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
40970-2 9
23. Liang, J.H., Oh, C., Mathew, M., Thomas, C., Li, C., Ganesh, V.: Machine
learning-based restart policy for CDCL SAT solvers. In: Beyersdorff, O., Win-
tersteiger, C.M. (eds.) SAT 2018. LNCS, vol. 10929, pp. 94–110. Springer, Cham
(2018). https://doi.org/10.1007/978-3-319-94144-8 6
24. McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt, W.A.,
Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1–13. Springer, Heidelberg
(2003). https://doi.org/10.1007/978-3-540-45069-6 1
25. McMillan, K.L.: Interpolation and model checking. In: Clarke, E., Henzinger, T.,
Veith, H., Bloem, R. (eds.) Handbook of Model Checking, pp. 421–446. Springer,
Cham (2018)
26. Mebsout, A., Tinelli, C.: Proof certificates for SMT-based model checkers for
infinite-state systems. In: 2016 Formal Methods in Computer-Aided Design,
FMCAD 2016, Mountain View, CA, USA, 3–6 October 2016, pp. 117–124 (2016)
27. Sheeran, M., Singh, S., Stålmarck, G.: Checking safety properties using induction
and a SAT-solver. In: Hunt, W.A., Johnson, S.D. (eds.) FMCAD 2000. LNCS,
vol. 1954, pp. 127–144. Springer, Heidelberg (2000). https://doi.org/10.1007/3-
540-40922-X 8
28. Vizel, Y., Grumberg, O.: Interpolation-sequence based model checking. In: Pro-
ceedings of 9th International Conference on Formal Methods in Computer-Aided
Design, FMCAD 2009, 15–18 November 2009, Austin, Texas, USA, pp. 1–8 (2009)
29. Vizel, Y., Gurfinkel, A.: Interpolating property directed reachability. In: Biere, A.,
Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 260–276. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-08867-9 17
30. Vizel, Y., Nadel, A., Malik, S.: Solving linear arithmetic with SAT-based model
checking. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017,
Vienna, Austria, 2–6 October 2017, pp. 47–54 (2017)
Verifying Asynchronous Event-Driven
Programs Using Partial Abstract
Transformers
1 Introduction
Asynchronous event-driven (AED) programming refers to a style of programming
multi-agent applications. The agents communicate shared work via messages.
Each agent waits for a message to arrive, and then processes it, possibly sending
messages to other agents, in order to collectively achieve a goal. This program-
ming style is common for distributed systems as well as low-level designs such as
device drivers [11]. Getting such applications right is an arduous task, due to the
inherent concurrency: the programmer must defend against all possible interleav-
ings of messages between agents. In response to this challenge, recent years have
seen multiple approaches to verifying AED-like programs, e.g. by delaying send
actions, or temporarily bounding their number (to keep queue sizes small) [7,10],
Work supported by the US National Science Foundation under Grant No. 1253331,
and by Microsoft Research India while hosting the second author for a sabbatical.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 386–404, 2019.
https://doi.org/10.1007/978-3-030-25543-5_22
Verifying AED Programs Using Partial Abstract Transformers 387
have observed that spurious abstract states are often due to violations of simple
machine invariants: invariants that do not depend on the behavior of other
machines. By their nature, they can be proved using a cheap sequential analysis.
We can eliminate an abstract state (e.g. produced by Im♯) if all its concretiza-
tions violate a machine invariant. In this paper, we propose a domain-specific
temporal logic to express invariants over machines with event queues and, more
importantly, an algorithm that decides the above abstract queue invariant check-
ing problem, by reducing it efficiently to a plain model checking problem. We
have used this technique to ensure convergence in “hard” cases where the
sequence of abstract reachable states otherwise fails to converge.
We have implemented our technique for the P language and empirically eval-
uated it on an extensive set of benchmark programs. The experimental results
support the following conclusions: (i) for our benchmark programs, the sequence
of abstractions often converges fully automatically, in hard cases with minimal
designer support in the form of separately dischargeable invariants; (ii) almost all
examples converge at a small value of kmax ; and (iii) the overhead our technique
adds to the bounding technique is small: the bulk is spent on the exhaustive
bounded exploration itself.
Proofs and other supporting material can be found in the Appendix of [23].
2 Overview
We illustrate the main ideas of this paper using an example in the P language.
A machine in a P program consists of multiple states. Each state defines an entry
code block that is executed when the machine enters the state. The state also
defines handlers for each event type e that it is prepared to receive. A handler
can either be on e do foo (executing foo on receiving e), or ignore e (dequeuing
and dropping e). A state can also have a defer e declaration; the semantics is that
a machine dequeues the first non-deferred event in its queue. As a result, a queue
in a P program is not strictly FIFO. This relaxation is an important feature of
P that helps programmers express their logic compactly [11]. Figure 1 shows a P
program named PiFl , in which a Sender (eventually) floods a Receiver’s queue
with Ping events. This queue is the only source of unboundedness in PiFl .
A critical property for P programs is (bounded) responsiveness: the receiving
machine must have a handler (e.g. on, defer, ignore) for every event arriving at the
queue head; otherwise the event will come as a “surprise” and crash the machine.
To prove responsiveness for PiFl , we have to demonstrate (among others) that
in state Ignore it, the Done event is never at the head of the Receiver’s queue.
We cannot perform exhaustive model checking, since the set of reachable states
is infinite. Instead, we will compute a conservative abstraction of this set that is
precise enough to rule out Done events at the queue head in this state.
We first define a suitable abstraction function α that collapses repeated occur-
rences of events to each event’s first occurrence. For instance, the queue
Q = Prime.Prime.Prime.Done.Ping.Ping.Ping.Ping (1)
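As an illustrative sketch (our own, not the paper's implementation), the abstraction can be rendered in Python: the first p events of the queue are kept exactly, and the remaining suffix is collapsed to the first occurrence of each event type. The prefix parameter p anticipates the αp-projection discussed below.

```python
def alpha(queue, p):
    """Abstract a concrete event queue: keep the first p events exactly,
    collapse the rest to the first occurrence of each event type."""
    prefix = tuple(queue[:p])
    suffix = []
    for e in queue[p:]:
        if e not in suffix:      # record each event type at most once
            suffix.append(e)
    return prefix, tuple(suffix)

# The queue Q from (1), abstracted with prefix size p = 2:
Q = ["Prime"] * 3 + ["Done"] + ["Ping"] * 4
print(alpha(Q, 2))  # (('Prime', 'Prime'), ('Prime', 'Done', 'Ping'))
```

Note how the unbounded run of Pings collapses to a single suffix entry, which is what makes the abstract state space finite.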
Fig. 1. PiFl : a Ping-Flood scenario. The Sender and the Receiver communicate via
events of types Prime, Done, and Ping. After sending some Prime events and one
Done, the Sender floods the Receiver with Pings. The Receiver initially defers Primes.
Upon receiving Done it enters a state in which it ignores Ping.
to the send Prime statement. The behavior of other machines is irrelevant for
this invariant; we call it a machine invariant. We pass the invariant to our tool
via the command line using the expression
Discussion. The abstract state space is finite since the queue prefix is of fixed
size, and each event in the suffix is recorded at most once (the event alphabet is
finite). The sets of reachable abstract states grow monotonically with increasing
queue size bound k, since the sets of reachable concrete states do:
Recall that finiteness and monotonicity of the sequence $(\bar{R}_k)_{k=0}^{\infty}$ guarantee its
convergence, so it is natural to compute the limit. We
summarize our overall procedure to do so in Algorithm 1. The procedure iter-
atively increases the queue bound k and computes the concrete and (per αp-
projection) the abstract reachability sets Rk and R̄k. If, for some k, an error is
detected, the procedure terminates (Lines 4–5; in practice implemented as an
on-the-fly check).
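The loop can be sketched schematically in Python. This is our own rendering, not the tool's code; concrete_reach, to_abstract, abstract_deq_successors, and has_error are placeholders for the components Algorithm 1 assumes.

```python
def converge(concrete_reach, to_abstract, abstract_deq_successors,
             has_error, k_max):
    """Sketch of Algorithm 1: raise the queue bound k until an error is
    found or the abstract reachability sequence provably converges."""
    prev_abs = None
    for k in range(k_max + 1):
        rk = concrete_reach(k)                 # concrete states, bound k
        if any(has_error(s) for s in rk):
            return ("error", k)
        abs_k = {to_abstract(s) for s in rk}   # alpha_p projection
        # Convergence test: no new abstract states in round k, and
        # abstract dequeue successors stay inside the abstract set.
        if abs_k == prev_abs and all(
                succ in abs_k
                for a in abs_k
                for succ in abstract_deq_successors(a)):
            return ("converged", k)
        prev_abs = abs_k
    return ("unknown", k_max)
```

The dequeue-only successor check in the convergence test corresponds to Lines 7–8, discussed next.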
The key of the algorithm is reflected in Lines 6–9 and is based on the fol-
lowing idea (all claims are proved as part of Theorem 4 below). If the computa-
tion of Rk reveals no new abstract states in round k (Line 6; by monotonicity,
“same size” implies “same sets”), we apply the best abstract transformer [9,27]
Im♯ := αp ◦ Im→ ◦ γp to R̄k: if the result is contained in R̄k, the abstract reach-
ability sequence has converged. However, we can do better: we can restrict the
successor function Im→ of the CQS to dequeue actions, denoted Im♯deq in Line 7.
The ultimate reason is that firing a local or transmit action on two αp-equivalent
states r and s results again in αp-equivalent states r′ and s′. This fact does not
hold for dequeue actions: the successors r′ and s′ of dequeues depend on the
abstracted parts of r and s, resp., which may differ and become “visible” during
the dequeue (e.g. the event behind the queue head moves into the head position).
Our main result therefore is: if R̄k = R̄k−1 and dequeue actions do not create
new abstract states (Lines 7 and 8), the sequence $(\bar{R}_k)_{k=0}^{\infty}$ has converged:
The line applies the best abstract transformer, restricted to dequeue actions,
to R̄k. This result cannot be computed as defined in (5), since γp(R̄k) is typically
infinite. However, R̄k is finite, so we can iterate over r̄ ∈ R̄k, and little informa-
tion is actually needed to determine the abstract successors of r̄. The “infinite
fragment” of r̄ remains unchanged, which makes the action implementable.
Formally, let r̄ = (ℓ, Q̄) with Q̄ = e0 e1 . . . ep−1 | ep ep+1 . . . ez−1. To apply a
dequeue action to r̄, we first perform local-state updates on ℓ as required by the
action, resulting in ℓ′. Now consider Q̄. The first suffix event, ep, moves into the
prefix due to the dequeue. We do not know whether there are later occurrences
of ep before or after the first suffix occurrences of ep+1 . . . ez−1. This information
determines the possible abstract queues resulting from the dequeue. To compute
the exact best abstract transformer, we enumerate these possibilities:
$$
\mathit{Im}^{\sharp}_{\mathit{deq}}(\{(\ell, \bar{Q})\}) \;=\;
\Bigl\{\, (\ell', \bar{Q}') \;:\; \bar{Q}' \in
\begin{Bmatrix}
e_1 \ldots e_p \mid e_{p+1}\, e_{p+2} \ldots e_{z-1} \\
e_1 \ldots e_p \mid \boxed{e_p}\, e_{p+1}\, e_{p+2} \ldots e_{z-1} \\
e_1 \ldots e_p \mid e_{p+1}\, \boxed{e_p}\, e_{p+2} \ldots e_{z-1} \\
\vdots \\
e_1 \ldots e_p \mid e_{p+1}\, e_{p+2} \ldots e_{z-1}\, \boxed{e_p}
\end{Bmatrix}
\Bigr\}
$$
The first case for Q̄′ applies if there are no occurrences of ep in the suffix
after the dequeue. The remaining cases enumerate possible positions of the first
occurrence of ep (boxed, for readability) in the suffix after the dequeue. The cost
of this enumeration is linear in the length of the suffix of the abstract queue.
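The enumeration can be sketched as follows. This is a simplified illustration of the idea, handling only the queue component and ignoring the local-state update from ℓ to ℓ′.

```python
def abstract_dequeue_successors(prefix, suffix):
    """All abstract queues (prefix | suffix) that can result from a
    dequeue: the first suffix event e_p moves into the prefix, and we
    enumerate where, if anywhere, a later first occurrence of e_p may
    appear in the new suffix."""
    if not suffix:                     # whole queue fits in the prefix
        return {(prefix[1:], ())}
    ep, rest = suffix[0], suffix[1:]
    new_prefix = prefix[1:] + (ep,)
    # Case 1: no occurrence of ep remains in the suffix.
    results = {(new_prefix, rest)}
    # Remaining cases: the first remaining occurrence of ep (the boxed
    # event in the equation) sits at position i among the other events.
    for i in range(len(rest) + 1):
        results.add((new_prefix, rest[:i] + (ep,) + rest[i:]))
    return results

# For bb | abc: dequeue the head 'b'; 'a' moves into the prefix.
print(sorted(abstract_dequeue_successors(('b', 'b'), ('a', 'b', 'c'))))
```

The number of successors is at most the suffix length plus one, matching the stated linear cost of the enumeration.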
Since our list abstraction maintains the first occurrence of each event, the
semantics of defer (see the Discussion in Sect. 4.1) can be implemented abstractly
without loss of information (not shown above, for simplicity).
where |·| denotes set cardinality and the relational operator is interpreted as the
standard comparison over the integers. In the following we write Q[i →] (read: “Q from i”)
for the queue obtained from queue Q by dropping the first i events.
– Q |= true.
– for e ∈ Σ, Q |= e iff |Q| > 0 and Q[0] = e.
– for a queue relational expression E, Q |= E iff V (E) = true.
– Q |= X φ iff |Q| > 0 and Q[1 →] |= φ.
– Q |= F φ iff there exists i ∈ N such that 0 ≤ i < |Q| and Q[i →] |= φ.
– Q |= G φ iff for all i ∈ N such that 0 ≤ i < |Q|, Q[i →] |= φ.
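On concrete (finite) queues these clauses can be checked directly by recursion on the formula; a small checker in Python, with formulas encoded as nested tuples (our own encoding, including implication and negation as derived connectives):

```python
def holds(Q, phi):
    """Evaluate a queue-temporal-logic formula phi on a concrete queue Q,
    given as a string of single-character events."""
    op = phi[0]
    if op == 'true':
        return True
    if op == 'ev':                         # atomic test on the queue head
        return len(Q) > 0 and Q[0] == phi[1]
    if op == 'not':
        return not holds(Q, phi[1])
    if op == 'imp':
        return (not holds(Q, phi[1])) or holds(Q, phi[2])
    if op == 'X':                          # next
        return len(Q) > 0 and holds(Q[1:], phi[1])
    if op == 'F':                          # eventually, within the queue
        return any(holds(Q[i:], phi[1]) for i in range(len(Q)))
    if op == 'G':                          # always, over all suffixes
        return all(holds(Q[i:], phi[1]) for i in range(len(Q)))
    raise ValueError(op)

# G(a => G not b), the formula used in the example below:
f = ('G', ('imp', ('ev', 'a'), ('G', ('not', ('ev', 'b')))))
print(holds('bbba', f), holds('bbab', f))  # True False
```

On abstract queues the paper instead reduces the check to model checking an LTS encoding the concretizations (Fig. 2); the concrete checker above only illustrates the semantics.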
For example, we have bb | ba |=2 G(a ⇒ G ¬b) since for instance bbba ∈ γ2 (bb |
ba) satisfies the formula. See App. B of [23] for more examples.
Fig. 2. LTS for Q = bb | abc (p = 2), with label sets written below each state. The
blue and red parts encode the concretizations of the prefix and suffix of Q, resp. (Color
figure online)
6 Empirical Evaluation
We implemented the proposed approaches in C# atop the bounded model
checker PTester [11], an analysis tool for P programs. PTester employs a bounded
exploration strategy similar to Zing [4]. We denote by Pat the implementation
of Algorithm 1, and by Pat+I the version with queue invariants (“Pat+ Invari-
ants”). A detailed introduction to tool design and implementation is available
online [22].
6–7: two device drivers where OSR is used for testing USB devices [10].
8–14: miscellaneous: 8–10 [25], 11 [15], 12 is the example from Sect. 2, 13–14
are the buggy and fixed versions of an Elevator controller [11].
Results. Table 1 shows that Pat converges on almost all safe examples (and
successfully exposes the bugs for unsafe ones). Second, in most cases, the kmax
where convergence was detected is small, 5 or less. This is what enables the use
of this technique in practice: the exploration space grows fast with k, so early
convergence is critical. Note that kmax is guaranteed to be the smallest value for
which the respective example converges. If convergent, the verification succeeded
fully automatically: the queue abstraction prefix parameter p is incremented in
a loop whenever the current value of p caused a spurious abstract state.
The German protocol does not converge in reasonable time. In this case, we
request minimal manual assistance from the designer. Our tool inspects spurious
abstract states, compares them to actually reached abstract states, and suggests
candidate invariants to exclude them. We describe the process of invariant dis-
covery, and why and how they are easy to prove, in [22].
The following table shows the invariants that make the German protocol
converge, and the resulting times and memory consumption.
The invariant states that there is always at most one exclusive request and
at most one shared request in the Server or Client machine’s queue.
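On a concrete queue, an invariant of this shape is just a cardinality bound, |{i : Q[i] = e}| ≤ 1 for the event e in question. A one-line illustration (the event names ReqE, ReqS, and Ack are hypothetical stand-ins for the protocol's actual events):

```python
def at_most_one(queue, event):
    """|{i : Q[i] = event}| <= 1 -- the shape of the German-protocol
    queue invariants."""
    return sum(1 for e in queue if e == event) <= 1

print(at_most_one(['ReqS', 'ReqE', 'Ack'], 'ReqE'))   # True
print(at_most_one(['ReqE', 'Ack', 'ReqE'], 'ReqE'))   # False
```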
extra cost. Therefore, as for improving Pat’s scalability, the focus should be
on the efficiency of the Rk computation (Line 3 in Algorithm 1). Techniques
that lend themselves here are partial order reduction [2,28] or symmetry reduc-
tion [29]. Note that our proposed approach is orthogonal to how these sets are
computed.
7 Related Work
Automatic verification for asynchronous event-driven programs communicating
via unbounded FIFO queues is undecidable [8], even when the agents are finite-
state machines. To sidestep the undecidability, various remedies are proposed.
One is to underapproximate program behaviors using various bounding tech-
niques; examples include depth- [17] and context-bounded analysis [19,20,26],
delay-bounding [13], bounded asynchrony [15], preemption-bounding [24], and
phase-bounded analysis [3,6]. It has been shown that most of these bounding
techniques admit a decidable model checking problem [19,20,26] and thus have
been successfully used in practice for finding bugs.
Gall et al. proposed an abstract interpretation of FIFO queues in terms of
regular languages [16]. While our works share some basic insights about taming
queues, the differences are fundamental: our abstract domain is finite, guaran-
teeing convergence of our sequence. In [16] the abstract domain is infinite; they
propose a widening operator for fixpoint computation. More critically, we use
the abstract domain only for convergence detection; the set of reachable states
returned is in the end exact. As a result, we can prove and refute properties but
may not terminate; [16] is inexact and cannot refute, but always terminates.
Several partial verification approaches for asynchronous message-passing pro-
grams have been presented recently [5,7,10]. In [5], Bakst et al. propose canon-
ical sequentialization, which avoids exploring all interleavings by sequentializing
concurrent programs. Desai et al. [10] propose an alternative way, namely by pri-
oritizing receive actions over send actions. The approach is complete in the sense
that it is able to construct almost-synchronous invariants that cover all reach-
able local states and hence suffice to prove local assertions. Similarly, Bouajjani
et al. [7] propose an iterative analysis that bounds send actions in each interac-
tion phase. It approaches the completeness by checking a program’s synchroniz-
ability under the bounds. Similar to our work, the above three works are sound
but incomplete. Tools implementing the techniques of [7,10] are not available,
which precludes a direct experimental comparison; a comparison based on
what is reported in the papers suggests that our approach is competitive in both
performance and precision.
Our approach can be categorized as a cutoff detection technique [1,12,14,28].
Cutoffs are, however, typically determined statically, often leaving them too large
for practical verification. Aiming at minimal cutoffs, our work is closer in nature
to earlier dynamic strategies [18,21], which targeted different forms of concurrent
programs. The generator technique proposed in [21] is unlikely to work for P
programs, due to the large local state space of machines.
8 Conclusion
We have presented a method to verify safety properties of asynchronous event-
driven programs of agents communicating via unbounded queues. Our approach
is sound but incomplete: it can both prove (or, by encountering bugs, disprove)
such properties but may not terminate. We empirically evaluate our method on
a collection of P programs. Our experimental results showcase our method can
successfully prove the correctness of programs; such proof is achieved with little
extra resource costs compared to plain state exploration. Future work includes
an extension to P programs with other sources of unboundedness than the queue
length (e.g. messages with integer payloads).
Acknowledgments. We thank Dr. Vijay D’Silva (Google, Inc.), for enlightening dis-
cussions about partial abstract transformers.
References
1. Abdulla, P.A., Haziza, F., Holík, L.: All for the price of few (parameterized verifi-
cation through view abstraction). In: VMCAI, pp. 476–495 (2013)
2. Abdulla, P., Aronis, S., Jonsson, B., Sagonas, K.: Optimal dynamic partial order
reduction. In: POPL, pp. 373–384 (2014)
3. Abdulla, P.A., Atig, M.F., Cederberg, J.: Analysis of message passing programs
using SMT-solvers. In: Van Hung, D., Ogawa, M. (eds.) ATVA 2013. LNCS, vol.
8172, pp. 272–286. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-
02444-8 20
4. Andrews, T., Qadeer, S., Rajamani, S.K., Rehof, J., Xie, Y.: Zing: a model checker
for concurrent software. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol.
3114, pp. 484–487. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-
540-27813-9 42
5. Bakst, A., Gleissenthall, K.v., Kici, R.G., Jhala, R.: Verifying distributed programs
via canonical sequentialization. PACMPL 1(OOPSLA), 110:1–110:27 (2017)
6. Bouajjani, A., Emmi, M.: Bounded phase analysis of message-passing programs.
Int. J. Softw. Tools Technol. Transf. 16(2), 127–146 (2014)
7. Bouajjani, A., Enea, C., Ji, K., Qadeer, S.: On the completeness of verifying mes-
sage passing programs under bounded asynchrony. In: Chockler, H., Weissenbacher,
G. (eds.) CAV 2018. LNCS, vol. 10982, pp. 372–391. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-96142-2 23
8. Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM 30(2),
323–342 (1983)
9. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In:
POPL, pp. 269–282 (1979)
10. Desai, A., Garg, P., Madhusudan, P.: Natural proofs for asynchronous programs
using almost-synchronous reductions. In: OOPSLA, pp. 709–725 (2014)
11. Desai, A., Gupta, V., Jackson, E., Qadeer, S., Rajamani, S., Zufferey, D.: P: safe
asynchronous event-driven programming. In: PLDI, pp. 321–332 (2013)
12. Emerson, E.A., Kahlon, V.: Reducing model checking of the many to the few. In:
McAllester, D. (ed.) CADE 2000. LNCS (LNAI), vol. 1831, pp. 236–254. Springer,
Heidelberg (2000). https://doi.org/10.1007/10721959 19
13. Emmi, M., Qadeer, S., Rakamarić, Z.: Delay-bounded scheduling. In: POPL, pp.
411–422 (2011)
14. Farzan, A., Kincaid, Z., Podelski, A.: Proof spaces for unbounded parallelism. In:
POPL, pp. 407–420 (2015)
15. Fisher, J., Henzinger, T.A., Mateescu, M., Piterman, N.: Bounded asynchrony:
concurrency for modeling cell-cell interactions. In: Fisher, J. (ed.) FMSB 2008.
LNCS, vol. 5054, pp. 17–32. Springer, Heidelberg (2008). https://doi.org/10.1007/
978-3-540-68413-8 2
16. Le Gall, T., Jeannet, B., Jéron, T.: Verification of communication protocols using
abstract interpretation of FIFO queues. In: Johnson, M., Vene, V. (eds.) AMAST
2006. LNCS, vol. 4019, pp. 204–219. Springer, Heidelberg (2006). https://doi.org/
10.1007/11784180 17
17. Godefroid, P.: Model checking for programming languages using VeriSoft. In:
POPL, pp. 174–186 (1997)
18. Kaiser, A., Kroening, D., Wahl, T.: Dynamic cutoff detection in parameterized
concurrent programs. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS,
vol. 6174, pp. 645–659. Springer, Heidelberg (2010). https://doi.org/10.1007/978-
3-642-14295-6 55
19. La Torre, S., Parthasarathy, M., Parlato, G.: Analyzing recursive programs using
a fixed-point calculus. In: PLDI, pp. 211–222 (2009)
20. Lal, A., Reps, T.: Reducing concurrent analysis under a context bound to sequential
analysis. Form. Methods Syst. Des. 35(1), 73–97 (2009)
21. Liu, P., Wahl, T.: CUBA: interprocedural context-unbounded analysis of concur-
rent programs. In: PLDI, pp. 105–119 (2018)
22. Liu, P., Wahl, T., Lal, A.: (2019). www.khoury.northeastern.edu/home/lpzun/
quba
23. Liu, P., Wahl, T., Lal, A.: Verifying asynchronous event-driven programs using
partial abstract transformers (extended manuscript). CoRR abs/1905.09996 (2019)
24. Musuvathi, M., Qadeer, S.: Iterative context bounding for systematic testing of
multithreaded programs. In: PLDI, pp. 446–455 (2007)
25. P-GitHub: The P programming language (2019). https://github.com/p-org/P
26. Qadeer, S., Rehof, J.: Context-bounded model checking of concurrent software.
In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 93–107.
Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31980-1 7
27. Reps, T., Sagiv, M., Yorsh, G.: Symbolic implementation of the best transformer.
In: Steffen, B., Levi, G. (eds.) VMCAI 2004. LNCS, vol. 2937, pp. 252–266.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24622-0 21
28. Sousa, M., Rodrı́guez, C., D’Silva, V., Kroening, D.: Abstract interpretation with
unfoldings. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp.
197–216. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9 11
29. Wahl, T., Donaldson, A.: Replication and abstraction: symmetry in automated
formal verification. Symmetry 2(2), 799–847 (2010)
Inferring Inductive Invariants
from Phase Structures
1 Introduction
Infinite-state systems such as distributed protocols remain challenging to verify despite
decades of work developing interactive and automated proof techniques. Such proofs
rely on the fundamental notion of an inductive invariant. Unfortunately, specifying
inductive invariants is difficult for users, who must often repeatedly iterate through
candidate invariants before achieving an inductive invariant. For example, the Verdi
project’s proof of the Raft consensus protocol used an inductive invariant with 90 con-
juncts and relied on significant manual proof effort [61, 62].
The dream of invariant inference is that users would instead be assisted by auto-
matic procedures that could infer the required invariants. While other domains have
seen successful applications of invariant inference, using techniques such as abstract
interpretation [18] and property-directed reachability [10, 21], existing inference tech-
niques fall short for interesting distributed protocols, and often diverge while searching
for an invariant. These limitations have hindered adoption of invariant inference.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 405–425, 2019.
https://doi.org/10.1007/978-3-030-25543-5_23
406 Y. M. Y. Feldman et al.
Our Approach. The idea of this paper is that invariant inference can be made dras-
tically more effective by utilizing user-guidance in the form of phase structures. We
propose user-guided invariant inference, in which the user provides some additional
information to guide the tool towards an invariant. An effective guidance method must
(1) match users’ high-level intuition of the proof, and (2) convey information in a way
that an automatic inference tool can readily utilize to direct the search. In this setting
invariant inference turns a partial, high-level argument accessible to the user into a full,
formal correctness proof, overcoming scenarios where procuring the proof completely
automatically is unsuccessful.
Our approach places phase invariants at the heart of both user interaction and algo-
rithmic inference. Phase invariants have an automaton-based form that is well-suited to
the domain of distributed protocols. They allow the user to convey a high-level tempo-
ral intuition of why the protocol is correct in the form of a phase structure. The phase
structure provides hints that direct the search and allow a more targeted generalization
of states to invariants, which can facilitate inference where it is otherwise impossible.
This paper makes the following contributions:
2 Preliminaries
In this section we provide background on first-order transition systems. Sorts are omit-
ted for simplicity. Our results extend also to logics with a background theory.
Notation. FV(ϕ) denotes the set of free variables of ϕ. FΣ (V ) denotes the set of first-
order formulas over vocabulary Σ with FV(ϕ) ⊆ V . We write ∀V. ϕ =⇒ ψ to denote
that the formula ∀V. ϕ → ψ is valid. We sometimes use fa as a shorthand for f (a).
Transition Systems. We represent transition systems symbolically, via formulas in first-
order logic. The definitions are standard. A vocabulary Σ consisting of constant, func-
tion, and relation symbols is used to represent states. Post-states of transitions are rep-
resented by a copy of Σ denoted Σ′ = {a′ | a ∈ Σ}. A first-order transition system
over Σ is a tuple TS = (Init, TR), where Init ∈ FΣ (∅) describes the initial states, and
TR ∈ FΣ̂ (∅) with Σ̂ = Σ ⊎ Σ′ describes the transition relation. The states of TS are
first-order structures over Σ. A state s is initial if s |= Init. A transition of TS is a
pair of states s1 , s2 over a shared domain such that (s1 , s2 ) |= TR, where (s1 , s2 ) is the
structure over that domain in which Σ is interpreted as in s1 and Σ′ as in s2 . s1 is also
called the pre-state and s2 the post-state. Traces are finite sequences of states σ1 , σ2 , . . .
starting from an initial state such that there is a transition between each pair of consec-
utive states. The reachable states are those that reside on traces starting from an initial
state.
Safety. A safety property P is a formula in FΣ (∅). We say that TS is safe, and that P is
an invariant, if all the reachable states satisfy P . Inv ∈ FΣ (∅) is an inductive invariant
if (i) Init =⇒ Inv (initiation), and (ii) Inv ∧ TR =⇒ Inv′ (consecution), where Inv′ is
obtained from Inv by replacing each symbol from Σ with its primed counterpart. If also
(iii) Inv =⇒ P (safety), then it follows that TS is safe.
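For a finite-state system the three conditions can be checked by brute-force enumeration; the following toy sketch is ours, purely to make the definition concrete (in the paper's setting the checks are discharged symbolically, over first-order structures).

```python
def is_inductive(states, init, tr, inv, prop):
    """Check initiation, consecution, and safety of a candidate
    invariant over an explicitly enumerated state space."""
    initiation = all(inv(s) for s in states if init(s))
    consecution = all(inv(t) for s in states if inv(s) for t in tr(s))
    safety = all(prop(s) for s in states if inv(s))
    return initiation and consecution and safety

# Toy system: a counter modulo 6 stepping by 2 from 0.
states = range(6)
init = lambda s: s == 0
tr = lambda s: [(s + 2) % 6]        # transition relation as successors
prop = lambda s: s != 3             # safety: never reach state 3
inv = lambda s: s % 2 == 0          # candidate: counter stays even
print(is_inductive(states, init, tr, inv, prop))  # True
```

Note that prop itself (s ≠ 3) is an invariant here but not inductive (4 steps to 0, but 1 would step to 3), which is exactly why a strengthening like inv is needed.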
Fig. 1. Sharded key-value store with retransmissions (KV-R) in a first-order relational modeling.
and destination of the message, and the rest carry the message’s payload. For example,
ack_msg is a relation over two nodes and a sequence number, with the intended meaning
that a tuple (c1 , c2 , s) is in ack_msg exactly when there is a message in the network from
c1 to c2 acknowledging a message with sequence number s.
The initial states are specified in Lines 17 to 18. Transitions are specified by the
actions declared in Lines 20 to 66. Actions can fire nondeterministically at any time when
their precondition (require statements) holds. Hence, the transition relation is the
disjunction of the transition relations induced by the actions. The state is mutated
by modifying the relations. For example, message sends are modeled by inserting a tuple
into the corresponding relation (e.g. line 27), while message receives are modeled by
requiring a tuple to be in the relation (e.g. line 32), and then removing it (e.g. line 33).
The updates in lines 61 and 65 remove a set of tuples matching the pattern.
Transferring keys between nodes begins by sending a transfer_msg from the owner
to a new node (line 20), which stores the key-value pair when it receives the message
(line 39). Upon sending a transfer message the original node cedes ownership (line 26)
and does not send new transfer messages. Transfer messages may be dropped (line 30).
To ensure that the key-value pair is not lost, retransmissions are performed (line 35) with
the same sequence number until the target node acknowledges (which occurs in line
47). Acknowledge messages themselves may be dropped (line 53). Sequence numbers
protect from delayed transfer messages, which might contain old values (line 42).
Lines 68 to 71 specify the key safety property: at most one value is associated with
any key, anywhere in the network. Intuitively, the protocol satisfies this because each
key k is either currently (1) owned by a node, in which case this node is unique, or
(2) it is in the process of transferring between nodes, in which case the careful use of
sequence numbers ensures that the destination of the key is unique. As is typical, it is
not straightforward to translate this intuition into a full correctness proof. In particular,
it is necessary to relate all the different components of the state, including clients’ local
state and pending messages.
Invariant inference strives to automatically find an inductive invariant establish-
ing safety. This example is challenging for existing inference techniques (Sect. 6). This
paper proposes user-guided invariant inference based on phase-invariants to overcome
this challenge. The rest of the paper describes our approach, in which inference is pro-
vided with the phase structure in Fig. 2, matching the high level intuitive explanation
above. The algorithm then automatically infers facts about each phase to obtain an
inductive invariant. Sect. 4 describes phase structures and inductive phase invariants,
and Sect. 5 explains how these are used in user-guided invariant inference.
4 Phase Invariants
In this section we introduce phase structures and inductive phase invariants. These are
used for guiding automatic invariant inference in Sect. 5. Proofs appear in [24].
Example 1. Figure 2 shows a phase automaton for the running example, with the view
of a single key k. It describes the protocol as transitioning between two distinct (logical)
Fig. 2. Phase structure for key-value store (top) and phase characterizations (bottom). The user
provides the phase structure, and inference automatically produces the phase characterizations,
forming a safe inductive phase automaton.
phases of k: owned (O[k]) and transferring (T[k]). The edges are labeled by actions of
the system. A wildcard * means that the action is executed with an arbitrary argument.
The two central actions are (i) reshard, which transitions from O[k] to T[k], but cannot
execute in T[k], and (ii) recv_transfer_message, which does the opposite. The rest
of the actions do not cause a phase change and appear on a self loop in each phase.
Actions related to keys other than k are considered as self-loops, and omitted here for
brevity. Some actions are disallowed in certain phases, namely, do not label any outgo-
ing edge from a phase, such as recv_transfer_msg(k) in O[k]. Characterizations
for each phase are depicted in Fig. 2 (bottom). Without them, Fig. 2 represents a phase
structure, which serves as the input to our inference algorithm. We remark that the
choice of automaton aims to reflect the safety property of interest. In our example, one
might instead imagine taking the view of a single node as it interacts with multiple keys,
which might seem intuitive from the standpoint of implementing the system. However,
it is not appropriate for the proof of value uniqueness, since keys pass in and out of the
view of a single client.
We now formally define phase invariants as phase automata that overapproximate
the behaviors of the original system.
4.2 Establishing Safety and Phase Invariants with Inductive Phase Invariants
To establish phase invariants, we use inductiveness:
Definition 4 (Inductive Phase Invariant). A is inductive w.r.t. TS = (Init, TR) if:
Initiation: Init =⇒ (∀V. ϕι).
Inductiveness: for all (q, p) ∈ R, ∀V. ϕq ∧ δ(q,p) =⇒ ϕ′p.
Edge Covering: for every q ∈ Q, ∀V. ϕq ∧ TR =⇒ $\bigvee_{(q,p) \in R}$ δ(q,p).
Example 3. The phase automaton in Fig. 2 is an inductive phase invariant. For example,
the only disallowed transition in O[k] is recv_transfer_message, which indeed
cannot execute in O[k] according to the characterization in line 75. Further, if, for
example, a protocol’s transition from O[k] matches the labeling of the edge to T[k]
(i.e. a reshard action on k), the post-state necessarily satisfies the characterizations of
T[k]: for instance, the post-state satisfies the uniqueness of unreceived transfer mes-
sages (line 82) because in the pre-state there are none (line 75).
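On a finite toy model, the three conditions of Definition 4 can be checked directly by enumeration. The following sketch uses an illustrative two-state lock (phases O and H, actions acquire and release); the model and all names are hypothetical, not one of the paper's benchmarks:

```python
# Explicit-state check of Definition 4 (Inductive Phase Invariant) on a
# toy lock. States: 0 (lock free) and 1 (lock held).
Init = {0}
TR = {("acquire", 0, 1), ("release", 1, 0)}  # (action, pre, post)

# Phase automaton: initial phase O; delta maps each edge to the set of
# transitions it allows; phi maps each phase to its characterization.
init_phase = "O"
edges = {("O", "H"): {("acquire", 0, 1)},
         ("H", "O"): {("release", 1, 0)}}
phi = {"O": lambda s: s == 0, "H": lambda s: s == 1}

def initiation():
    # Init implies the characterization of the initial phase
    return all(phi[init_phase](s) for s in Init)

def inductiveness():
    # along each edge, phi_q and the edge's transitions imply phi_p
    return all(phi[p](t) for (q, p), d in edges.items()
               for (a, s, t) in d if phi[q](s))

def edge_covering():
    # every system transition enabled in a phase is covered by some edge
    return all(any((a, s, t) in d for (q2, p), d in edges.items() if q2 == q)
               for q in phi for (a, s, t) in TR if phi[q](s))

print(initiation() and inductiveness() and edge_covering())  # True
```

Each function corresponds to one clause of Definition 4; on the infinite-state protocols considered in the paper these checks become validity queries discharged by an SMT solver rather than enumeration.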
Lemma 1. If A is inductive w.r.t. TS then it is a phase invariant for TS.
Remark 1. The careful reader may notice that the inductiveness requirement is stronger
than needed to ensure that the characterizations form a phase invariant. It could be
weakened to require for every q ∈ Q: ∀V. ϕq ∧ TR =⇒ ⋁(q,p)∈R (δ(q,p) ∧ ϕp ). However,
as we explain in Sect. 5, our notion of inductiveness is crucial for inferring inductive
phase automata, which is the goal of this paper. Furthermore, for deterministic phase
automata, the two requirements coincide.
Inductive Invariants vs. Inductive Phase Invariants. Inductive invariants and inductive
phase invariants are closely related:
Lemma 2. If A is inductive w.r.t. TS then ∀V. ⋁q∈Q ϕq is an inductive invariant
for TS. If Inv is an inductive invariant for TS, then the phase automaton AInv =
({q}, {q}, {(q, q)}, δ, ϕ), where δ(q,q) = TR and ϕq = Inv, is an inductive phase
automaton w.r.t. TS.
In this sense, inductive phase invariants are as expressive as inductive invariants. How-
ever, as we show in this paper, their structure can be used by a user as an intuitive way
to guide an automatic invariant inference algorithm.
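The second direction of Lemma 2 is a direct construction: wrap Inv in a single phase whose self-loop carries all of TR. A minimal sketch on a toy finite system (states and names are illustrative, not from the paper):

```python
# Lemma 2, second direction: a plain inductive invariant Inv becomes a
# one-phase automaton with a single self-loop labeled by all of TR.
# Toy system: states are integers, every transition adds 2,
# Inv = "the state is even".
Inv = lambda s: s % 2 == 0
TR = {(s, s + 2) for s in range(0, 8, 2)}   # transitions of the toy system

A_Inv = ({"q"},                  # Q: a single phase
         "q",                    # initial phase
         {("q", "q")},           # R: one self-loop
         {("q", "q"): TR},       # delta: the self-loop covers all of TR
         {"q": Inv})             # phi_q = Inv

Q, init, R, delta, phi = A_Inv
# For this automaton, Definition 4's inductiveness degenerates to the
# ordinary inductiveness of Inv:
print(all(phi["q"](t) for (s, t) in delta[("q", "q")] if phi["q"](s)))  # True
```

The edge-covering condition holds trivially because the single self-loop is labeled with all of TR, which is why the construction yields an inductive phase automaton whenever Inv is inductive.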
Safe Inductive Phase Invariants. Next we show that an inductive phase invariant can
be used to establish safety.
412 Y. M. Y. Feldman et al.
where V denotes the quantifiers of A. All the constraints are linear, namely, at most
one unknown predicate appears on the left-hand side of each implication.
Constraint (4) captures the original safety requirement, whereas (3) can be under-
stood as additional safety properties that are specified by the phase automaton (since no
unknown predicates appear on the right-hand side of the implications).
A solution I to the CHC system associates each predicate Iq with a formula ψq over
Σ (with FV(ψq ) ⊆ V) such that when ψq is substituted for Iq , all the constraints are
satisfied (i.e., the corresponding first-order formulas are valid). A solution to the system
induces a safe inductive phase automaton through characterizing each phase q by the
interpretation of Iq , and vice versa. Formally:
Inferring Inductive Invariants from Phase Structures 413
Therefore, to infer a safe inductive phase invariant over a given phase structure, we
need to solve the corresponding CHC system. In Sect. 6.1 we explain our approach for
doing so for the class of universally quantified phase characterizations. Note that the
weaker definition of inductiveness discussed in Remark 1 would prevent the reduction
to CHC as it would result in clauses that are not Horn clauses.
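Since the clauses are Horn, the CHC system has a least solution, which assigns to each Iq exactly the set of states reachable in phase q. On a finite toy model (an illustrative two-state lock, not one of the paper's benchmarks) this least solution can be computed by a Kleene-style fixpoint:

```python
# Least solution of a linear CHC system over a finite toy lock model:
# per-phase forward reachability. States: 0 (free), 1 (held).
Init = {0}
edges = {("O", "H"): {("acquire", 0, 1)},
         ("H", "O"): {("release", 1, 0)}}

reach = {"O": set(Init), "H": set()}   # interpretation of I_O, I_H
changed = True
while changed:
    changed = False
    for (q, p), d in edges.items():
        for (_, s, t) in d:
            # propagate reachable pre-states along each automaton edge
            if s in reach[q] and t not in reach[p]:
                reach[p].add(t)
                changed = True

print(reach)  # {'O': {0}, 'H': {1}}
```

On infinite-state systems no such enumeration is possible; the point of Sect. 6.1 is to search instead for universally quantified over-approximations of these per-phase reachable sets.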
Completeness of Inductive Phase Invariants. There are cases where a given phase
structure induces a safe phase invariant A, but not an inductive one, making the CHC
system unsatisfiable. However, a strengthening into an inductive phase invariant can
always be used to prove that A is an invariant if (i) the language of invariants is unre-
stricted, and (ii) the phase structure is deterministic, namely, does not cover the same
transition in two outgoing edges. Determinism of the automaton does not lose gener-
ality in the context of safety verification since every inductive phase automaton can be
converted to a deterministic one; non-determinism is in fact unhelpful, as it forces
the same state to be characterized by multiple phases (see also Remark 1). These topics
are discussed in detail in the extended version [24].
Remark 2. Each phase is associated with a set of states that can reach it, where a state
σ can reach phase q if there is a sequence of program transitions that results in σ and
can lead to q according to the automaton’s transitions. This makes a phase structure
different from a simple syntactic disjunctive template for inference, in which such
semantic meaning is unavailable.
(1) Phase decomposition. Inference of an inductive phase invariant aims to find charac-
terizations that overapproximate the set of states reachable in each phase (Remark
2). The distinction between phases is most beneficial when there is a considerable
difference between the sets associated with different phases and their characteriza-
tions. For instance, in the running example, all states without unreceived transfer
messages are associated with O[k], whereas all states in which such messages
exist are associated with T[k]—a distinction captured by the characterizations in
lines 75 and 82 in Fig. 2.
1 As an illustration, the extended version [24] includes an inductive invariant for the running
example which is comparable in complexity to the inductive phase invariant in Fig. 2.
Differences between phases would have two consequences. First, since each phase
corresponds to fewer states than all reachable states, generalization—the key ingre-
dient in inference procedures—is more focused. The second consequence stems
from the fact that inductive characterizations of different phases are correlated. It
is expected that a certain property is more readily learnable in one phase, while
related facts in other phases are more complex. For instance, the characterization
in line 75 in Fig. 2 is more straightforward than the one in line 82. Simpler facts
in one phase can help characterize an adjacent phase when the algorithm analyzes
how that property evolves along the edge. Thus utilizing the phase structure can
improve the gradual construction of overapproximations of the sets of states reach-
able in each phase.
(2) Disabled transitions. A phase automaton explicitly states which transitions of the
system are enabled in each phase, while the rest are disabled. Such impossible
transitions induce additional safety properties to be established by the inferred
phase characterizations. For example, the phase invariant in Fig. 2 forbids a
recv_transfer_message(k) in O[k], a fact that can trigger the inference of
the characterization in line 75. These additional safety properties direct the search
for characterizations that are likely to be important for the proof.
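The derived safety properties can be made explicit: for every phase q and every action that labels no outgoing edge of q, the characterization ϕq must rule out states that enable the action. A sketch on an illustrative toy model (all names are hypothetical, not from the paper's encoding):

```python
# Deriving the extra safety properties induced by disabled transitions.
# An action that labels no outgoing edge of a phase must be disabled in
# every state satisfying that phase's characterization.
actions = {"acquire": lambda s: s == 0,   # action -> enabledness predicate
           "release": lambda s: s == 1}
outgoing = {"O": {"acquire"}, "H": {"release"}}  # edge labels per phase
phi = {"O": lambda s: s == 0, "H": lambda s: s == 1}
states = range(2)

def derived_safety_holds(q):
    disabled = set(actions) - outgoing[q]
    # phi_q must imply that every disabled action cannot fire
    return all(not actions[a](s) for a in disabled
               for s in states if phi[q](s))

print(all(derived_safety_holds(q) for q in phi))  # True
```

In the inference algorithm these derived properties play the same role as the user-provided safety property: counterexamples to them trigger the learning of new phase characterizations.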
(3) Phase-awareness. Finally, while a phase structure can be encoded in several ways
(such as ghost code), a key aspect of our approach is that the phase decomposi-
tion and disabled transitions are explicitly encoded in the CHC system in Sect. 5.1,
ensuring that they guide the otherwise heuristic search.
In Sect. 6.2 we demonstrate the effects of aspects (1)–(3) on guidance.
6.2 Evaluation
We evaluate our approach for user-guided invariant inference by comparing Phase-
PDR∀ to standard PDR∀ . We implemented PDR∀ and Phase-PDR∀ in MYPYVY [2],
a new system for invariant inference inspired by Ivy [45], over Z3 [46]. We study:
1. Can Phase-PDR∀ converge to a proof when PDR∀ does not (in reasonable time)?
2. Is Phase-PDR∀ faster than PDR∀ ?
3. Which aspects of Phase-PDR∀ contribute to its performance benefits?
Protocols. We applied PDR∀ and Phase-PDR∀ to the most challenging examples admit-
ting universally-quantified invariants, which previous works verified using deductive
techniques. The protocols we analyzed are listed below and in Table 1. The full mod-
els appear in [1]. The KV-R protocol analyzed is taken from one of the two realistic
systems studied by the IronFleet paper [33] using deductive verification.
Phase Structures. The phase structures we used appear in [1]. In all our examples, it
was straightforward to translate the existing high-level intuition of important and rele-
vant distinctions between phases in the protocol into the phase structures we report. For
example, it took us less than an hour to finalize an automaton for KV-R. We emphasize
that phase structures do not include phase characterizations; the user need not supply
them, nor understand the inference procedure. Our exposition of the phase structures
below refers to an intuitive meaning of each phase, but this is not part of the phase
structure provided to the tool.
Table 1. Running times in seconds of PDR∀ and Phase-PDR∀ , presented as the mean and standard
deviation (in parentheses) over 16 different Z3 random seeds. “∗ ” indicates that some runs did not
converge after 1 h and were not included in the summary statistics. “> 1 h” means that no runs of
the algorithm converged in 1 h. #p refers to the number of phases and #v to the number of view
quantifiers in the phase structure. #r refers to the number of relations and |a| to the maximal arity.
The remaining columns describe the inductive invariant/phase invariant obtained in inference. |f|
is the maximal frame reached. #c, #q are the mean number of clauses and quantifiers (excluding
view quantifiers) per phase, ranging across the different phases.
(1) Achieving Convergence Through Phases. In this section we consider the effect of
phases on inference for examples on which standard PDR∀ does not converge in 1 h.
Examples. Sharded key-value store with retransmissions (KV-R): see Sect. 3 and Exam-
ple 1. This protocol has not been modeled in decidable logic before.
Cache Coherence. This example implements the classic MESI protocol for maintaining
cache coherence in a shared-memory multiprocessor [36], modeled in decidable logic
for the first time. Cores perform reads and writes to memory, and caches snoop on each
other’s requests using a shared bus and maintain the invariant that there is at most one
writer of a particular cache line. For simplicity, we consider only a single cache line, and
yet the example is still challenging for PDR∀ . Standard explanations of this protocol in
the literature already use automata to describe this invariant, and we directly exploit
this structure in our phase automaton. Phase Structure: There are 10 phases in total,
grouped into three parts corresponding to the modified, exclusive, and shared states
in the classical description. Within each group, there are additional phases for when a
request is being processed by the bus. For example, in the shared group, there are phases
for handling reads by cores without a copy of the cache line, writes by such cores, and
also writes by cores that do have a copy. Overall, the phase structure is directly derived
from textbook descriptions, taking into account that use of the shared bus is not atomic.
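For reference, the textbook single-line MESI transitions underlying this phase structure can be sketched as a small simulation (a simplified, illustrative model: bus operations are treated as atomic here, unlike the paper's model, and the Exclusive-load optimization is omitted):

```python
# Minimal single-cache-line MESI simulation with the coherence invariant
# "at most one cache holds the line in M or E" checked at every step.
NEXT_LOCAL = {  # (state, processor event) -> next state of the acting cache
    ("I", "read"):  "S",   # conservatively load Shared (no E optimization)
    ("I", "write"): "M",
    ("S", "read"):  "S", ("S", "write"): "M",
    ("E", "read"):  "E", ("E", "write"): "M",
    ("M", "read"):  "M", ("M", "write"): "M",
}
NEXT_SNOOP = {  # (state, snooped bus event) -> next state of other caches
    ("M", "read"):  "S", ("M", "write"): "I",
    ("E", "read"):  "S", ("E", "write"): "I",
    ("S", "read"):  "S", ("S", "write"): "I",
    ("I", "read"):  "I", ("I", "write"): "I",
}

def step(caches, core, event):
    return [NEXT_LOCAL[(st, event)] if i == core else NEXT_SNOOP[(st, event)]
            for i, st in enumerate(caches)]

def writers(caches):
    return sum(st in ("M", "E") for st in caches)

caches = ["I", "I"]
for core, event in [(0, "read"), (1, "write"), (0, "write"), (1, "read")]:
    caches = step(caches, core, event)
    assert writers(caches) <= 1  # the coherence invariant from the text

print(caches)  # ['S', 'S']
```

The paper's model refines exactly this picture by making bus use non-atomic, which is what introduces the additional per-group phases for in-flight requests.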
Results and Discussion. Measurements for these examples appear in Table 1. Standard
PDR∀ fails to converge in less than an hour on 13 out of 16 seeds for KV-R and all 16
seeds for the cache. In contrast, Phase-PDR∀ converges to a proof in a few minutes in all
cases. These results demonstrate that phase structures can effectively guide the search
and obtain an invariant quickly where standard inductive invariant inference does not.
(2) Enhancing Performance Through Phases. In this section we consider the use of
phase structures to improve the speed of convergence to a proof.
Examples. Distributed lock service, adapted from [61], allows clients to acquire and
release locks by sending requests to a central server, which guarantees that only one
client holds each lock at a time. Phase structure: for each lock, the phases follow the
4 steps by which a client completes a cycle of acquire and release. We also consider a
simpler variant with only a single lock, reducing the arity of all relations and removing
the need for an automaton view. Its phase structure is the same, only for a single lock.
Simple quorum-based consensus, based on the example in [60]. In this protocol, nodes
propose themselves and then receive votes from other nodes. When a quorum of votes
for a node is obtained, it becomes the leader and decides on a value. Safety requires that
decided values are unique. The phase structure distinguishes between the phases before
any node is elected leader, once a node is elected, and when values are decided. Note
that the automaton structure is unquantified.
Leader election in a ring [13, 51], in which nodes are organized in a directional ring
topology with unique IDs, and the safety property is that an elected leader is a node
with the highest ID. Phase structure: for a view of two nodes n1 , n2 , in the first phase,
messages with the ID of n1 are yet to advance in the ring past n2 , while in the second
phase, a message advertising n1 has advanced past n2 . The inferred characterizations
include another quantifier on nodes, constraining interference (see Sect. 7).
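The underlying Chang-Roberts-style algorithm [13] can be sketched as follows (an illustrative functional model with atomic message delivery; the paper's benchmark is a relational, message-passing encoding):

```python
# Chang-Roberts-style leader election on a directed ring: each node emits
# its id; a node forwards only ids larger than its own; a node whose own
# id returns to it is elected. The highest id therefore wins.
import collections

def elect(ids):
    n = len(ids)
    pending = collections.deque((i, ids[i]) for i in range(n))  # (pos, id)
    leader = None
    while pending:
        pos, msg = pending.popleft()
        nxt = (pos + 1) % n
        if msg == ids[nxt]:
            leader = ids[nxt]           # own id came back: elected
        elif msg > ids[nxt]:
            pending.append((nxt, msg))  # forward larger ids only
    return leader

print(elect([3, 1, 4, 2]))  # 4: the highest id wins
```

The two phases described above correspond to whether the message carrying n1's id has advanced past n2 in this forwarding process.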
Sharded key-value store (KV) is a simplified version of KV-R above, without mes-
sage drops and the retransmission mechanism. The phase structure is exactly as in
KV-R, omitting transitions related to sequence numbers and acknowledgment. This pro-
tocol has not been modeled in decidable logic before.
Results and Discussion. We compare the performance of standard PDR∀ and Phase-
PDR∀ on the above examples, with results shown in Table 1. For each example, we ran
the two algorithms on 16 different Z3 random seeds. Measurements were performed
on a 3.4GHz AMD Ryzen Threadripper 1950X with 16 physical cores, running Linux
4.15.0, using Z3 version 4.7.1. By disabling hyperthreading and frequency scaling and
pinning tasks to dedicated cores, variability across runs of a single seed was negligible.
In all but one example, Phase-PDR∀ improves performance, sometimes drastically;
for example, performance for leader election in a ring is improved by a factor of 60.
Phase-PDR∀ also improves the robustness of inference [27] on this example, as the
standard deviation falls from 39 in PDR∀ to 0.04 in Phase-PDR∀ .
The only example in which a phase structure actually diminishes inference effec-
tiveness is simple consensus. We attribute this to an automaton structure that does not
capture the essence of the correctness argument very well, overlooking votes and quo-
rums. This demonstrates that a phase structure might guide the search towards counter-
productive directions if the user guidance is “misleading”, suggesting that better
resiliency of an interactive inference framework could be achieved by combining phase-
based inference with standard inductive invariant-based reasoning. We are not aware of
a single “good” automaton for this example. The correctness argument of this example
is better captured by the conjunction of two automata (one for votes and one for accu-
mulating a quorum) with different views, but the problem of inferring phase invariants
for mutually-dependent automata is a subject for future work.
(3) Anatomy of the Benefit of Phases. We now demonstrate that each of the beneficial
aspects of phases discussed in Sect. 5.2 is important for the benefits reported above.
Phase Decomposition. Is there a benefit from a phase structure even without disabled
transitions? Leader election in a ring answers this in the positive: its phase structure
contains no disabled transitions, yet it yields a huge performance benefit.
Disabled Transitions. Is there a substantial gain from exploiting disabled transitions?
We compare Phase-PDR∀ on the structure with disabled transitions and on a structure
obtained by (artificially) adding self-loops labeled with the originally impossible
transitions, using the example of the lock service with multiple locks (Sect. 6.2),
chosen because it benefits from Phase-PDR∀ and features several disabled transitions
in each phase. Without disabled transitions, the mean running time of Phase-PDR∀ on
this example jumps from 2.73 s to 6.24 s. This demonstrates the utility of the additional
safety properties encompassed in disabled transitions.
Phase-Awareness. Is it important to treat phases explicitly in the inference algorithm,
as we do in Phase-PDR∀ (Sect. 6.1)? We compare our result on convergence of KV-
R with an alternative in which standard PDR∀ is applied to an encoding of the phase
decomposition and disabled transitions by ghost state: each phase is modeled by a rela-
tion over possible view assignments, and the model is augmented with update code
mimicking phase changes; the additional safety properties derived from disabled transi-
tions are provided; and the view and the appropriate modification of the safety property
are introduced. This translation expresses all information present in the phase structure,
but does not explicitly guide the inference algorithm to use this information. The result
is that with this ghost-based modeling the phase-oblivious PDR∀ does not converge in
1 h on KV-R in any of the 16 runs, whereas it converges when Phase-PDR∀ explicitly
directs the search using the phase structure.
7 Related Work
Phases in Distributed Protocols. Distributed protocols are frequently described
informally as transitioning between different phases. Recently, PSync [19]
used the Heard-Of model [14], which describes protocols as operating in rounds, as
a basis for the implementation and verification of fault-tolerant distributed protocols.
Typestates [e.g., 25, 59] also bear some similarity to the temporal aspect of phases.
State machine refinement [3, 28] is used extensively in the design and verification of
distributed systems (see e.g. [33, 47]). The automaton structure of a phase invariant is
also a form of state machine; our focus is on inference of characterizations establishing
this.
Interaction in Verification. Interactive proof assistants such as Coq [8] and
Isabelle/HOL [48] interact with users to aid them as they attempt to prove candidate
inductive invariants. This differs from interaction through phase structures and coun-
terexample traces. Ivy uses interaction for invariant inference by interactive generaliza-
tion from counterexamples [51]. This approach is less automatic as it requires interac-
tion for every clause of the inductive invariant. In terminology from synthesis [30], the
use of counterexamples is synthesizer-driven interaction with the tool, while interaction
via phase structures is mainly user-driven. Abstract counterexample traces returned by
the tool augment this kind of interaction. As [38] has shown, interactive invariant infer-
ence, when considered as a synthesis problem (see also [27, 55]), is related to inductive
learning.
Template-Based Invariant Inference. Many works employ syntactical templates for
invariants, used to constrain the search [e.g., 7, 16, 54, 57, 58]. The different phases in a
phase structure induce a disjunctive form, but crucially each disjunct also has a distinct
semantic meaning, which inference overapproximates, as explained in Sect. 5.2.
Automata in Safety Verification. Safety verification through an automaton-like refine-
ment of the program’s control has been studied in a number of works. We focus on
related techniques for proof automation. The Automizer approach to the verification
of sequential programs [34, 35] is founded on the notion of a Floyd-Hoare automa-
ton, which is an unquantified inductive phase automaton; an extension to parallel pro-
grams [22] uses thread identifiers closed under the symmetry rule, which are related
to view quantifiers. Their focus is on the automatic, incremental construction of such
automata as a union of simpler automata, where each automaton is obtained from
generalizing the proof/infeasibility of a single trace. In our approach the structure of
the automaton is provided by the user as a means of conveying their intuition of the
proof, while the annotations are computed automatically. A notable difference is that
in Automizer, the generation of characterizations in an automaton constructed from a
single trace does not utilize the phase structure (beyond that of the trace), whereas in
our approach the phase structure is central in generalization from states to character-
izations. In trace partitioning [44, 53], abstract domains based on transition systems
partitioning the program’s control are introduced. The observation is that recording his-
torical information forms a basis for case-splitting, as an alternative to fully-disjunctive
abstractions. This differs from our motivation of distinguishing between different pro-
tocol phases. The phase structure of the domain is determined by the analyser, and can
also be dynamic. In our work the phase structure is provided by the user as guidance. We
use a variant of PDR∀ , rather than abstract interpretation [17], to compute universally
quantified phase characterizations. Techniques such as predicate abstraction [26, 29]
and existential abstraction [15], as well as the safety part of predicate diagrams [11],
use finite languages for the set of possible characterizations and lack the notion of views,
both essential for handling unbounded numbers of processes and resources. Finally,
phase splitter predicates [56] share our motivation of simplifying invariant inference
by exposing the different phases the loop undergoes. Splitter predicates correspond to
inductive phase characterizations [56, Theorem 1], and are automatically constructed
according to program conditionals. In our approach, decomposition is performed by
the user using potentially non-inductive conditions, and the inductive phase charac-
terizations are computed by invariant inference. Successive loop splitting results in a
sequence of phases, whereas our approach utilizes arbitrary automaton structures. Bor-
ralleras et al. [9] also refine the control-flow graph throughout the analysis by splitting
on conditions, which are discovered as preconditions for termination (the motivation
is to expose termination proof goals to be established): in a sense, the phase structure
is grown from candidate characterizations implying termination. This differs from our
approach in which the phase structure is used to guide the inference of characterizations.
Quantified Invariant Inference. We focus here on the works on quantifiers in auto-
matic verification most closely related to our work. In predicate abstraction, quanti-
fiers can be used internally as part of the definitions of predicates, and also externally
through predicates with free variables [26, 42]. Our work uses quantifiers both internally
in phase characterizations and externally in view quantifiers. The view is also related
to the bounded number of quantifiers used in view abstraction [5, 6]. In this work we
observe that it is useful to consider views of entities beyond processes or threads, such
as a single key in the store. Quantifiers are often used to their full extent in verification
conditions, namely checking implication between two quantified formulas, but they are
sometimes employed in weaker checks as part of thread-modular proofs [4, 39]. This
amounts to searching for invariants provable using specific instantiations of the quan-
tifiers in the verification conditions [31, 37]. In our verification conditions, the view
quantifiers are localized, in effect performing a single instantiation. This is essential for
exploiting the disjunctive structure under the quantifiers, allowing inference to consider
a single automaton edge in each step, and reflecting an intuition of correctness. When
necessary to constrain interference, quantifiers in phase characterizations can be used
to establish necessary facts about interfering views. Finally, there exist algorithms other
than PDR∀ for solving CHCs with universally quantified predicates [e.g., 20, 32].
8 Conclusion
Invariant inference techniques aiming to verify intricate distributed protocols must
adjust to the diverse correctness arguments on which protocols are based. In this paper
we have proposed to use phase structures as a means of conveying users’ intuition of
the proof, to be used by an automatic inference tool as a basis for a full formal proof.
We found that inference guided by a phase structure can infer proofs for distributed
protocols that are beyond reach for state-of-the-art inductive invariant inference meth-
ods, and can also improve the speed of convergence. The phase decomposition induced
by the automaton, the use of disabled transitions, and the explicit treatment of phases
in inference, all combine to direct the search for the invariant. We are encouraged by
our experience of specifying phase structures for different protocols. It would be inter-
esting to integrate the interaction via phase structures with other verification methods
and proof logics, as well as interaction schemes based on different, complementary,
concepts. Another important direction for future work is inference beyond universal
invariants, required for example for the proof of Paxos [50].
References
1. Examples code. https://github.com/wilcoxjay/mypyvy/tree/master/examples/cav19
2. mypyvy repository. https://github.com/wilcoxjay/mypyvy
3. Abadi, M., Lamport, L.: The existence of refinement mappings. Theor. Comput. Sci. 82(2),
253–284 (1991). https://doi.org/10.1016/0304-3975(91)90224-P
4. Abadi, M., Lamport, L.: Conjoining specifications. ACM Trans. Program. Lang. Syst. 17(3),
507–534 (1995)
5. Abdulla, P.A., Haziza, F., Holík, L.: All for the price of few. In: Giacobazzi, R., Berdine,
J., Mastroeni, I. (eds.) VMCAI 2013. LNCS, vol. 7737, pp. 476–495. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-3-642-35873-9_28
6. Abdulla, P.A., Haziza, F., Holík, L.: Parameterized verification through view abstraction.
STTT 18(5), 495–516 (2016). https://doi.org/10.1007/s10009-015-0406-x
7. Alur, R., et al.: Syntax-guided synthesis. In: Dependable Software Systems Engineering, pp.
1–25 (2015)
8. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development - Coq’Art:
The Calculus of Inductive Constructions. TTCS. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-662-07964-5
9. Borralleras, C., Brockschmidt, M., Larraz, D., Oliveras, A., Rodríguez-Carbonell, E., Rubio,
A.: Proving termination through conditional termination. In: Legay, A., Margaria, T. (eds.)
TACAS 2017. LNCS, vol. 10205, pp. 99–117. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_6
10. Bradley, A.R.: SAT-based model checking without unrolling. In: Jhala, R., Schmidt, D. (eds.)
VMCAI 2011. LNCS, vol. 6538, pp. 70–87. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4_7
11. Cansell, D., Méry, D., Merz, S.: Predicate diagrams for the verification of reactive systems.
In: Grieskamp, W., Santen, T., Stoddart, B. (eds.) IFM 2000. LNCS, vol. 1945, pp. 380–397.
Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-40911-4_22
12. Chang, C., Keisler, H.: Model Theory. Studies in Logic and the Foundations of Mathematics.
Elsevier Science, Amsterdam (1990)
13. Chang, E., Roberts, R.: An improved algorithm for decentralized extrema-finding in circular
configurations of processes. Commun. ACM 22(5), 281–283 (1979)
14. Charron-Bost, B., Schiper, A.: The heard-of model: computing in distributed systems with
benign faults. Distrib. Comput. 22(1), 49–71 (2009). https://doi.org/10.1007/s00446-009-0084-6
15. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (2001).
http://books.google.de/books?id=Nmc4wEaLXFEC
16. Colón, M.A., Sankaranarayanan, S., Sipma, H.B.: Linear invariant generation using non-
linear constraint solving. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725,
pp. 420–432. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_39
17. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: Symposium
on Principles of Programming Languages, pp. 269–282. ACM Press, New York (1979)
18. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of
programs by construction or approximation of fixpoints. In: Conference Record of the Fourth
ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA,
January 1977, pp. 238–252 (1977). https://doi.org/10.1145/512950.512973
19. Dragoi, C., Henzinger, T.A., Zufferey, D.: PSync: a partially synchronous language for
fault-tolerant distributed algorithms. In: Proceedings of the 43rd Annual ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg,
FL, USA, 20–22 January 2016, pp. 400–415 (2016). https://doi.org/10.1145/2837614.2837650
20. Drews, S., Albarghouthi, A.: Effectively propositional interpolants. In: Chaudhuri, S.,
Farzan, A. (eds.) CAV 2016. LNCS, vol. 9780, pp. 210–229. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41540-6_12
21. Eén, N., Mishchenko, A., Brayton, R.K.: Efficient implementation of property directed
reachability. In: International Conference on Formal Methods in Computer-Aided Design,
FMCAD 2011, Austin, TX, USA, October 30 – November 2, 2011, pp. 125–134 (2011)
22. Farzan, A., Kincaid, Z., Podelski, A.: Proof spaces for unbounded parallelism. In: Proceed-
ings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program-
ming Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 407–420 (2015).
https://doi.org/10.1145/2676726.2677012
23. Feldman, Y.M.Y., Padon, O., Immerman, N., Sagiv, M., Shoham, S.: Bounded quantifier
instantiation for checking inductive invariants. In: Legay, A., Margaria, T. (eds.) TACAS
2017. LNCS, vol. 10205, pp. 76–95. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_5
24. Feldman, Y.M.Y., Wilcox, J.R., Shoham, S., Sagiv, M.: Inferring inductive invariants from
phase structures. Technical report (2019). https://arxiv.org/abs/1905.07739
25. Field, J., Goyal, D., Ramalingam, G., Yahav, E.: Typestate verification: abstraction tech-
niques and complexity results. Sci. Comput. Program. 58(1–2), 57–82 (2005)
26. Flanagan, C., Qadeer, S.: Predicate abstraction for software verification. In: Conference
Record of POPL 2002: The 29th SIGPLAN-SIGACT Symposium on Principles of Program-
ming Languages, Portland, OR, USA, 16–18 January 2002, pp. 191–202 (2002). https://doi.org/10.1145/503272.503291
27. Garg, P., Löding, C., Madhusudan, P., Neider, D.: ICE: a robust framework for learning
invariants. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 69–87. Springer,
Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_5
28. Garland, S.J., Lynch, N.: Using I/O automata for developing distributed systems. In: Foun-
dations of Component-Based Systems, pp. 285–312. Cambridge University Press, New York
(2000). http://dl.acm.org/citation.cfm?id=336431.336455
29. Graf, S., Saidi, H.: Construction of abstract state graphs with PVS. In: Grumberg, O. (ed.)
CAV 1997. LNCS, vol. 1254, pp. 72–83. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63166-6_10
30. Gulwani, S.: Synthesis from examples: interaction models and algorithms. In: 14th Inter-
national Symposium on Symbolic and Numeric Algorithms for Scientific Computing,
SYNASC 2012, Timisoara, Romania, 26–29 September 2012, pp. 8–14 (2012). https://doi.
org/10.1109/SYNASC.2012.69
31. Gurfinkel, A., Shoham, S., Meshman, Y.: SMT-based verification of parameterized systems.
In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of
Software Engineering, FSE 2016, Seattle, WA, USA, 13–18 November 2016, pp. 338–
348 (2016). https://doi.org/10.1145/2950290.2950330. http://doi.acm.org/10.1145/2950290.
2950330
32. Gurfinkel, A., Shoham, S., Vizel, Y.: Quantifiers on demand. In: Lahiri, S.K., Wang, C. (eds.)
ATVA 2018. LNCS, vol. 11138, pp. 248–266. Springer, Cham (2018). https://doi.org/10.
1007/978-3-030-01090-4_15
33. Hawblitzel, C., et al.: Ironfleet: proving practical distributed systems correct. In: Proceedings
of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA,
4–7 October 2015, pp. 1–17 (2015). https://doi.org/10.1145/2815400.2815428. http://doi.
acm.org/10.1145/2815400.2815428
34. Heizmann, M., Hoenicke, J., Podelski, A.: Refinement of trace abstraction. In: Palsberg, J.,
Su, Z. (eds.) SAS 2009. LNCS, vol. 5673, pp. 69–85. Springer, Heidelberg (2009). https://
doi.org/10.1007/978-3-642-03237-0_7
35. Heizmann, M., Hoenicke, J., Podelski, A.: Software model checking for people who love
automata. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 36–52.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_2
36. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 6th edn.
Morgan Kaufmann, San Francisco (2017)
37. Hoenicke, J., Majumdar, R., Podelski, A.: Thread modularity at many levels: a pearl in com-
positional verification. In: Proceedings of the 44th ACM SIGPLAN Symposium on Princi-
ples of Programming Languages, POPL 2017, Paris, France, 18–20 January 2017, pp. 473–
485 (2017). http://dl.acm.org/citation.cfm?id=3009893
38. Jha, S., Seshia, S.A.: A theory of formal synthesis via inductive learning. Acta Inf. 54(7),
693–726 (2017). https://doi.org/10.1007/s00236-017-0294-5
39. Jones, C.B.: Tentative steps toward a development method for interfering programs. ACM
Trans. Program. Lang. Syst. 5(4), 596–619 (1983). https://doi.org/10.1145/69575.69577.
http://doi.acm.org/10.1145/69575.69577
40. Karbyshev, A., Bjørner, N., Itzhaky, S., Rinetzky, N., Shoham, S.: Property-directed infer-
ence of universal invariants or proving their absence. J. ACM 64(1), 7:1–7:33 (2017). https://
doi.org/10.1145/3022187. http://doi.acm.org/10.1145/3022187
41. Korovin, K.: iProver – an instantiation-based theorem prover for first-order logic (system
description). In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS
(LNAI), vol. 5195, pp. 292–298. Springer, Heidelberg (2008). https://doi.org/10.1007/978-
3-540-71070-7_24
42. Lahiri, S.K., Bryant, R.E.: Predicate abstraction with indexed predicates. ACM Trans. Com-
put. Log. 9(1), 4 (2007). https://doi.org/10.1145/1297658.1297662. http://doi.acm.org/10.
1145/1297658.1297662
43. Lamport, L.: Specifying Systems. The TLA+ Language and Tools for Hardware and Soft-
ware Engineers. Addison-Wesley (2002)
44. Mauborgne, L., Rival, X.: Trace partitioning in abstract interpretation based static analyzers.
In: Sagiv, M. (ed.) ESOP 2005. LNCS, vol. 3444, pp. 5–20. Springer, Heidelberg (2005).
https://doi.org/10.1007/978-3-540-31987-0_2
424 Y. M. Y. Feldman et al.
45. McMillan, K.L., Padon, O.: Deductive verification in decidable fragments with ivy. In: Podel-
ski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 43–55. Springer, Cham (2018). https://doi.
org/10.1007/978-3-319-99725-4_4
46. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J.
(eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://
doi.org/10.1007/978-3-540-78800-3_24
47. Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuff, M.: How amazon
web services uses formal methods. Commun. ACM 58(4), 66–73 (2015). https://doi.org/10.
1145/2699417. http://doi.acm.org/10.1145/2699417
48. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL. LNCS, vol. 2283. Springer,
Heidelberg (2002). https://doi.org/10.1007/3-540-45949-9
49. Padon, O., Immerman, N., Shoham, S., Karbyshev, A., Sagiv, M.: Decidability of inferring
inductive invariants. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Sym-
posium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA,
20–22 January 2016, pp. 217–231 (2016). https://doi.org/10.1145/2837614.2837640. http://
doi.acm.org/10.1145/2837614.2837640
50. Padon, O., Losa, G., Sagiv, M., Shoham, S.: Paxos made EPR: decidable reasoning about
distributed protocols. PACMPL 1(OOPSLA), 108:1–108:31 (2017). https://doi.org/10.1145/
3140568. http://doi.acm.org/10.1145/3140568
51. Padon, O., McMillan, K.L., Panda, A., Sagiv, M., Shoham, S.: Ivy: safety verification by
interactive generalization. In: Proceedings of the 37th ACM SIGPLAN Conference on Pro-
gramming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA,
13–17 June 2016, pp. 614–630 (2016)
52. Ramsey, F.P.: On a problem in formal logic. In: Proceedings on London Mathematical Soci-
ety (1930)
53. Rival, X., Mauborgne, L.: The trace partitioning abstract domain. ACM Trans. Program.
Lang. Syst. 29(5), 26 (2007). https://doi.org/10.1145/1275497.1275501. http://doi.acm.org/
10.1145/1275497.1275501
54. Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Constraint-based linear-relations analysis.
In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 53–68. Springer, Heidelberg (2004).
https://doi.org/10.1007/978-3-540-27864-1_7
55. Sharma, R., Aiken, A.: From invariant checking to invariant inference using randomized
search. Formal Methods Syst. Des. 48(3), 235–256 (2016). https://doi.org/10.1007/s10703-
016-0248-5
56. Sharma, R., Dillig, I., Dillig, T., Aiken, A.: Simplifying loop invariant generation using split-
ter predicates. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp.
703–719. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_57
57. Srivastava, S., Gulwani, S.: Program verification using templates over predicate abstraction.
In: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design
and Implementation, PLDI 2009, Dublin, Ireland, 15–21 June 2009, pp. 223–234 (2009)
58. Srivastava, S., Gulwani, S., Foster, J.S.: Template-based program verification and program
synthesis. STTT 15(5–6), 497–518 (2013)
59. Strom, R.E., Yemini, S.: Typestate: a programming language concept for enhancing software
reliability. IEEE Trans. Softw. Eng. 12(1), 157–171 (1986)
60. Taube, M., et al.: Modularity for decidability of deductive verification with applications to
distributed systems. In: Proceedings of the 39th ACM SIGPLAN Conference on Program-
ming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, 18–22 June
2018, pp. 662–677 (2018). https://doi.org/10.1145/3192366.3192414. http://doi.acm.org/10.
1145/3192366.3192414
Inferring Inductive Invariants from Phase Structures 425
61. Wilcox, J.R., et al.: Verdi: a framework for implementing and formally verifying dis-
tributed systems. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming
Language Design and Implementation, Portland, OR, USA, 15–17 June 2015, pp. 357–
368 (2015). https://doi.org/10.1145/2737924.2737958. http://doi.acm.org/10.1145/2737924.
2737958
62. Woos, D., Wilcox, J.R., Anton, S., Tatlock, Z., Ernst, M.D., Anderson, T.E.: Planning for
change in a formal verification of the raft consensus protocol. In: Proceedings of the 5th
ACM SIGPLAN Conference on Certified Programs and Proofs, Saint Petersburg, FL, USA,
20–22 January 2016, pp. 154–165 (2016). https://doi.org/10.1145/2854065.2854081. http://
doi.acm.org/10.1145/2854065.2854081
Termination of Triangular Integer Loops
is Decidable
1 Introduction
We consider affine integer loops of the form
while ϕ do x ← A x + a. (1)
Here, A ∈ Z^{d×d} for some dimension d ≥ 1, x is a column vector of pairwise different variables x1, …, xd, and a ∈ Z^d.
Funded by DFG grant 389792660 as part of TRR 248 and by DFG grant GI 274/6.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 426–444, 2019.
https://doi.org/10.1007/978-3-030-25543-5_24
² The proofs for real or rational numbers do not carry over to the integers since [15] uses Brouwer's Fixed Point Theorem, which is not applicable if the variables range over Z, and [4] relies on the density of Q in R.
³ Similarly, one could of course also use other termination-preserving pre-processings and try to transform a given program into a triangular loop.
428 F. Frohn and J. Giesl
benefit from integrating our decision procedure and applying it whenever a sub-
program is an affine triangular loop.
Note that triangularity and diagonalizability of matrices do not imply each
other. As we consider loops with arbitrary dimension, this means that the class
of loops considered in this paper is not covered by [3,14]. Since we consider affine
instead of linear loops, it is also orthogonal to [4].
To see the difference between our and previous results, note that a triangular matrix A where c1, …, ck are the distinct entries on the diagonal is diagonalizable iff (A − c1·I) · … · (A − ck·I) is the zero matrix.⁴ Here, I is the identity matrix. So an easy example for a triangular loop where the update matrix is not diagonalizable is the following well-known program (see, e.g., [2]):

  while x > 0 do x ← x + y; y ← y − 1
⁴ The reason is that in this case, (x − c1) · … · (x − ck) is the minimal polynomial of A and diagonalizability is equivalent to the fact that the minimal polynomial is a product of distinct linear factors.
⁵ For instance, consider while x > 0 do x ← x + y + z1 + z2 + z3; y ← y − 1.
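The criterion from footnote 4 is easy to check mechanically. A minimal Python sketch (illustrative, not from the paper; matrices as nested lists) tests it on the update matrix [[1, 1], [0, 1]] of the loop above and on a diagonal matrix:

```python
def mat_mul(A, B):
    # textbook matrix product
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_diagonalizable_triangular(A):
    # A triangular matrix with distinct diagonal entries c1, ..., ck is
    # diagonalizable iff (A - c1*I) * ... * (A - ck*I) is the zero matrix.
    n = len(A)
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    P = I
    for c in sorted({A[i][i] for i in range(n)}):
        P = mat_mul(P, [[A[i][j] - c * I[i][j] for j in range(n)]
                        for i in range(n)])
    return all(P[i][j] == 0 for i in range(n) for j in range(n))

# update matrix of "while x > 0 do x <- x + y; y <- y - 1"
assert not is_diagonalizable_triangular([[1, 1], [0, 1]])
assert is_diagonalizable_triangular([[2, 0], [0, 3]])
```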
  while y + z > 0 ∧ −w − 2 · y + x > 0 do

    ⎡w⎤   ⎡0 0 0 0⎤   ⎡w⎤   ⎡ 2⎤
    ⎢x⎥   ⎢0 1 0 0⎥   ⎢x⎥   ⎢ 2⎥
    ⎢y⎥ ← ⎢2 0 4 0⎥ · ⎢y⎥ + ⎢−2⎥
    ⎣z⎦   ⎣0 1 0 0⎦   ⎣z⎦   ⎣ 1⎦
Lemma 5 is needed to prove that (2) is an nnt-loop if (1) is triangular.
Lemma 5 (Squares of Triangular Matrices). For every triangular matrix
A, A2 is a triangular matrix whose diagonal entries are non-negative.
Corollary 6 (Chaining Loops). If (1) is triangular, then (2) is an nnt-loop.
Proof. Immediate consequence of Definition 3 and Lemma 5.
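Lemma 5 can be sanity-checked numerically; a small illustrative Python sketch squares a lower-triangular matrix whose diagonal entries are negative:

```python
def mat_mul(A, B):
    # textbook matrix product
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[-2, 0], [3, -5]]   # lower triangular, negative diagonal
A2 = mat_mul(A, A)       # [[4, 0], [-21, 25]]
assert A2[0][1] == 0                          # still lower triangular
assert all(A2[i][i] >= 0 for i in range(2))   # non-negative diagonal
```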
Lemma 7 (Equivalence of Chaining). (1) terminates ⇐⇒ (2) terminates.
Proof. By Definition 1, (1) does not terminate iff

      ∃c ∈ Z^d. ∀n ∈ N. ϕ[x/f^n(c)]
  ⇐⇒ ∃c ∈ Z^d. ∀n ∈ N. ϕ[x/f^{2·n}(c)] ∧ ϕ[x/f^{2·n+1}(c)]
  ⇐⇒ ∃c ∈ Z^d. ∀n ∈ N. ϕ[x/f^{2·n}(c)] ∧ ϕ[x/A · f^{2·n}(c) + a]   (by definition of f),
As n ranges over N, we use ⟦n > c⟧ as syntactic sugar for ∏_{i=0}^{c} ⟦n ≠ i⟧. So an example for a poly-exponential expression is

  ⟦n > 2⟧ · (2 · x + 3 · y − 1) · n³ · 3^n + ⟦n = 2⟧ · (x − y).
Moreover, note that if ψ contains a positive literal (i.e., a literal of the form “n = c” for some number c ∈ N), then ⟦ψ⟧ is equivalent to either 0 or ⟦n = c⟧.
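Here ⟦·⟧ is an Iverson-style factor that is 1 when its condition holds and 0 otherwise, so poly-exponential expressions can be evaluated directly; a small illustrative Python sketch for the example above:

```python
def bracket(cond):
    # Iverson-style factor: 1 if the condition holds, 0 otherwise
    return 1 if cond else 0

def p(n, x, y):
    # the example poly-exponential expression from the text
    return (bracket(n > 2) * (2 * x + 3 * y - 1) * n**3 * 3**n
            + bracket(n == 2) * (x - y))

assert p(1, 5, 7) == 0    # both factors vanish for n = 1
assert p(2, 5, 7) == -2   # only the [n = 2] * (x - y) addend survives
```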
The crux of the proof that poly-exponential expressions can represent closed
forms is to show that certain sums over products of exponential and poly-ex-
ponential expressions can be represented by poly-exponential expressions, cf.
Lemma 12. To construct these expressions, we use a variant of [1, Lemma 3.5].
As usual, Q[x] is the set of all polynomials over x with rational coefficients.
Algorithm 1. compute_r
Input: q = ∑_{i=0}^{d} c_i · n^i ∈ Q[n], c ∈ Q
Result: r ∈ Q[n] such that q = r − c · r[n/n − 1]
if d = 0 then
  if c = 1 then return c0 · n else return c0 / (1 − c)
else
  if c = 1 then s ← (c_d · n^{d+1}) / (d + 1) else s ← (c_d · n^d) / (1 − c)
  return s + compute_r(q − s + c · s[n/n − 1], c)
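A runnable rendering of Algorithm 1, assuming polynomials are dense coefficient lists over Q (a sketch, not the authors' implementation):

```python
from fractions import Fraction
from math import comb

def shift(p):
    # substitute n -> n - 1 in polynomial p (p[i] is the coefficient of n^i)
    out = [Fraction(0)] * len(p)
    for i, ci in enumerate(p):
        for k in range(i + 1):
            out[k] += ci * comb(i, k) * (-1) ** (i - k)
    return out

def compute_r(q, c):
    # returns r in Q[n] with q = r - c * r[n/n-1], following Algorithm 1
    q = [Fraction(x) for x in q]
    c = Fraction(c)
    while len(q) > 1 and q[-1] == 0:   # strip trailing zeros to get the degree
        q = q[:-1]
    d = len(q) - 1
    if d == 0:
        return [Fraction(0), q[0]] if c == 1 else [q[0] / (1 - c)]
    if c == 1:
        s = [Fraction(0)] * (d + 1) + [q[d] / (d + 1)]   # c_d * n^(d+1) / (d+1)
    else:
        s = [Fraction(0)] * d + [q[d] / (1 - c)]         # c_d * n^d / (1 - c)
    sh = shift(s)
    # recurse on q - s + c * s[n/n-1], whose degree is strictly smaller
    qp = [(q[i] if i < len(q) else 0) - s[i] + c * sh[i] for i in range(len(s))]
    rest = compute_r(qp, c)
    return [(s[i] if i < len(s) else 0) + (rest[i] if i < len(rest) else 0)
            for i in range(max(len(s), len(rest)))]
```

For instance, for q = 1 and c = 4 (as used later in Example 13), `compute_r([1], 4)` returns `[Fraction(-1, 3)]`, matching r = −1/3.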
  ∑_{i=1}^{n} m^{n−i} · p[n/i − 1] = ∑_{j=1}^{ℓ} ∑_{i=1}^{n} ⟦ψ_j(i − 1)⟧ · m^{n−i} · α_j · (i − 1)^{a_j} · b_j^{i−1}    (3)
We first regard the case m = 0. Here, the expression (4) can be simplified to
The step marked with (†) holds as we have ⟦n > i − 1⟧ = 1 for all i ∈ {1, …, n}, and the step marked with (††) holds since i ≠ c + 1 implies ⟦ψ(i − 1)⟧ = 0. If ψ does not contain a positive literal, then let c be the maximal constant that occurs in ψ, or −1 if ψ is empty. We get:
    ∑_{i=1}^{n} ⟦ψ(i − 1)⟧ · m^{n−i} · α · (i − 1)^a · b^{i−1}
  = ∑_{i=1}^{n} ⟦n > i − 1⟧ · ⟦ψ(i − 1)⟧ · m^{n−i} · α · (i − 1)^a · b^{i−1}      (†)
  = ∑_{i=1}^{c+1} ⟦n > i − 1⟧ · ⟦ψ(i − 1)⟧ · m^{n−i} · α · (i − 1)^a · b^{i−1}
    + ∑_{i=c+2}^{n} m^{n−i} · α · (i − 1)^a · b^{i−1}                              (8)
Again, the step marked with (†) holds since we have ⟦n > i − 1⟧ = 1 for all i ∈ {1, …, n}. The last step holds as i ≥ c + 2 implies ⟦ψ(i − 1)⟧ = 1. Similar to the case where ψ contains a positive literal, we can compute a poly-exponential expression which is equivalent to the first addend. We have
    ∑_{i=1}^{c+1} ⟦n > i − 1⟧ · ⟦ψ(i − 1)⟧ · m^{n−i} · α · (i − 1)^a · b^{i−1}
  = ( ∑_{1 ≤ i ≤ c+1, ⟦ψ(i−1)⟧ = 1} ⟦n > i − 1⟧ · m^{−i} · α · (i − 1)^a · b^{i−1} ) · m^n    (9)
Lemma 10 ensures r ∈ Q[n], i.e., we have r = ∑_{i=0}^{d_r} m_i · n^i for some d_r ∈ N and m_i ∈ Q. Thus, r[n/c + 1] · (b/m)^{c+1} · (α/b) ∈ Af[x], which implies ⟦n > c + 1⟧ · r[n/c + 1] · (b/m)^{c+1} · (α/b) · m^n ∈ PE[x]. It remains to show that the addend ⟦n > c + 1⟧ · (α/b) · r · b^n is equivalent to a poly-exponential expression. As (α/b) · m_i ∈ Af[x], we have

  ⟦n > c + 1⟧ · (α/b) · r · b^n = ∑_{i=0}^{d_r} ⟦n > c + 1⟧ · (α/b) · m_i · n^i · b^n ∈ PE[x].    (11)
where w is a variable. (It will later on be needed to compute a closed form for
Example 4, see Example 18.) According to Algorithm 2 and (3), we get
    ∑_{i=1}^{n} 4^{n−i} · (⟦n = 0⟧ · 2 · w + ⟦n ≠ 0⟧ · 4 − 2)[n/i − 1]
  = ∑_{i=1}^{n} 4^{n−i} · (⟦i − 1 = 0⟧ · 2 · w + ⟦i − 1 ≠ 0⟧ · 4 − 2)
  = p1 + p2 + p3

with p1 = ∑_{i=1}^{n} ⟦i − 1 = 0⟧ · 4^{n−i} · 2 · w, p2 = ∑_{i=1}^{n} ⟦i − 1 ≠ 0⟧ · 4^{n−i} · 4, and p3 = ∑_{i=1}^{n} 4^{n−i} · (−2). We search for q1, q2, q3 ∈ PE[w] that are equivalent to p1, p2, p3, i.e., q1 + q2 + q3 is equivalent to (12). We only show how to compute q2 (and omit the computation of q1 = ⟦n ≠ 0⟧ · (1/2) · w · 4^n and q3 = 2/3 − (2/3) · 4^n).
Analogously to (8), we get:

    ∑_{i=1}^{n} ⟦i − 1 ≠ 0⟧ · 4^{n−i} · 4
  = ∑_{i=1}^{n} ⟦n > i − 1⟧ · ⟦i − 1 ≠ 0⟧ · 4^{n−i} · 4
  = ∑_{i=1}^{1} ⟦n > i − 1⟧ · ⟦i − 1 ≠ 0⟧ · 4^{n−i} · 4 + ∑_{i=2}^{n} 4^{n−i} · 4

The next step is to rearrange the first sum as in (9). In our example, it directly simplifies to 0 and hence we obtain

  ∑_{i=1}^{1} ⟦n > i − 1⟧ · ⟦i − 1 ≠ 0⟧ · 4^{n−i} · 4 + ∑_{i=2}^{n} 4^{n−i} · 4 = ∑_{i=2}^{n} 4^{n−i} · 4.
The step marked with (†) holds by Lemma 10 with q = 1 and c = 4. Thus, we have r = −1/3, cf. Example 11.
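The resulting q2 = ⟦n > 1⟧ · (−4/3) + ⟦n > 1⟧ · (1/3) · 4^n can be cross-checked against p2 numerically (an illustrative sketch):

```python
from fractions import Fraction as F

for n in range(10):
    p2 = sum(4**(n - i) * 4 for i in range(2, n + 1))       # sum over i = 2..n
    q2 = 0 if n <= 1 else F(-4, 3) + F(1, 3) * 4**n         # [n > 1] factors
    assert p2 == q2
```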
Recall that our goal is to compute closed forms for loops. As a first step, instead of the n-fold update function h(n, x) = f^n(x) of (1), where f is the update of (1), we consider a recursive update function for a single variable x ∈ x:
Here, m ∈ N and p ∈ PE[x]. Using Lemma 12, it is easy to show that g can be
represented by a poly-exponential expression.
Lemma 14 (Closed Form for Single Variables). If x ∈ x, m ∈ N, and
p ∈ PE[x], then one can compute a q ∈ PE[x] which satisfies
Moreover,
Example 15. We show how to compute the closed forms for the variables w and x from Example 4. We first consider the assignment w ← 2, i.e., we want to compute a qw ∈ PE[w, x, y, z] with qw[n/0] = w and qw = (mw · qw + pw)[n/n − 1] for n > 0, where mw = 0 and pw = 2. According to (13) and (14), qw is

  mw^n · w + ∑_{i=1}^{n} mw^{n−i} · pw[n/i − 1] = 0^n · w + ∑_{i=1}^{n} 0^{n−i} · 2 = ⟦n = 0⟧ · w + ⟦n ≠ 0⟧ · 2.
To prove this claim, we show the more general Lemma 16. For all i1, …, ik ∈ {1, …, m}, we define [z1, …, zm]_{i1,…,ik} = [z_{i1}, …, z_{ik}] (and the notation y_{i1,…,ik} for column vectors is defined analogously). Moreover, for a matrix A, A_i is A's i-th row and A_{i1,…,in; j1,…,jk} is the matrix with rows (A_{i1})_{j1,…,jk}, …, (A_{in})_{j1,…,jk}. So for

      ⎡a_{1,1} a_{1,2} a_{1,3}⎤
  A = ⎢a_{2,1} a_{2,2} a_{2,3}⎥ ,   we have   A_{1,2;1,3} = ⎡a_{1,1} a_{1,3}⎤
      ⎣a_{3,1} a_{3,2} a_{3,3}⎦                             ⎣a_{2,1} a_{2,3}⎦ .
Proof. Assume that A is lower triangular (the case that A is upper triangular works analogously). We use induction on d. For any d ≥ 1 we have:

      q = (A · q + p)[n/n − 1]
  ⇐⇒ q_j = (A_j · q + p_j)[n/n − 1] for all 1 ≤ j ≤ d
  ⇐⇒ q_j = (A_{j;2,…,d} · q_{2,…,d} + A_{j;1} · q_1 + p_j)[n/n − 1] for all 1 ≤ j ≤ d
  ⇐⇒ q_1 = (A_{1;2,…,d} · q_{2,…,d} + A_{1;1} · q_1 + p_1)[n/n − 1] ∧
      q_j = (A_{j;2,…,d} · q_{2,…,d} + A_{j;1} · q_1 + p_j)[n/n − 1] for all 1 < j ≤ d
  ⇐⇒ q_1 = (A_{1;1} · q_1 + p_1)[n/n − 1] ∧
      q_j = (A_{j;2,…,d} · q_{2,…,d} + A_{j;1} · q_1 + p_j)[n/n − 1] for all 1 < j ≤ d
The last step holds as A is lower triangular. By Lemma 14, we can compute a q_1 ∈ PE[x] that satisfies
for all n > 0. As A_{j;1} · q_1 + p_j ∈ PE[x] for each 2 ≤ j ≤ d, the claim follows from the induction hypothesis.
Together, Lemmas 14 and 16 and their proofs give rise to the following algorithm to compute a solution for (16) and (17). It computes a closed form q_1 for x_1 as in the proof of Lemma 14, constructs the argument p for the recursive call based on A, q_1, and the current value of p as in the proof of Lemma 16, and then determines the closed form for x_{2,…,d} recursively.
y ← 2 · w + 4 · y − 2, we obtain

  qy = (4 · qy + 2 · qw − 2)[n/n − 1]
     = 4^n · y + ∑_{i=1}^{n} 4^{n−i} · (2 · qw − 2)[n/i − 1]                              (by (13))
     = y · 4^n + ∑_{i=1}^{n} 4^{n−i} · (⟦n = 0⟧ · 2 · w + ⟦n ≠ 0⟧ · 4 − 2)[n/i − 1]      (see Example 15)
     = q0 + q1 + q2 + q3                                                                 (see Example 13)
  qz = (qx + 1)[n/n − 1]
     = 0^n · z + ∑_{i=1}^{n} 0^{n−i} · (qx + 1)[n/i − 1]         (by (13))
     = ⟦n = 0⟧ · z + ⟦n ≠ 0⟧ · (qx[n/n − 1] + 1)
     = ⟦n = 0⟧ · z + ⟦n ≠ 0⟧ · ((x + 2 · n)[n/n − 1] + 1)        (see Example 15)
     = ⟦n = 0⟧ · z + ⟦n ≠ 0⟧ · (x − 1) + ⟦n ≠ 0⟧ · 2 · n.
So the closed form of Example 4 for the values of the variables after n iterations is:

  ⎡qw⎤   ⎡⟦n = 0⟧ · w + ⟦n ≠ 0⟧ · 2⎤
  ⎢qx⎥ = ⎢x + 2 · n⎥
  ⎢qy⎥   ⎢q0 + q1 + q2 + q3⎥
  ⎣qz⎦   ⎣⟦n = 0⟧ · z + ⟦n ≠ 0⟧ · (x − 1) + ⟦n ≠ 0⟧ · 2 · n⎦
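This closed form can be cross-checked against direct iteration of the loop from Example 4; an illustrative Python sketch (with q0–q3 as computed in Example 13):

```python
from fractions import Fraction as F

def step(w, x, y, z):
    # one iteration of the update of Example 4
    return 2, x + 2, 2 * w + 4 * y - 2, x + 1

def closed_form(n, w, x, y, z):
    qw = w if n == 0 else 2
    qx = x + 2 * n
    qy = (y * 4**n                                        # q0
          + (0 if n == 0 else F(1, 2) * w * 4**n)         # q1
          + (0 if n <= 1 else F(-4, 3) + F(1, 3) * 4**n)  # q2
          + F(2, 3) - F(2, 3) * 4**n)                     # q3
    qz = z if n == 0 else x - 1 + 2 * n
    return qw, qx, qy, qz

v0 = (3, -1, 2, 7)
v = v0
for n in range(8):
    assert closed_form(n, *v0) == v
    v = step(*v)
```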
By removing the factors ⟦ψ_j⟧ from the closed form q of an nnt-loop, we obtain normalized poly-exponential expressions.

Definition 22 (Normalized PEs). We call p ∈ PE[x] normalized if it is in

  NPE[x] = { ∑_{j=1}^{ℓ} α_j · n^{a_j} · b_j^n | a_j ∈ N, α_j ∈ Af[x], b_j ∈ N≥1 }.
  qw = ⟦n = 0⟧ · w + ⟦n ≠ 0⟧ · 2 becomes 2,
  qz = ⟦n = 0⟧ · z + ⟦n ≠ 0⟧ · (x − 1) + ⟦n ≠ 0⟧ · 2 · n becomes x − 1 + 2 · n,

and qx = x + 2 · n, q0 = y · 4^n, and q3 = 2/3 − (2/3) · 4^n remain unchanged. Moreover,

  q1 = ⟦n ≠ 0⟧ · (1/2) · w · 4^n becomes (1/2) · w · 4^n and
  q2 = ⟦n > 1⟧ · (−4/3) + ⟦n > 1⟧ · (1/3) · 4^n becomes −4/3 + (1/3) · 4^n.

Thus, qy = q0 + q1 + q2 + q3 becomes

  y · 4^n + (1/2) · w · 4^n − 4/3 + (1/3) · 4^n + 2/3 − (2/3) · 4^n = 4^n · (y − 1/3 + (1/2) · w) − 2/3.
Let σ = [w/2, x/(x + 2 · n), y/(4^n · (y − 1/3 + (1/2) · w) − 2/3), z/(x − 1 + 2 · n)]. Then we get that Example 2 is non-terminating iff there are w, x, y, z ∈ Z and n0 ∈ N such that for all n > n0:

      (y + z) σ > 0 ∧ (−w − 2 · y + x) σ > 0
  ⇐⇒ 4^n · (y − 1/3 + (1/2) · w) − 2/3 + x − 1 + 2 · n > 0 ∧
      −2 − 2 · (4^n · (y − 1/3 + (1/2) · w) − 2/3) + x + 2 · n > 0
  ⇐⇒ p1^ϕ > 0 ∧ p2^ϕ > 0
Proof. By considering the cases b2 > b1 and b2 = b1 separately, the claim can
easily be deduced from the definition of O.
Example 26. In Example 23 we saw that the loop from Example 2 is non-terminating iff there are w, x, y, z ∈ Z, n0 ∈ N such that p1^ϕ > 0 ∧ p2^ϕ > 0 for all n > n0. We get:
  coeffs(p1^ϕ) = [ (y − 1/3 + (1/2) · w)^{(4,0)}, 2^{(1,1)}, (x − 5/3)^{(1,0)} ]
  coeffs(p2^ϕ) = [ (2/3 − 2 · y − w)^{(4,0)}, 2^{(1,1)}, (x − 2/3)^{(1,0)} ]
Proof. If p ∉ Q, then the limit of each addend of p is in {−∞, ∞} by definition of NPE. As the asymptotically dominating addend determines lim_{n→∞} p and unmark(max(coeffs(p))) determines the sign of the asymptotically dominating addend, the claim follows.
  max_coeff_pos(p) = ⋁_{j=1}^{ℓ} ( α_j > 0 ∧ ⋀_{i=1}^{j−1} α_i = 0 ).    (21)
is valid. By multiplying each (in)equality in (22) with the least common multiple of all denominators, one obtains a first-order formula over the theory of linear integer arithmetic. It is well known that validity of such formulas is decidable. Note that (22) is valid iff ⋀_{i=1}^{k} max_coeff_pos(p_i) is satisfiable. So to implement our decision procedure, one can use integer programming or SMT solvers to check satisfiability of ⋀_{i=1}^{k} max_coeff_pos(p_i). Lemma 30 allows us to prove our main theorem.
Theorem 31. Termination of triangular loops is decidable.
Proof. By Theorem 8, termination of triangular loops is decidable iff termination of nnt-loops is decidable. For an nnt-loop (1) we obtain a q_norm ∈ NPE[x]^d (see Theorem 17 and Corollary 21) such that (1) is non-terminating iff
is valid. This is the case iff max_coeff_pos(p1) ∧ max_coeff_pos(p2), i.e.,

  ( y − 1/3 + (1/2) · w > 0 ∨ (2 > 0 ∧ y − 1/3 + (1/2) · w = 0) ∨ (x − 5/3 > 0 ∧ 2 = 0 ∧ y − 1/3 + (1/2) · w = 0) )
  ∧
  ( 2/3 − 2 · y − w > 0 ∨ (2 > 0 ∧ 2/3 − 2 · y − w = 0) ∨ (x − 2/3 > 0 ∧ 2 = 0 ∧ 2/3 − 2 · y − w = 0) )
is satisfiable. This formula is equivalent to 6 · y − 2 + 3 · w = 0 which does not
have any integer solutions. Hence, the loop of Example 2 terminates.
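Unsatisfiability of the final constraint can be confirmed mechanically: clearing denominators in 6 · y − 2 + 3 · w = 0 gives 6 · y + 3 · w = 2, and a linear Diophantine equation a · y + b · w = c has integer solutions iff gcd(a, b) divides c. A small Python sketch (illustrative, not the paper's implementation):

```python
from math import gcd

def has_integer_solution(a, b, c):
    # a*y + b*w = c is solvable over Z iff gcd(a, b) divides c (Bezout)
    return c % gcd(a, b) == 0

# 6*y + 3*w = 2: gcd(6, 3) = 3 does not divide 2, so there is no integer
# solution, confirming that the loop of Example 2 terminates.
assert not has_integer_solution(6, 3, 2)
```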
Example 33 shows that our technique does not yield witnesses for non-
termination, but it only proves the existence of a witness for eventual non-
termination. While such a witness can be transformed into a witness for non-
termination by applying the loop several times, it is unclear how often the loop
needs to be applied.
Example 33. Consider the following non-terminating loop:

  while x > 0 do ⎡x⎤ ← ⎡x + y⎤    (23)
                 ⎣y⎦   ⎣  1  ⎦
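The gap between eventual non-termination and non-termination can be made concrete for loop (23); an illustrative Python sketch (the particular start configuration is our own choice, not from the paper):

```python
def f(x, y):
    # the update of loop (23)
    return x + y, 1

# (x, y) = (1, -1) witnesses *eventual* non-termination: the guard x > 0
# fails at n = 1 but holds for all n >= 2. Applying the update twice yields
# f^2(1, -1) = (1, 1), a true witness for non-termination.
x, y = 1, -1
guard = []
for _ in range(10):
    guard.append(x > 0)
    x, y = f(x, y)

assert guard[1] is False   # not yet a witness for non-termination ...
assert all(guard[2:])      # ... but the guard holds from n = 2 on
```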
5 Conclusion
References
1. Bagnara, R., Zaccagnini, A., Zolo, T.: The Automatic Solution of Recurrence Rela-
tions. I. Linear Recurrences of Finite Order with Constant Coefficients. Technical
report. Quaderno 334. Dipartimento di Matematica, Università di Parma, Italy
(2003). http://www.cs.unipr.it/Publications/
2. Ben-Amram, A.M., Genaim, S., Masud, A.N.: On the termination of integer loops.
ACM Trans. Program. Lang. Syst. 34(4), 16:1–16:24 (2012). https://doi.org/10.
1145/2400676.2400679
3. Bozga, M., Iosif, R., Konecný, F.: Deciding conditional termination. Logical Meth-
ods Comput. Sci. 10(3) (2014). https://doi.org/10.2168/LMCS-10(3:8)2014
4. Braverman, M.: Termination of integer linear programs. In: Ball, T., Jones, R.B.
(eds.) CAV 2006. LNCS, vol. 4144, pp. 372–385. Springer, Heidelberg (2006).
https://doi.org/10.1007/11817963_34
5. Brockschmidt, M., Cook, B., Ishtiaq, S., Khlaaf, H., Piterman, N.: T2: temporal
property verification. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS,
vol. 9636, pp. 387–393. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_22
AliveInLean: A Verified LLVM Peephole
Optimization Verifier
1 Introduction
Verifying compiler optimizations is important to ensure reliability of the soft-
ware ecosystem. Various frameworks have been proposed to verify optimizations
of industrial compilers. Among them, Alive [12] is a tool for verifying peephole
optimizations of LLVM that has been successfully adopted by compiler develop-
ers. Since it was released, Alive has helped developers find dozens of bugs.
Figure 1 shows the structure of Alive. An optimization pattern of interest
written in a domain-specific language is given as input. Alive parses the input,
and encodes the behavior of the source and target programs into logic formulas in
the theory of quantified bit-vectors and arrays. Finally, several proof obligations
are created from the encoded behavior, and then checked by an SMT solver.
Alive relies on the following three-fold trust base. Firstly, the semantics of
LLVM’s intermediate representation and SMT expressions. Secondly, Alive’s ver-
ification condition generator. Finally, the SMT solver used to discharge proof
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 445–455, 2019.
https://doi.org/10.1007/978-3-030-25543-5_25
446 J. Lee et al.
[Fig. 1. Structure of Alive: the input optimization below is parsed, its semantics encoded into logical formulas, and verification conditions generated (VCGen).]

<Input>
%s = shl 2, %N
%q = zext %s
%r = udiv %x, %q
=>
%N2 = add %N, 1
%N3 = zext %N2
%r = lshr %x, %N3
obligations. None of these are formally verified, and thus an error in any of these
may result in an incorrect answer.
To address this problem, we introduce AliveInLean, a formally verified peep-
hole optimization verifier for LLVM. AliveInLean is written in Lean [14], an
interactive theorem proving language. Its semantics of LLVM IR (Intermedi-
ate Representation) and SMT expressions are rigorously tested using Lean’s
metaprogramming language [5] and system library. AliveInLean’s verification
condition generator is formally verified in Lean.
Using AliveInLean requires less human effort than directly proving the opti-
mizations on formal frameworks thanks to automation given by SMT solvers. For
example, verifying the correctness of a peephole optimization on a formal frame-
work requires more than a hundred lines of proofs [15]. However, the correctness
of AliveInLean relies on the correctness of the used SMT solver. To counteract
the dependency on SMT solvers, proof obligations can be cross-checked with
multiple SMT solvers. Moreover, there is substantial work towards making SMT
solvers generate proof certificates [2,3,6,7].
AliveInLean is a proof of concept. It currently does not support all operations
that Alive does like, e.g., memory-related operations. However, AliveInLean sup-
ports all integer peephole optimizations, which is already useful in practice as
most bugs found by Alive were in integer optimizations [12].
2 Overview
Name: AddSub:1309
%lhs = and i4 %a, %b
%rhs = or i4 %a, %b
%r = add i4 %lhs, %rhs
=>
%r = add i4 %a, %b
lemma never_poison:
  forall .. (HSTEP: some st' = step st (udiv isz name op1 op2))
            (HNOUB: not (has_ub st'))
            (HVAL: some val = get_value st op2 (ty.int isz)),
    not (is_poison val)
3 Verifying Optimizations
¹ poison is a special value of LLVM representing a result of an erroneous computation.
Given a program and an initial state, the semantics encoder produces the final
state of the program as a set of SMT expressions. The IR interpreter is simi-
lar, but works over concrete values rather than symbolic ones. The semantics
encoder and the IR interpreter share the same codebase (essentially the LLVM
IR semantics). The code is parametric on the type of the program state. For
example, the type of undefined behavior can be either initialized as the bool
type of Lean or the Bool SMT expression type. Given the type, Lean can auto-
matically resolve which operations to use to update the state using typeclass
resolution.
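The parametric design can be illustrated outside Lean; a hypothetical Python sketch (names invented for illustration) shows one step function serving both as interpreter and as symbolic encoder, with the operations chosen by the value domain:

```python
class Concrete:
    # interpreter domain: values are Python ints (i4 arithmetic)
    def add(self, a, b):
        return (a + b) % 16

class Symbolic:
    # encoder domain: values are SMT-LIB expression strings
    def add(self, a, b):
        return f"(bvadd {a} {b})"

def exec_add(dom, a, b):
    # shared "codebase": the same step logic, parametric on the domain
    return dom.add(a, b)

assert exec_add(Concrete(), 10, 2) == 12
assert exec_add(Symbolic(), "%s1", "%s2") == "(bvadd %s1 %s2)"
```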
Given a source program, a transformed program, and an initial state, the refine-
ment encoder emits an SMT expression that encodes the refinement check
between the final states of the two programs. To obtain the final states, the
semantics encoder is used.
The refinement check proves that (1) the transformed program only triggers
UB when the original program does (i.e., UB can only be removed), (2) the root
variable of the transformed program is only poison when it is also poison in the
original program, and (3) variables’ values in the final states of the two programs
are the same when no UB is triggered and the original value is not poison.
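For the AddSub:1309 pattern from Sect. 2, no UB or poison is involved, so refinement reduces to condition (3), value equality, which can even be checked exhaustively at bit-width 4 (an illustrative sketch, independent of AliveInLean's SMT encoding):

```python
MASK = 0xF  # i4

for a in range(16):
    for b in range(16):
        src = ((a & b) + (a | b)) & MASK   # %r of the source program
        tgt = (a + b) & MASK               # %r of the target program
        assert src == tgt                  # condition (3): values agree
```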
The parser for Alive’s DSL is implemented using Lean’s parser monad and file
I/O library. SMT expressions are processed with Z3 using Lean’s SMT interface.
4 Correctness of AliveInLean
This spec says that if SMT expressions s1, s2 of a bit-vector type (sbitvec)
are equivalent to two concrete bit-vector values b1, b2 in Lean (bitvector), an
add expression of s1, s2 is equivalent to the result of adding b1 and b2. Function
bitvector.add must be called in Lean, so its operands (b1, b2) are assigned
random values in Lean. sbitvec.add is translated to SMT’s bvadd expression,
and s1 and s2 are initialized as BitVec variables in an SMT solver. The testing
function generates an SMT expression with random inputs like the following:
(assert (forall ((s1 (_ BitVec 4))) (forall ((s2 (_ BitVec 4)))
(=> (= s1 #xA) (=> (= s2 #x2) (= (bvadd s1 s2) #xC))))))
The size of bitvector (sz) is initialized to 4, and b1, b2 were randomly initial-
ized to 10 (#xA) and 2 (#x2). A specification is incorrect if the generated SMT
expression is not valid.
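The generation of such test queries can be sketched as follows (a hypothetical helper mirroring the query shown above; AliveInLean does this in Lean's metaprogramming language):

```python
import random

def add_spec_query(sz=4):
    # build an SMT query checking the 'add' spec on one random input pair
    b1 = random.randrange(2**sz)
    b2 = random.randrange(2**sz)
    expected = (b1 + b2) % 2**sz   # bitvector.add on the concrete values
    return (f"(assert (forall ((s1 (_ BitVec {sz}))) (forall ((s2 (_ BitVec {sz})))"
            f" (=> (= s1 #x{b1:X}) (=> (= s2 #x{b2:X})"
            f" (= (bvadd s1 s2) #x{expected:X}))))))")

print(add_spec_query())  # one random instance of the spec
```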
5 Evaluation
For the evaluation, we used a computer with an Intel Core i5-6600 CPU and 8 GB
of RAM, and Z3 [13] for SMT solving. To test whether AliveInLean and Alive
give the same result, we used all of the 150 integer optimizations from Alive’s
test suite that are supported by AliveInLean. No mismatches were observed.
To test the SMT specification, we randomly generated 10,000 tests for each
of the operations (18 bit-vector and 15 boolean). This test took 3 CPU hours.
The LLVM IR specification was tested by running 1,000,000 random IR pro-
grams in our interpreter and comparing the output with that of LLVM. This
comparison needs to take into account that some programs may trigger UB or
yield a poison value, which gives freedom to LLVM to produce a variety of results.
These tests took 10 CPU hours overall. Four admitted arithmetic lemmas were
tested as well. As a side-effect of the testing, we found several miscompilation
bugs in LLVM.2
AliveInLean3 consists of 11.9K lines of code. The optimization verifier con-
sists of 2.2K LoC, the specification tester is 1.5K, and the proof has 8.1K lines.
It took 3 person-months to implement the tool and prove its correctness.
6 Related Work
We introduce previous work on compiler verification and validation and compare
it with AliveInLean. Also, we give an overview on previous work on semantics
of compiler intermediate representations (IRs).
² https://llvm.org/PR40657.
³ https://github.com/Microsoft/AliveInLean.
z = 0 - (x / C)
=>
z = x / -C
Rosette [21] and Smten [23] address this issue by providing higher-level languages
for describing the search problem. SpaceSearch [24] helps programmers prove the
correctness of the description by supporting Coq and Rosette backends from a
single specification. AliveInLean provides a stronger guarantee of correctness
because translation to SMT expressions is also written in Lean, leaving Lean as
the sole trust-base.
7 Discussion
AliveInLean has several limitations. As discussed before, AliveInLean does not
support memory operations. Correctly encoding the memory model of LLVM
IR is challenging because it is more complex than either a byte array or a set
of memory objects [9]. Supporting branch instructions and floating point would
help developers prove interesting optimizations. Supporting branches is
challenging, especially when loops are involved.
Maintaining AliveInLean requires proficiency in Lean: changing the semantics
of an IR instruction breaks the proof, and repairing it demands familiarity with
the prover. However, we believe that only the relevant parts of the proof need
to be updated, as the proof is modularized.
Alive has features that are absent in AliveInLean: it supports defining a
precondition for an optimization, inferring the types of variables when they are
not given, and showing counterexamples when an optimization is wrong. We
leave these as future work.
8 Conclusion
AliveInLean is a formally verified compiler optimization verifier. Its verification
condition generator is formally verified with a machine-checked proof. Using
AliveInLean, developers can easily check the correctness of compiler optimiza-
tions with high reliability. They can also use AliveInLean, like Vellvm, as a formal
framework to prove properties of interest in limited cases. The extensive random
testing did not find problems in the trust base, increasing its trustworthiness.
Moreover, as a side-effect of the IR semantics testing, we found several bugs in
LLVM.
Acknowledgments. The authors thank Leonardo de Moura and Sebastian Ullrich for
their help with Lean. This work was supported in part by the Basic Science Research
Program through the National Research Foundation of Korea (NRF) funded by the
Ministry of Science and ICT (2017R1A2B2007512). The first author was supported by
a Korea Foundation for Advanced Studies scholarship.
References
1. LLVM language reference manual. https://llvm.org/docs/LangRef.html
2. Barbosa, H., Blanchette, J.C., Fontaine, P.: Scalable fine-grained proofs for for-
mula processing. In: de Moura, L. (ed.) CADE 2017. LNCS (LNAI), vol. 10395, pp.
398–412. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63046-5 25
3. Böhme, S., Fox, A.C.J., Sewell, T., Weber, T.: Reconstruction of Z3’s bit-vector
proofs in HOL4 and Isabelle/HOL. In: Jouannaud, J.-P., Shao, Z. (eds.) CPP
2011. LNCS, vol. 7086, pp. 183–198. Springer, Heidelberg (2011). https://doi.org/
10.1007/978-3-642-25379-9 15
4. Dénès, M., Hriţcu, C., Lampropoulos, L., Paraskevopoulou, Z., Pierce, B.C.:
QuickChick: property-based testing for Coq (2014)
5. Ebner, G., Ullrich, S., Roesch, J., Avigad, J., de Moura, L.: A metaprogramming
framework for formal verification. Proc. ACM Program. Lang. 1(ICFP), 34:1–34:29
(2017). https://doi.org/10.1145/3110278
6. Ekici, B., et al.: SMTCoq: a plug-in for integrating SMT solvers into Coq. In:
Computer Aided Verification, pp. 126–133 (2017)
7. Hadarean, L., Barrett, C., Reynolds, A., Tinelli, C., Deters, M.: Fine grained
SMT proofs for the theory of fixed-width bit-vectors. In: Davis, M., Fehnker, A.,
McIver, A., Voronkov, A. (eds.) LPAR 2015. LNCS, vol. 9450, pp. 340–355. Springer,
Heidelberg (2015). https://doi.org/10.1007/978-3-662-48899-7 24
8. Kang, J., et al.: Crellvm: verified credible compilation for LLVM. In: Proceed-
ings of the 39th ACM SIGPLAN Conference on Programming Language Design
and Implementation, pp. 631–645. ACM (2018). https://doi.org/10.1145/3192366.
3192377
9. Lee, J., Hur, C.K., Jung, R., Liu, Z., Regehr, J., Lopes, N.P.: Reconciling high-
level optimizations and low-level code in LLVM. Proc. ACM Program. Lang.
2(OOPSLA), 125:1–125:28 (2018). https://doi.org/10.1145/3276495
10. Lee, J., et al.: Taming undefined behavior in LLVM. In: Proceedings of the 38th
ACM SIGPLAN Conference on Programming Language Design and Implementa-
tion, pp. 633–647. ACM (2017). https://doi.org/10.1145/3062341.3062343
11. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–
115 (2009). https://doi.org/10.1145/1538788.1538814
12. Lopes, N.P., Menendez, D., Nagarakatte, S., Regehr, J.: Provably correct peephole
optimizations with alive. In: Proceedings of the 36th ACM SIGPLAN Conference
on Programming Language Design and Implementation, pp. 22–32. ACM (2015).
https://doi.org/10.1145/2737924.2737965
13. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3 24
14. de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The lean
theorem prover (System Description). In: Felty, A.P., Middeldorp, A. (eds.) CADE
2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham (2015). https://doi.
org/10.1007/978-3-319-21401-6 26
454 J. Lee et al.
15. Mullen, E., Zuniga, D., Tatlock, Z., Grossman, D.: Verified peephole optimiza-
tions for CompCert. In: Proceedings of the 37th ACM SIGPLAN Conference on
Programming Language Design and Implementation, pp. 448–461. ACM (2016).
https://doi.org/10.1145/2908080.2908109
16. Namjoshi, K.S., Tagliabue, G., Zuck, L.D.: A witnessing compiler: a proof of con-
cept. In: Legay, A., Bensalem, S. (eds.) RV 2013. LNCS, vol. 8174, pp. 340–345.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40787-1 22
17. Namjoshi, K.S., Zuck, L.D.: Witnessing program transformations. In: Logozzo, F.,
Fähndrich, M. (eds.) SAS 2013. LNCS, vol. 7935, pp. 304–323. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-3-642-38856-9 17
18. Pnueli, A., Siegel, M., Singerman, E.: Translation validation. In: Steffen, B. (ed.)
TACAS 1998. LNCS, vol. 1384, pp. 151–166. Springer, Heidelberg (1998). https://
doi.org/10.1007/BFb0054170
19. Rinard, M.C., Marinov, D.: Credible compilation with pointers. In: Proceedings of
the Workshop on Run-Time Result Verification (1999)
20. Stepp, M., Tate, R., Lerner, S.: Equality-based translation validator for LLVM. In:
Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 737–742.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1 59
21. Torlak, E., Bodik, R.: Growing solver-aided languages with Rosette. In: Proceed-
ings of the 2013 ACM International Symposium on New Ideas, New Paradigms,
and Reflections on Programming & Software, pp. 135–152. ACM (2013). https://
doi.org/10.1145/2509578.2509586
22. Tristan, J.B., Govereau, P., Morrisett, G.: Evaluating value-graph translation val-
idation for LLVM. In: Proceedings of the 32nd ACM SIGPLAN Conference on
Programming Language Design and Implementation, pp. 295–305. ACM (2011).
https://doi.org/10.1145/1993498.1993533
23. Uhler, R., Dave, N.: Smten: automatic translation of high-level symbolic compu-
tations into SMT queries. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS,
vol. 8044, pp. 678–683. Springer, Heidelberg (2013). https://doi.org/10.1007/978-
3-642-39799-8 45
24. Weitz, K., Lyubomirsky, S., Heule, S., Torlak, E., Ernst, M.D., Tatlock, Z.: Space-
search: a library for building and verifying solver-aided tools. Proc. ACM Program.
Lang. 1(ICFP), 25:1–25:28 (2017). https://doi.org/10.1145/3110269
25. Zaks, A., Pnueli, A.: CoVaC: compiler validation by program analysis of the cross-
product. In: Cuellar, J., Maibaum, T., Sere, K. (eds.) FM 2008. LNCS, vol. 5014, pp.
35–51. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68237-0 5
26. Zhao, J., Nagarakatte, S., Martin, M.M., Zdancewic, S.: Formalizing the LLVM
intermediate representation for verified program transformations. In: Proceed-
ings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages, pp. 427–440. ACM (2012). https://doi.org/10.1145/
2103656.2103709
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Concurrency
Automated Parameterized
Verification of CRDTs
1 Introduction
For distributed applications, keeping a single copy of data at one location, or
multiple fully synchronized copies (i.e., state-machine replication) at different
locations, makes the application susceptible to loss of availability due to network
and machine failures. On the other hand, having multiple unsynchronized
replicas of the data results in high availability, fault tolerance and uniform low
latency, albeit at the expense of consistency. In the latter case, an update issued
at one replica can be asynchronously transmitted to other replicas, allowing
the system to operate continuously even in the presence of network or node
failures [8]. However, mechanisms must now be provided to ensure replicas are
kept consistent with each other in the face of concurrent updates and arbitrary
re-ordering of such updates by the underlying network.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 459–477, 2019.
https://doi.org/10.1007/978-3-030-25543-5_26
460 K. Nagar and S. Jagannathan
Over the last few years, Conflict-free Replicated Datatypes (CRDTs) [19–21]
have emerged as a popular solution to this problem. In op-based CRDTs, when
an operation on a CRDT instance is issued at a replica, an effector (basically an
update function) is generated locally, which is then asynchronously transmitted
(and applied) at all other replicas.1 Over the years, a number of CRDTs have
been developed for common datatypes such as maps, sets, lists, graphs, etc.
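As a concrete illustration of the op-based model (our own toy example, not taken from the paper), consider a replicated counter whose operations generate effectors, i.e. pure update functions that are shipped to and applied at every replica:

```python
# A toy op-based CRDT: issuing an operation generates an effector
# (a closure over the operation's argument) that every replica applies.

from typing import Callable

State = int
Effector = Callable[[State], State]

def add(n: int) -> Effector:
    """Issuing add(n) at a replica generates this effector locally."""
    return lambda state: state + n

# Two replicas receive the same set of effectors in different orders.
effectors = [add(5), add(3), add(-2)]
r1, r2 = 0, 0
for e in effectors:
    r1 = e(r1)
for e in reversed(effectors):
    r2 = e(r2)
assert r1 == r2 == 6   # commuting effectors make the replicas converge
```

Because these effectors commute, any delivery order yields the same state; the next paragraph makes this convergence requirement precise.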
The primary correctness criterion for a CRDT implementation is convergence
(sometimes called strong eventual consistency (SEC) [9,20]): two replicas
which have received the same set of effectors must converge to the same CRDT
state. Because of the weak default guarantees assumed to be provided by the
underlying network, however, we must consider the possibility that effectors
can be applied in arbitrary order on different replicas, complicating correctness
arguments. This complexity is further exacerbated because CRDTs impose no
limitations on how often they are invoked, and may assume additional properties
on network behaviour [14] that must be taken into account when formulating
correctness arguments.
Given these complexities, verifying convergence of operations in a replicated
setting has proven to be challenging and error-prone [9]. In response, several
recent efforts have used mechanized proof assistants to yield formal machine-
checked proofs of correctness [9,24]. While mechanization clearly offers stronger
assurance guarantees than handwritten proofs, it still demands substantial man-
ual proof engineering effort to be successful. In particular, correctness arguments
are typically given in terms of constraints on CRDT states that must be satisfied
by the underlying network model responsible for delivering updates performed
by other replicas. Relating the state of a CRDT at one replica with the visibility
properties allowed by the underlying network has typically involved construct-
ing an intricate simulation argument or crafting a suitably precise invariant to
establish convergence. This level of sophisticated reasoning is required for every
CRDT and consistency model under consideration. There is a notable lack of
techniques capable of reasoning about CRDT correctness under different weak
consistency policies, even though such techniques exist for other correctness cri-
teria such as preservation of state invariants [10,11] or serializability [4,16] under
weak consistency.
To overcome these challenges, we propose a novel automated verification
strategy that does not require complex proof-engineering of handcrafted sim-
ulation arguments or invariants. Instead, our methodology allows us to directly
connect constraints on events imposed by the consistency model with con-
straints on states required to prove convergence. Consistency model constraints
are extracted from an axiomatization of network behavior, while state con-
straints are generated using reasoning principles that determine the commuta-
tivity and non-interference of sequences of effectors, subject to these consistency
constraints. Both sets of constraints can be solved using off-the-shelf theorem
1 In this work, we focus on the op-based CRDT model; however, our technique
naturally extends to state-based CRDTs since they can be emulated by an op-based
model [20].
Automated Parameterized Verification of CRDTs 461
Collectively, these contributions yield (to the best of our knowledge) the first
automated and parameterized proof methodology for CRDT verification.
The remainder of the paper is organized as follows. In the next section, we
provide further motivation and intuition for our approach. Section 3 formalizes
the problem definition, providing an operational semantics and axiomatizations
of well-known consistency specifications. Section 4 describes our proof strategy
for determining CRDT convergence that is amenable to automated verification.
Section 5 provides details about our implementation and experimental results
justifying the effectiveness of our framework. Section 6 presents related work
and conclusions.
2 Illustrative Example
2 Assume that every call to Add uses a unique identifier, which can be easily arranged,
for example by keeping a local counter at every replica which is incremented at every
operation invocation, and using the id of the replica and the value of the counter as
a unique identifier.
3 Problem Definition
In this section, we formalize the problem of determining convergence in CRDTs
parametric to a weak consistency policy. First, we define a general operational
semantics to describe all valid executions of a CRDT under any given weak
consistency policy. As stated earlier, a CRDT program P is specified by the
tuple (Σ, O, σinit ). Here, we find it convenient to define an operation o ∈ O as
Note that ε denotes the empty sequence. Hence, for all states σ and sequences
of functions π, we have o(σ, π) = ô(π(σ)).
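The identity can be illustrated with a toy operation in Python (all names are ours): generating an effector after replaying a history π against σ coincides with generating it directly against the replayed state π(σ).

```python
# Toy reading of o(σ, π) = ô(π(σ)): o_hat generates an effector from a
# snapshot state; o(sigma, pi) first replays the history pi, then generates.

def o_hat(sigma):
    return lambda s: s + sigma      # effector adds the snapshot value

def o(sigma, pi):
    for f in pi:                    # replay the history on sigma
        sigma = f(sigma)
    return o_hat(sigma)             # then generate against pi(sigma)

pi = [lambda s: s + 2, lambda s: s * 3]
assert o(1, pi)(10) == o_hat(9)(10) == 19   # pi(1) = (1 + 2) * 3 = 9
```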
To define the operational semantics, we abstract away from the concept of
replicas, and instead maintain a global pool of effectors. A new CRDT opera-
tion is executed against a CRDT state obtained by first selecting a subset of
effectors from the global pool and then applying the elements in that set in
some non-deterministically chosen permutation to the initial CRDT state. The
choice of effectors and their permutation must obey the weak consistency policy
specification. Given a CRDT P = (Σ, O, σinit ) and a weak consistency policy
Ψ , we define a labeled transition system SP,Ψ = (C, →), where C is a set of
configurations and → is the transition relation. A configuration c = (Δ, vis, eo)
consists of three components: Δ is a set of events, vis ⊆ Δ × Δ is a visibility
relation, and eo ⊆ Δ × Δ is a global effector order relation (constrained to be
anti-symmetric). An event η ∈ Δ is a tuple (eid, o, σs , Δr , eo) where eid is a
unique event id, o ∈ O is a CRDT operation, σs ∈ Σ is the start CRDT state,
Δr is the set of events visible to η (also called the history of η), and eo is a
total order on the events in Δr (also called the local effector order relation). We
assume projection functions for each component of an event (for example σs (η)
projects the start state of the event η).
Given an event η = (eid, o, σs , Δr , eo), we define η e to be the effector associ-
ated with the event. This effector is obtained by executing the CRDT operation
o against the start CRDT state σs and the sequence of effectors obtained from
the events in Δr arranged in the reverse order of eo. Formally,
ηe = o(σs, ε)  if Δr = ∅
ηe = o(σs, ηeP(1) · … · ηeP(k))  if Δr = {η1, . . . , ηk}, where P : {1, . . . , k} → {1, . . . , k} is a
permutation such that ∀i, j. i < j ⇒ (ηP(j), ηP(i)) ∈ eo.    (1)
The rule describes the effect of executing a new operation o, which begins by
first selecting a subset of already completed events (Δr ) and a total order eor
on these events which obeys the global effector order eo. This mimics applying
the operation o on an arbitrary replica on which the events of Δr have been
applied in the order eor . A new event (η) corresponding to the issued operation
o is computed, which is used to label the transition and is also added to the cur-
rent configuration. All the events in Δr are visible to the new event η, which is
reflected in the new visibility relation vis . The system moves to the new configu-
ration (Δ , vis , eo ) which must satisfy the consistency policy Ψ . Note that even
though the general transition rule allows the event to pick any arbitrary start
state σs , we restrict the start state of all events in a well-formed execution
to be the initial CRDT state σinit , i.e. the state in which all replicas begin their
execution. A trace of SP,Ψ is a sequence of transitions. Let SP,Ψ be the set of
all finite traces. Given a trace τ , L(τ ) denotes all events (i.e. labels) in τ .
For Eventual Consistency (EC) [3], we do not place any constraints on the
visibility order and require the global effector order to be empty. This reflects
the fact that in EC, any number of events can occur concurrently at different
replicas, and hence a replica can witness any arbitrary subset of events which
may be applied in any order. In Causal Consistency (CC) [14], an event is applied
at a replica only if all causally dependent events have already been applied. An
event η1 is causally dependent on η2 if η1 was generated at a replica where either
η2 or any other event causally dependent on η2 had already been applied. The
visibility relation vis captures causal dependency, and by making vis transitive,
we ensure that all causal dependencies of events in Δr are also present in Δr
(this is because in the transition rule, Ψ is checked on the updated visibility
relation which relates events in Δr with the newly generated event). Further,
causally dependent events must be applied in the same order at all replicas,
which we capture by asserting that vis implies eo. In RedBlue Consistency (RB)
[13], a subset of CRDT operations (Or ⊆ O) are synchronized, so that they
must occur in the same order at all replicas. We express RB in our framework
by requiring the visibility relation to be total among events whose operations
are in Or . In Parallel Snapshot Isolation (PSI) [23], two events which conflict with
each other (because they write to a common variable) are not allowed to be
executed concurrently, but are synchronized across all replicas to be executed
in the same order. Similar to [10], we assume that when a CRDT is used under
PSI, its state space Σ is a map from variables to values, and every operation
generates an effector which simply writes to certain variables. We assume that
Wr(η e ) returns the set of variables written by the effector η e , and express PSI
in our framework by requiring that events which write a common variable are
applied in the same order (determined by their visibility relation) across all
replicas; furthermore, the policy requires that the visibility relation among
such events is total. Finally, in Strong Consistency, the visibility relation is total
and all effectors are applied in the same order at all replicas.
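The axiomatizations above can be sketched as executable predicates over finite vis and eo relations, represented as sets of event-id pairs. The encoding is ours, not the paper's FOL encoding, and only EC, CC and SC are shown:

```python
# Policy axiomatizations as checks over finite relations (illustrative).

def is_total(rel, events):
    return all(a == b or (a, b) in rel or (b, a) in rel
               for a in events for b in events)

def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

def ec(events, vis, eo):
    return len(eo) == 0                       # empty effector order

def cc(events, vis, eo):
    return is_transitive(vis) and vis <= eo   # vis transitive, vis implies eo

def sc(events, vis, eo):
    return is_total(vis, events) and vis == eo

events = {1, 2, 3}
vis = {(1, 2), (2, 3), (1, 3)}   # a transitive, total visibility order
assert ec(events, vis, set())
assert cc(events, vis, vis)
assert sc(events, vis, vis)
```

RB and PSI would additionally need the operation of each event (respectively, the written variables) to decide which pairs must be ordered.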
Given an execution τ ∈ SP,Ψ and a transition C →η C′ in τ, we associate
a set of replica states Ση that the event can potentially witness, by consider-
ing all permutations of the effectors visible to η which obey the global effector
order, when applied to the start state σs (η). Formally, this is defined as follows,
assuming η = (eid, o, σs, {η1, . . . , ηk}, eor) and C = (Δ, vis, eo):
Ση = {ηeP(1) ◦ ηeP(2) ◦ . . . ◦ ηeP(k)(σs) | P : {1, . . . , k} → {1, . . . , k} is a permutation,
eoP is a total order, ∀i, j. i < j ⇒ (ηP(j), ηP(i)) ∈ eoP, and eo ⊆ eoP}
In the above definition, for all valid local effector orders eoP , we compute the
CRDT states obtained on applying those effectors on the start CRDT state,
which constitute Ση . The original event η presumably would have witnessed one
of these states.
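The definition of Ση can be read operationally: enumerate every permutation of the visible effectors that is consistent with the partial order eo and collect the states each one produces. A brute-force Python sketch (ours, practical only for tiny histories, reading (a, b) ∈ eo as "a's effector is applied before b's"):

```python
# Brute-force computation of the set of reachable replica states.

from itertools import permutations

def sigma_eta(effectors, eo, start):
    """effectors: list of (event_id, effector_fn); eo: set of id pairs
    over those event ids; start: the start CRDT state."""
    states = set()
    for perm in permutations(effectors):
        ids = [eid for eid, _ in perm]
        if all(ids.index(a) < ids.index(b) for (a, b) in eo):
            state = start
            for _, f in perm:       # apply effectors in this order
                state = f(state)
            states.add(state)
    return states

effs = [(1, lambda s: s + 1), (2, lambda s: s * 2)]
assert sigma_eta(effs, set(), 0) == {1, 2}   # unordered: both interleavings
assert sigma_eta(effs, {(1, 2)}, 0) == {2}   # ordered: a single state
```

Non-commuting effectors left unordered by eo yield several possible states, which is exactly the situation convergence proofs must rule out.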
4 Automated Verification
In order to show that a CRDT achieves SEC under a consistency specification,
we need to show that all events in any execution are convergent, which in turn
requires us to show that any valid permutation of valid subsets of events in an
execution leads to the same state. This is a hard problem because we have to
reason about executions of unbounded length, involving unbounded sets of effec-
tors and reconcile the declarative event-based specifications of weak consistency
with states generated during execution. To make the problem tractable, we use
a two-fold strategy. First, we show that if any pair of effectors generated during
any execution either commute with each other or are forced to be applied in the
same order by the consistency policy, then the CRDT achieves SEC. Second, we
develop an inductive proof rule to show that all pairs of effectors generated dur-
ing any (potentially unbounded) execution obey the above mentioned property.
To ensure soundness of the proof rule, we place some reasonable assumptions on
the consistency policy that (intuitively) requires behaviorally equivalent events
to be treated the same by the policy, regardless of context (i.e., the length
of the execution history at the time the event is applied). We then extract a
simple sufficient condition that we call non-interference to commutativity,
which captures the heart of the inductive argument. Notably, this condition can
be automatically checked for different CRDTs under different consistency poli-
cies using off-the-shelf theorem provers, thus providing a pathway to performing
automated parametrized verification of CRDTs.
Given a transition (Δ, vis, eo) →η C, we denote the global effector order in
the starting configuration of η, i.e. eo, as eoη. We first show that a sufficient
condition to prove that a CRDT is convergent is to show that any two events in
its history either commute or are related by the global effector order.
3 All proofs can be found in the extended version [15] of the paper.
We now present a property that consistency policies must obey for our verifi-
cation methodology to be soundly applied. First, we define the notion of behav-
ioral equivalence of events:
Definition 4 (Behavioral Equivalence).
Two events η1 = (id1 , o1 , σ1 , Δ1 , eo1 ) and η2 = (id2 , o2 , σ2 , Δ2 , eo2 ) are behav-
iorally equivalent if η1e = η2e and o1 = o2 .
That is, behaviorally equivalent events produce the same effectors. We use
the notation η1 ≡ η2 to indicate that they are behaviorally equivalent.
Definition 5 (Behaviorally Stable Consistency Policy). A consistency
policy Ψ is behaviorally stable if for all Δ, vis, eo, Δ′, vis′, eo′, η1, η2 ∈ Δ, and
η1′, η2′ ∈ Δ′, the following holds:
(Ψ(Δ, vis, eo) ∧ Ψ(Δ′, vis′, eo′) ∧ η1 ≡ η1′ ∧ η2 ≡ η2′ ∧ (vis(η1, η2) ⇔ vis′(η1′, η2′)))
⇒ (eo(η1, η2) ⇔ eo′(η1′, η2′))    (2)
Behaviorally stable consistency policies treat behaviorally equivalent events
which have the same visibility relation among them in the same manner by
enforcing the same effector order. All consistency policies that we discussed in
the previous section (representing the most well-known in the literature) are
behaviorally stable:
Lemma 2. EC, CC, PSI, RB and SC are behaviorally stable.
EC does not enforce any effector ordering and hence is trivially stable behav-
iorally. CC forces causally dependent events to be in the same order, and hence
behaviorally equivalent events which have the same visibility order will be forced
to be in the same effector order. RB forces events whose operations belong to a
specific subset to be in the same order, but since behaviorally equivalent events
perform the same operation, they are forced into the same effector ordering.
Similarly, PSI forces events writing to a common variable to be in the same
order, but since behaviorally equivalent events generate the same effector, they
would also write to the same variables and hence would be forced in the same
effector order. SC forces all events to be in the same order which is equal to
the visibility order, and hence is trivially stable behaviorally. In general, behav-
iorally stable consistency policies do not consider the context in which events
occur, but instead rely only on observable behavior of the events to constrain
their ordering. A simple example of a consistency policy which is not behav-
iorally stable is a policy which maintains bounded concurrency [12] by limiting
the number of concurrent operations across all replicas to a fixed bound. Such
a policy would synchronize two events only if they occur in a context where
keeping them concurrent would violate the bound, but behaviorally equivalent
events in a different context may not be synchronized.
For executions under a behaviorally stable consistency policy, the global effector
order between events only grows during an execution: if two events η1 and
η2 in the history of some event η are related by eoη, then whenever they later
occur in the history of any other event, they are related in the same effector
order. Hence, we can define a common global effector order for an execution.
Given an execution τ ∈ SP,Ψ, the effector order eoτ ⊆ L(τ) × L(τ) is an
anti-symmetric relation defined as follows:
Condition (1) corresponds to the base case of our inductive argument and
requires that in well-formed executions with 2 events, both the events commute
modulo Ψ . For condition (2), our intention is to consider two events ηa and
ηb with any arbitrary histories which can occur in any well-formed execution
and, assuming that they commute modulo Ψ , show that even after the addition
of another event to their histories, they continue to commute. We use CRDT
states σ1 , σ2 to summarize the histories of the two events, and construct behav-
iorally equivalent events (η1 ≡ ηa and η2 ≡ ηb ) which would take σ1 , σ2 as
their start states. That is, if ηa produced the effector o(σinit , π)4 , where o is the
CRDT operation corresponding to ηa and π is the sequence of effectors in its
history, we leverage the observation that o(σinit , π) = o(π(σinit ), ), and assum-
ing σ1 = π(σinit ), we obtain the behaviorally equivalent event η1 , i.e. η1e ≡ ηae .
Similar analysis establishes that η2e ≡ ηbe . However, since we have no way of char-
acterizing states σ1 and σ2 which are obtained by applying arbitrary sequences
of effectors, we use commutativity itself as an identifying characteristic, focusing
on only those σ1 and σ2 for which the events η1 and η2 commute modulo Ψ .
The interfering event is also summarized by another CRDT state σ3 , and
we require that after suffering interference from this new event, the original two
events would continue to commute modulo Ψ . This would essentially establish
that any two events with any history would commute modulo Ψ in these small
executions, which by the behavioral stability of Ψ would translate to their com-
mutativity in any execution.
Example: Let us apply the proposed verification strategy to the ORSet CRDT
shown in Fig. 2. Under EC, condition (1) of Non-Interf fails, because in the
execution Cinit →η1 C1 →η2 C2, where o(η1) = Add(a,i), o(η2) = Remove(a) and
vis(η1, η2), η1 and η2 don't commute modulo EC, since (a,i) would be present in
the source replica of Remove(a). However, η1 and η2 would commute modulo CC,
since they would be related by the effector order. Now, moving to condition (2) of
Non-interf, we limit ourselves to source replica states σ1 and σ2 where Add(a,i)
and Remove(a) do commute modulo CC. If visτ(η1, η2), then after interference,
in execution τ′, visτ′(η1, η2), in which case η1 and η2 trivially commute modulo
CC (because they would be related by the effector order). On the other hand,
if ¬visτ′(η1, η2), then for η1 and η2 to commute modulo CC, we must have that
the effectors η1e and η2e themselves commute, which implies that (a,i) ∉ σ2.
Now, consider any execution τ with an interfering operation η3. If η3 is another
Add(a,i') operation, then i' ≠ i, so that even if it is visible to η2, η2e will
not remove (a,i), so that η1 and η2 would commute. Similarly, if η3 is another
Remove(a) operation, it can only remove tagged versions of a from the source
replicas of η2 , so that the effector η2e would not remove (a,i).
4 Note that in a well-formed execution, the start state is always σinit.
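The ORSet argument above can be replayed concretely. The sketch below is our own illustration, mirroring the tagged-set design of Fig. 2 rather than reproducing it: the state is a set of (element, tag) pairs, Add(a,i) inserts a uniquely tagged copy, and Remove(a) deletes only the tagged copies observed at its source replica.

```python
# Concrete replay of the ORSet commutativity argument.

def add_effector(elem, tag):
    return lambda state: state | {(elem, tag)}

def remove_effector(elem, source_state):
    # Remove deletes only the tagged copies observed at its source replica.
    observed = {(e, t) for (e, t) in source_state if e == elem}
    return lambda state: state - observed

add_ai = add_effector('a', 'i')

# Add(a,i) not visible to Remove(a): tag i was never observed, so the
# effectors commute and (a,i) survives at every replica.
rem = remove_effector('a', frozenset())
assert rem(add_ai(frozenset())) == add_ai(rem(frozenset())) == {('a', 'i')}

# Add(a,i) visible to Remove(a): the effectors no longer commute, which
# is the condition (1) failure under EC described above.
rem_vis = remove_effector('a', {('a', 'i')})
assert rem_vis(add_ai(frozenset())) != add_ai(rem_vis(frozenset()))
```

Under CC the second, non-commuting pair is harmless because the causal dependency forces the two effectors into the same order at every replica.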
5 Experimental Results
In this section, we present the results of applying our verification methodology
to a number of CRDTs under different consistency models. We collected CRDT
implementations from a number of sources [1,19,20], and since all of the existing
implementations assume a very weak consistency model (primarily CC), we
additionally implemented a few CRDTs of our own that are intended to work
only under stronger consistency schemes but are better in terms of time/space
complexity and ease of development. Our implementations are not written in any
specific language but instead are specified abstractly akin to the definitions given
in Figs. 1 and 2. To specify CRDT states and operations, we fix an abstract lan-
guage that contains uninterpreted datatypes (used for specifying elements of sets,
lists, etc.), a set datatype with support for various set operations (add, delete,
union, intersection, projection, lookup), a tuple datatype (along with operations
to create tuples and project components) and a special uninterpreted datatype
equipped with a total order for identifiers. Note that the set datatype used in
our abstract language is different from the Set CRDT, as it is only intended to
perform set operations locally at a replica. All existing CRDT definitions can be
naturally expressed in this framework.
Here, we return to the op-based specification of CRDTs. For a given
CRDT P = (Σ, O, σinit ), we convert all its operations into FOL formulas relat-
ing the source, input and output replica states. That is, for a CRDT operation
o : Σ → Σ → Σ, we create a predicate o : Σ × Σ × Σ → B such that o(σs , σi , σo )
is true if and only if o(σs )(σi ) = σo . Since CRDT states are typically expressed
as sets, we axiomatize set operations to express their semantics in FOL.
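In miniature, this relational view can be sketched as follows (illustrative Python, not the paper's FOL encoding): a curried operation o : Σ → Σ → Σ becomes a predicate that holds exactly when the output state is the result of applying o's effector to the input state.

```python
# Turn a curried CRDT operation (source state -> effector) into the relation
# o(sigma_s, sigma_i, sigma_o) described in the text.

def as_predicate(o):
    return lambda s_src, s_in, s_out: o(s_src)(s_in) == s_out

# Hypothetical Simple-Set-style Add(a), whose effector ignores the source state:
add_a = lambda s_src: (lambda s_in: s_in | {"a"})
```

In the actual methodology the relation is axiomatized symbolically in FOL, with set operations given their own axioms, rather than executed as here.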
In order to specify a consistency model, we introduce a sort for events and
binary predicates vis and eo over this sort. Here, we can take advantage of the
declarative specification of consistency models and directly encode them in FOL.
Given an encoding of CRDT operations and a consistency model, our verifica-
tion strategy is to determine whether the Non-Interf property holds. Since both
conditions of this property only involve executions of finite length (at most 3),
we can directly encode them as UNSAT queries by asking for executions which
break the conditions. For condition (1), we query for the existence of two events
η1 and η2 along with vis and eo predicates which satisfy the consistency specifi-
cation Ψ such that these events are not related by eo and their effectors do not
commute. For condition (2), we query for the existence of events η1 , η2 , η3 and
their respective start states σ1 , σ2 , σ3 , such that η1 and η2 commute modulo Ψ
but after interference from η3 , they are not related by eo and do not commute.
Both these queries are encoded in EPR [18], a decidable fragment of FOL, so
if the CRDT operations and the consistency policy can also be encoded in a
decidable fragment of FOL (which is the case in all our experiments), then our
verification strategy is also decidable. We write Non-Interf-1 and Non-Interf-2 for
the two conditions of Non-Interf.
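For intuition, here is a brute-force analogue of the Non-Interf-1 query (my example, not the paper's EPR encoding): rather than asking an SMT solver for a symbolic counterexample, enumerate a few states and test whether two effectors commute. The Simple-Set-style add/remove pair below is exactly the kind of witness the query searches for.

```python
# Do two effectors commute on every state in a sample? The UNSAT query asks
# the same question symbolically, over all states and all vis/eo relations
# satisfying the consistency specification.

def commute(e1, e2, states):
    return all(e1(e2(s)) == e2(e1(s)) for s in states)

add_a    = lambda s: s | {"a"}   # effector of a Simple-Set-style Add(a)
remove_a = lambda s: s - {"a"}   # effector of a Simple-Set-style Remove(a)

states = [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]
```

Here commute(add_a, remove_a, states) is False: applying Remove(a) before or after Add(a) yields different states, which is why Simple-Set fails Non-Interf-1 under EC.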
Figure 4 shows the results of applying the proposed methodology on different
CRDTs. We used Z3 to discharge our satisfiability queries. For every combination
of a CRDT and a consistency policy, we write ✗ to indicate that verification of
472 K. Nagar and S. Jagannathan
Non-Interf failed, while ✓ indicates that it was satisfied. We also report the
verification time taken by Z3 for every CRDT across all consistency policies
executing on a standard desktop machine. We have picked the three collection
datatypes for which CRDTs have been proposed, i.e., Set, List, and Graph, and
for each such datatype, we consider multiple variants that provide a tradeoff
between consistency requirements and implementation complexity. Apart from
EC, CC and PSI, we also use a combination of PSI and RB, which only enforces
PSI between selected pairs of operations (in contrast to simple RB, which would
enforce SC between all selected pairs). Note that when verifying a CRDT under
PSI, we assume that the set operations are implemented as Boolean assignments,
and the write set Wr consists of elements added/removed. We are unaware of
any prior effort that has been successful in automatically verifying any CRDT,
let alone those that exhibit the complexity of the ones considered here.
Set: The Simple-Set CRDT in Fig. 1 does not converge under EC or CC, but
achieves convergence under PSI+RB which only synchronizes Add and Remove
operations to the same elements, while all other operations continue to run under
EC, since they do commute with each other. As explained earlier, ORSet does not
converge under EC and violates Non-Interf-1. ORSet with tombstones converges
under EC as well since it uses a different set (called a tombstone) to keep track of
removed elements. USet is another implementation of the Set CRDT which con-
verges under the assumptions that an element is only added once, and removes
only work if the element is already present in the source replica. USet converges
only under PSI, because under any weaker consistency model, Non-Interf-2
breaks, since Add(a) interferes and breaks the commutativity of Add(a) and
Remove(a). Notice that as the consistency level weakens, implementations need
Automated Parameterized Verification of CRDTs 473
the imprecision of Non-Interf-2. There are two sources of imprecision, both con-
cerning the start states of the events picked in the condition: (1) we only use
commutativity as a distinguishing property of the start states, but this may not
be a sufficiently strong inductive invariant, (2) we place no constraints on the
start state of the interfering operation. In practice, we have found that for all
cases except U-Set, convergence violations manifest via failure of Non-Interf-1.
If Non-Interf-2 breaks, we can search for well-formed executions of higher length
up to a bound. For U-Set, we were successful in adopting this approach, and were
able to find a non-convergent well-formed execution of length 3.
References
1. Attiya, H., Burckhardt, S., Gotsman, A., Morrison, A., Yang, H., Zawirski, M.:
Specification and complexity of collaborative text editing. In: Proceedings of the
2016 ACM Symposium on Principles of Distributed Computing, PODC 2016,
Chicago, IL, USA, 25–28 July 2016, pp. 259–268 (2016). https://doi.org/10.1145/
2933057.2933090
2. Bailis, P., Fekete, A., Franklin, M.J., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Coor-
dination avoidance in database systems. PVLDB 8(3), 185–196 (2014). https://doi.
org/10.14778/2735508.2735509. http://www.vldb.org/pvldb/vol8/p185-bailis.pdf
3. Bailis, P., Ghodsi, A.: Eventual consistency today: limitations, extensions, and
beyond. Commun. ACM 56(5), 55–63 (2013). https://doi.org/10.1145/2447976.
2447992
4. Bernardi, G., Gotsman, A.: Robustness against consistency models with atomic
visibility. In: 27th International Conference on Concurrency Theory, CONCUR
2016, 23–26 August 2016, Québec City, Canada, pp. 7:1–7:15 (2016). https://doi.
org/10.4230/LIPIcs.CONCUR.2016.7
5. Brutschy, L., Dimitrov, D., Müller, P., Vechev, M.T.: Static serializability analysis
for causal consistency. In: Proceedings of the 39th ACM SIGPLAN Conference
on Programming Language Design and Implementation, PLDI 2018, Philadelphia,
PA, USA, 18–22 June 2018, pp. 90–104 (2018). https://doi.org/10.1145/3192366.
3192415
6. Burckhardt, S., Gotsman, A., Yang, H., Zawirski, M.: Replicated data types: spec-
ification, verification, optimality. In: The 41st Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL 2014, San Diego, CA,
USA, 20–21 January 2014, pp. 271–284 (2014). https://doi.org/10.1145/2535838.
2535848
7. DeCandia, G., et al.: Dynamo: amazon’s highly available key-value store. In: Pro-
ceedings of the 21st ACM Symposium on Operating Systems Principles 2007, SOSP
2007, Stevenson, Washington, USA, 14–17 October 2007, pp. 205–220 (2007).
https://doi.org/10.1145/1294261.1294281
8. Gilbert, S., Lynch, N.A.: Brewer’s conjecture and the feasibility of consistent, avail-
able, partition-tolerant web services. SIGACT News 33(2), 51–59 (2002). https://
doi.org/10.1145/564585.564601. http://doi.acm.org/10.1145/564585.564601
9. Gomes, V.B.F., Kleppmann, M., Mulligan, D.P., Beresford, A.R.: Verifying strong
eventual consistency in distributed systems. PACMPL 1(OOPSLA), 109:1–109:28
(2017). https://doi.org/10.1145/3133933
10. Gotsman, A., Yang, H., Ferreira, C., Najafzadeh, M., Shapiro, M.: ‘Cause I’m
strong enough: reasoning about consistency choices in distributed systems. In: Pro-
ceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20–22 Jan-
uary 2016, pp. 371–384 (2016). https://doi.org/10.1145/2837614.2837625, http://
doi.acm.org/10.1145/2837614.2837625
11. Houshmand, F., Lesani, M.: Hamsaz: replication coordination analysis and syn-
thesis. PACMPL 3(POPL), 74:1–74:32 (2019). https://dl.acm.org/citation.cfm?
id=3290387
12. Kaki, G., Earanky, K., Sivaramakrishnan, K.C., Jagannathan, S.: Safe replication
through bounded concurrency verification. PACMPL 2(OOPSLA), 164:1–164:27
(2018). https://doi.org/10.1145/3276534
13. Li, C., Porto, D., Clement, A., Gehrke, J., Preguiça, N.M., Rodrigues, R.: Mak-
ing geo-replicated systems fast as possible, consistent when necessary. In: 10th
USENIX Symposium on Operating Systems Design and Implementation, OSDI
2012, Hollywood, CA, USA, 8–10 October 2012, pp. 265–278 (2012). https://www.
usenix.org/conference/osdi12/technical-sessions/presentation/li
14. Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Don’t settle for even-
tual: scalable causal consistency for wide-area storage with COPS. In: Proceedings
of the 23rd ACM Symposium on Operating Systems Principles 2011, SOSP 2011,
Cascais, Portugal, 23–26 October 2011, pp. 401–416 (2011). https://doi.org/10.
1145/2043556.2043593, http://doi.acm.org/10.1145/2043556.2043593
15. Nagar, K., Jagannathan, S.: Automated Parameterized Verification of CRDTs
(Extended Version). https://arxiv.org/abs/1905.05684
16. Nagar, K., Jagannathan, S.: Automated detection of serializability violations under
weak consistency. In: 29th International Conference on Concurrency Theory, CON-
CUR 2018, 4–7 September 2018, Beijing, China, pp. 41:1–41:18 (2018). https://
doi.org/10.4230/LIPIcs.CONCUR.2018.41
17. Nichols, D.A., Curtis, P., Dixon, M., Lamping, J.: High-latency, low-bandwidth
windowing in the jupiter collaboration system. In: Proceedings of the 8th Annual
ACM Symposium on User Interface Software and Technology, UIST 1995, Pitts-
burgh, PA, USA, 14–17 November 1995, pp. 111–120 (1995). https://doi.org/10.
1145/215585.215706
18. Piskac, R., de Moura, L.M., Bjørner, N.: Deciding effectively propositional logic
using DPLL and substitution sets. J. Autom. Reasoning 44(4), 401–424 (2010).
https://doi.org/10.1007/s10817-009-9161-6
19. Preguiça, N.M., Baquero, C., Shapiro, M.: Conflict-free replicated data types
(CRDTs). CoRR abs/1805.06358 (2018). http://arxiv.org/abs/1805.06358
20. Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: A comprehensive study of
Convergent and Commutative Replicated Data Types. Technical report RR-7506,
INRIA, Inria - Centre Paris-Rocquencourt (2011)
21. Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-free replicated data
types. In: Défago, X., Petit, F., Villain, V. (eds.) SSS 2011. LNCS, vol. 6976, pp.
386–400. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24550-3_29
22. Sivaramakrishnan, K.C., Kaki, G., Jagannathan, S.: Declarative programming
over eventually consistent data stores. In: Proceedings of the 36th ACM SIG-
PLAN Conference on Programming Language Design and Implementation, Port-
land, OR, USA, 15–17 June 2015, pp. 413–424 (2015). https://doi.org/10.1145/
2737924.2737981
23. Sovran, Y., Power, R., Aguilera, M.K., Li, J.: Transactional storage for geo-
replicated systems. In: Proceedings of the 23rd ACM Symposium on Operating
Systems Principles 2011, SOSP 2011, Cascais, Portugal, 23–26 October 2011, pp.
385–400 (2011). https://doi.org/10.1145/2043556.2043592, http://doi.acm.org/10.
1145/2043556.2043592
24. Zeller, P., Bieniusa, A., Poetzsch-Heffter, A.: Formal specification and verification of
CRDTs. In: Ábrahám, E., Palamidessi, C. (eds.) FORTE 2014. LNCS, vol. 8461, pp.
33–48. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43613-4_3
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
What’s Wrong with On-the-Fly Partial
Order Reduction
Stephen F. Siegel(B)
1 Introduction
in [12] and in further detail in [13]. I shall refer to this algorithm as the combined
algorithm. Theorem 4.2 of [13] asserts the soundness of the combined algorithm.
A proof of the theorem is also given in [13].
The proof has a gap. This was pointed out in [16, Sect. 5], with details in
[15]. The gap was rediscovered in the course of developing mechanized correctness
proofs for model checking algorithms; an explicit counterexample to the incorrect
proof step was also found ([2, Sect. 8.4.5] and [3, Sect. 5]). The fact that the
proof is erroneous, however, does not imply the theorem is wrong. To the best
of my knowledge, no one has yet produced a proof or a counterexample for the
soundness of the combined algorithm.
In this paper, I show that the combined algorithm is not sound; a counterex-
ample is given in Sect. 3.1. I found this counterexample by modeling the com-
bined algorithm in Alloy and using the Alloy analyzer [11] to check its soundness.
Sect. 4 describes this model. Spin’s POR is based on the combined algorithm,
and in Sect. 5, Spin is seen to return an incorrect result on a Promela model
derived from the theoretical counterexample.
There is a small adjustment to the combined algorithm, yielding an algo-
rithm that is arguably more natural and that returns the correct result on the
previous counterexample; this is described in Sect. 6. It turns out this one is also
unsound, as demonstrated by another Alloy-produced counterexample. However,
in Sect. 7, I show that this variation is sound if certain restrictions are placed on
the property automaton.
2 Preliminaries
Definition 1. A finite state program is a triple P = ⟨T, Q, ι⟩, where Q is a
finite set of states, ι ∈ Q is the initial state, and T is a finite set of operations.
Each operation α ∈ T is a function from a set enα ⊆ Q to Q.
Definition 5. The language of P , denoted L(P ), is the set of all infinite words
L(q0 )L(q1 ) · · · ∈ Σ ω , where q0 q1 · · · is the sequence of states generated by an
execution of P .
P ⊗ B = ⟨Q × S, {ι} × Δ, T × Σ, δ⊗ , Q × F ⟩,
where
It is well-known that P satisfies the property specified by B if, and only if, the
language of P ⊗ B is empty. Soundness of a reduction therefore requires that if
the language of the full product automaton is nonempty, then the language of the
resulting reduced automaton must be nonempty.
To make this precise, fix a finite state program P = ⟨T, Q, ι⟩, a set AP of
atomic propositions, an interpretation L : Q → Σ = 2^AP , and a Büchi automaton
B = ⟨S, Δ, Σ, δ, F ⟩. Let A = P ⊗ B:
A = ⟨Q × S, {ι} × Δ, T × Σ, δ⊗ , Q × F ⟩ (1)
Note 2. The definition in [13] is slightly different. Given an LTL formula φ over
AP, let AP′ be the set of atomic propositions occurring syntactically in φ. The
definition in [13] says α is invisible in φ if, for all p ∈ AP′ and q ∈ enα , p ∈
L(q) ⇔ p ∈ L(α(q)). However, there is no loss of generality using Definition 11,
since one can define a new interpretation L′ : Q → 2^AP′ by L′(q) = L(q) ∩ AP′ .
Then α is invisible for φ if, and only if, α is invisible with respect to L′ , and the
results of this paper can be applied without modification to P , AP′ , and L′ .
2 I am using the numbering from [4]. In [13], C2 and C3 are swapped.
482 S. F. Siegel
3.1 Counterexample
Fig. 1. Counterexample to the combined algorithm: the program (left), the property automaton B1 (center top), the ample selector amp (center bottom), and the reachable product space (right).
The property automaton, B1 , is shown in Fig. 1 (center top). It has two states,
numbered 0 and 1. State 1 is the sole accepting state. The language consists of
all infinite words of the following form: a finite nonempty prefix of ∅s followed
by an infinite sequence of {p}s. This language is stutter-invariant, and is the
language of the LTL formula (¬p) ∧ ((¬p)U Gp).
The ample selector is specified by the table (center bottom). Notice that
amp(A, 1) ≠ en(A), but the other three ample sets are full. C0 holds because
the ample sets are never empty. C1 holds because β is independent of α. C2
holds because α is invisible. The reachable product space is shown in Fig. 1
(right). In any DFS of reduced(A, amp), the only back edge is the self-loop on
A0 labeled α, ∅. Since amp(A, 0) is full, C3 holds. Yet there is an accepting
path in the full space, but not in the reduced space.
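The counterexample can be replayed concretely. The following Python sketch encodes my reading of Fig. 1 (the state names, and the identification of α with the invisible self-loop and β with the operation setting p, are inferred from the surrounding text, so treat them as assumptions): it builds the full and the reduced product spaces and checks for a reachable accepting cycle.

```python
# Program: states A (p false) and B (p true); alpha is invisible, beta sets p.
P_NEXT = {("A", "alpha"): "A", ("A", "beta"): "B", ("B", "alpha"): "B"}
LABEL = {"A": frozenset(), "B": frozenset({"p"})}   # interpretation L

# Property automaton B1: (state, letter) -> set of successor states.
B_DELTA = {(0, frozenset()): {0, 1}, (1, frozenset({"p"})): {1}}
B_INIT, B_ACCEPT = 0, {1}

def enabled(q):
    return {op for (q2, op) in P_NEXT if q2 == q}

def amp(q, s):   # ample selector: only amp(A, 1) is a proper subset of en(A)
    return {"alpha"} if (q, s) == ("A", 1) else enabled(q)

def successors(q, s, reduced):
    for sp in B_DELTA.get((s, LABEL[q]), set()):
        # on-the-fly combined algorithm: the ample set depends on s', not s
        for op in (amp(q, sp) if reduced else enabled(q)):
            yield (P_NEXT[(q, op)], sp)

def has_accepting_cycle(reduced):
    init = ("A", B_INIT)
    seen, stack, adj = {init}, [init], {}
    while stack:                            # explore the (full or reduced) space
        x = stack.pop()
        for y in successors(x[0], x[1], reduced):
            adj.setdefault(x, set()).add(y)
            if y not in seen:
                seen.add(y)
                stack.append(y)
    def reach(x):                           # reachable via at least one edge
        r, st = set(), [x]
        while st:
            for y in adj.get(st.pop(), ()):
                if y not in r:
                    r.add(y)
                    st.append(y)
        return r
    return any(x[1] in B_ACCEPT and x in reach(x) for x in seen)
```

Here has_accepting_cycle(False) is True (the β transition to ⟨B, 1⟩ followed by the α self-loop there), while has_accepting_cycle(True) is False: the reduced expansion of ⟨A, 0⟩ uses amp(A, 1) = {α} for successor property state 1, so ⟨B, 1⟩ is never generated.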
The alphabet is some unconstrained set Sigma. The set of states is represented
by signature BState. There is a single initial state, and any number of accepting
states. Each transition has a source and destination state, and label. Relations
declared within a signature declaration have that signature as an implicit first
argument. So, for example, src is a binary relation of type BTrans × BState.
Furthermore, the relation is many-to-one: each transition has exactly one BState
atom associated to it by the src relation.
The remaining concepts are incorporated into module por_v0:
1 module por_v0 -- on-the-fly POR variant 0, corresponding to [13]
2 open ba -- import the Büchi automata module
3 sig Operation {} -- program operation
4 sig PState { -- program state
5 label: one Sigma, -- the set of propositions which hold in this state
6 enabled: set Operation, -- the set of all operations enabled at this state
7 nextState: enabled -> one PState, -- the next-state function
8 ample: BState -> set Operation -- ample(q,s)
9 }{ all s: BState | ample[s] in enabled } -- ample sets subsets of enabled
10 fun amp[q: PState, s: BState] : set Operation { q.ample[s] }
11 one sig Pinit extends PState {} -- initial program state
12 fact { -- all program states are reachable from Pinit
13 let r = {q, q’: PState | some op: Operation | q.nextState[op]=q’} |
14 PState = Pinit.*r
15 }
16 sig ProdState { -- state in the product of program and property automaton
17 pstate: PState, -- the program state component
18 bstate: BState, -- the property state component
19 nextFull: set ProdState, -- all next states in the full product space
20 nextReduced: set ProdState -- all next states in the reduced product space
21 }
22 one sig ProdInit extends ProdState {} -- initial product state
23 pred transitionInProduct[q,q’: PState, op: Operation, s,s’: BState] {
24 q->op->q’ in nextState
25 some t : BTrans | t.src = s and t.dest = s’ and t.label = q.label
26 }
27 pred nextProd[x: ProdState, op: Operation, x’: ProdState] {
28 transitionInProduct[x.pstate, x’.pstate, op, x.bstate, x’.bstate]
29 }
30 pred independent[op1, op2 : Operation] {
31 all q: PState | (op1+op2 in q.enabled) implies (
32 op2 in q.nextState[op1].enabled and
33 op1 in q.nextState[op2].enabled and
34 q.nextState[op1].nextState[op2] = q.nextState[op2].nextState[op1])
35 }
36 pred invisible[op: Operation] {
The facts are constraints that any instance must satisfy; some of the facts are
given names for readability. A pred declaration defines a (typed) predicate.
Most aspects of this model are self-explanatory; I will comment only on the
less obvious features. The relations nextFull and nextReduced represent the
next state relations in the full and reduced spaces, respectively. They are declared
in ProdState, but specified completely in the final fact on lines 56–58. Strictly
speaking, one could remove those predicates and substitute their definitions, but
this seemed more convenient. Line 60 asserts that a product state is determined
uniquely by its program and property components. Line 61 specifies the initial
product state.
Line 59 insists that only states reachable (in the full space) from the initial
state will be included in an instance (* is the reflexive transitive closure
operator). Lines 62–64 specify the converse. Hence in any instance of this model,
ProdState will consist of exactly the reachable product states in the full space.
The encoding of C1 is based on the following observation: given q ∈ Q and
a set A of operations enabled at q, define r ⊆ Q × Q by removing from the
program’s next-state relation all edges labeled by operations in A. Then “no
operation dependent on an operation in A can occur unless an operation in A
occurs first” is equivalent to the statement that on any path from q using edges
in r, all enabled operations encountered will either be in A or independent of
every operation in A.
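The observation translates directly into an explicit-state check (a Python sketch under my own naming, not the Alloy encoding): explore the program graph from q using only edges labeled by operations outside A, and flag any enabled operation that is neither in A nor independent of every operation in A.

```python
# C1 check at state q with candidate ample set A: remove A-labeled edges from
# the next-state relation, then verify that every operation enabled along the
# remaining paths is in A or independent of all of A.

def check_c1(next_state, q, A, independent):
    """next_state: dict (state, op) -> state; independent: (op, op) -> bool."""
    def enabled(s):
        return {op for (s2, op) in next_state if s2 == s}
    seen, stack = {q}, [q]
    while stack:
        s = stack.pop()
        for op in enabled(s):
            if op not in A:
                if not all(independent(op, a) for a in A):
                    return False       # a dependent op could fire before A does
                t = next_state[(s, op)]    # follow only edges outside A
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return True
```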
Condition C3 is difficult to encode, in that it depends on specifying a depth-
first search. I have replaced it with a weaker condition, which is similar to a
well-known cycle proviso in the offline theory:
C3′ In any cycle in reduced(A, amp), there is a transition from ⟨q, s⟩ to ⟨q′ , s′ ⟩
for which amp(q, s′ ) = en(q).
Equivalently: if one removes from the reduced product space all such transitions,
then the resulting graph should have no cycles. This is the meaning of lines 50–54
(^ is the strict transitive closure operator).
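The reformulation above can be sketched as an explicit-graph check (state and edge names below are made up for illustration): drop every reduced-space transition whose ample set was full, then test the remaining graph for acyclicity with a three-color DFS.

```python
# Weaker cycle condition: after deleting all transitions taken with a full
# ample set, the reduced product space must contain no cycle.

def satisfies_c3_prime(edges, is_full):
    """edges: iterable of (src, dst); is_full: edge -> was the ample set full?"""
    adj, nodes = {}, set()
    for (u, v) in edges:
        nodes.update((u, v))
        if not is_full((u, v)):        # keep only non-full-ample transitions
            adj.setdefault(u, []).append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def acyclic_from(u):
        color[u] = GRAY
        for v in adj.get(u, []):
            if color[v] == GRAY:       # back edge: cycle with no full-ample edge
                return False
            if color[v] == WHITE and not acyclic_from(v):
                return False
        color[u] = BLACK
        return True
    return all(color[n] != WHITE or acyclic_from(n) for n in nodes)
```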
The next step is to create tests for specific property automata. This example
is for the automaton B1 of Fig. 1:
1 module ba1
2 open ba
3 one sig X0, X1 extends Sigma {}
4 one sig B1 extends BState {}
5 one sig T1, T2, T3 extends BTrans {}
6 fact {
7 AState = B1 -- B1 is the sole accepting state
8 T1.src=Binit && T1.label=X0 && T1.dest=Binit
9 T2.src=Binit && T2.label=X0 && T2.dest=B1
10 T3.src=B1 && T3.label=X1 && T3.dest=B1
11 }
The final step is a test that combines the modules above:
1 open por_v0
2 open ba1
3 check PORsoundness for exactly 2 Sigma, exactly 2 BState,
4 exactly 3 BTrans, 2 Operation, 2 PState, 4 ProdState
It places upper bounds on the numbers of operations, program states, and prod-
uct states while checking the soundness assertion. Using the Alloy analyzer to
check the assertion above results in a counterexample like the one in Fig. 1. The
runtime is a fraction of a second. The Alloy instance uses two uninterpreted
atoms for the elements of Sigma; I have simply substituted the sets ∅ and {p}
for them to produce Fig. 1. As we have seen, this counterexample happens to
also satisfy the stronger constraint C3.
5 Spin
The POR algorithm used by Spin is described in [10] and is similar to the
combined algorithm. We can see what Spin actually does by encoding examples
in Promela and executing Spin with and without POR.
bit p = 0;
active proctype p0() { p=1 }
active proctype p1() { bit x=0; do :: x=0 od }
never {
B0: do :: !p :: !p -> break od
accept_B1: do :: p od
}
I did this with Spin version 6.4.9, the latest stable release. The output indicates
that 4 states and 5 transitions are explored, and one state is matched—exactly
as in Fig. 1 (right). As expected, the output also reports a violation—a path to
an accepting cycle that corresponds to the transition from A0 to B1 followed by
the self-loop on B1 repeated forever.
Repeat this experiment without the -DNOREDUCE, however, and Spin finds no
errors. The output indicates that it misses the transition from A0 to B1.
Specifically, given an ample selector amp, define reduced2 (A, amp) as in (1) and
(2), except replace “α ∈ amp(q, s′ )” in (2) with “α ∈ amp(q, s)”. Perform the
same substitution in C3 and call the resulting condition C31 . The weaker version
of C31 is simply:
C3′1 In any cycle in reduced2 (A, amp) there is a state ⟨q, s⟩ with amp(q, s) =
en(q).
Conditions C0–C2 are unchanged. I refer to this scheme as V1, and to the
original combined algorithm as V0. The Alloy model of V0 in Sect. 4 can be
easily modified to represent V1.
Using V1, the example of Fig. 1 is no longer a counterexample. In fact, Alloy
reports there are no counterexamples using B1 , at least for small bounds on the
program size. Figure 5 gives detailed results for this and other Alloy experiments.
Unfortunately, Alloy does find a counterexample for a slightly more compli-
cated property automaton, B2 , which is shown in Fig. 3.
Fig. 3. Counterexample to V1 with B2 (center). A0 and A2 have proper ample set {α}.
The program is the same as the one in Sect. 3.1. Automaton B2 has four
states, with state 3 the sole accepting state. The language is the same as that
of B1 : all infinite words formed by concatenating a finite nonempty prefix of ∅s
and an infinite sequence of {p}s. If the prefix has odd length, the accepting run
begins with the transition 0 → 1, otherwise it begins with the transition 0 → 2.
In the ample selector, the ample sets at A0 and A2 are the only ones that are not full:
amp 0 1 2 3
A {α} {α, β} {α} {α, β}
B {α} {α} {α} {α}.
C0–C2 hold for the reasons given in Sect. 3.1. C31 holds for any DFS in which
A2 is pushed onto the stack before A1. In that case, there is no back edge from
A2; there will be a back edge when A1 is pushed, but A1 is fully enabled.
7 What’s Right
In this section, I show that POR scheme V1 of Sect. 6 is sound if one intro-
duces certain assumptions on the property automaton. The following definition
is similar to the notion of stutter invariant (SI) automaton in [6] and to that
of closure under stuttering in [9]. The main differences derive from the use of
Muller automata in [6] and Büchi transition systems in [9], while we are dealing
with ordinary Büchi automata.
Following the approach of [6], one can show that the language of an automa-
ton in SI normal form is stutter-invariant. Moreover, any Büchi automaton with
a stutter-invariant language can be transformed into SI normal form without
changing the language. The conversion satisfies |S′ | = O(|Σ| · |S|), where |S| and
|S′ | are the numbers of states in the original and new automaton, respectively.
For details and proofs, see [17]. An example is given in Fig. 4; the language of
B3 (or B4 ) consists of all words with a finite number of {p}s.
Fig. 4. The automaton B3 (left) and an equivalent automaton B4 in SI normal form (right).
The remainder of this section is devoted to the proof of Theorem 1. The proof
is similar to the proof of the offline case in [4].
Let θ be an accepting path in the full space A. An infinite sequence of accept-
ing paths π0 , π1 , . . . will be constructed, where π0 = θ. For each i ≥ 0, πi will be
decomposed as ηi ◦ θi , where ηi is a finite path of length i in the reduced space, θi
is an infinite path, ηi is a prefix of ηi+1 , and ◦ denotes concatenation. For i = 0,
η0 is empty and θ0 = θ.
Assume i ≥ 0 and we have defined ηj and θj for j ≤ i. Write
θi = ⟨q0 , s0 ⟩ −α1 ,σ0 → ⟨q1 , s1 ⟩ −α2 ,σ1 → · · · (3)
where σk = L(qk ) for k ≥ 0. Then ηi+1 and θi+1 are defined as follows. Let
A = amp(q0 , s0 ). There are two cases:
Case 1: α1 ∈ A. Let ηi+1 be the path obtained by appending the first transition
of θi to ηi , and θi+1 the path obtained by removing the first transition from θi .
Case 2a: Some operation in A occurs in θi . Let n be the index of the first
occurrence, so that αn ∈ A, but αj ∉ A for 1 ≤ j < n. By C1, αj and αn
are independent for 1 ≤ j < n. By repeated application of the independence
property, there are paths in P
q0 −α1 → q1 −α2 → · · · −αn−1 → qn−1 and q′1 −α1 → q′2 −α2 → · · · −αn−1 → q′n ,
joined by αn -steps qj −αn → q′j+1 for 0 ≤ j ≤ n − 1, where q′j+1 = αn (qj ) and q′n = qn .
By C2, αn is invisible, whence L(q′j+1 ) = σj for 0 ≤ j ≤ n − 2, and σn−1 = σn .
Hence the admissible sequence
q0 −αn → q′1 −α1 → q′2 −α2 → q′3 −α3 → · · · −αn−2 → q′n−1 −αn−1 → qn −αn+1 → qn+1 −αn+2 → qn+2 → · · · (4)
generates the word
σ0 σ0 σ1 σ2 · · · σn−2 σn σn+1 σn+2 · · · . (5)
Now the projection of θi onto B has the form
s0 −σ0 → s1 −σ1 → s2 −σ2 → · · · −σn−2 → sn−1 −σn → sn −σn → sn+1 −σn+1 → sn+2 −σn+2 → · · ·
since σn−1 = σn . By Lemma 1, there is a path in B
s0 −σ0 → s1 −σ0 → s1 −σ1 → s2 −σ2 → · · · −σn−2 → sn−1 −σn → sn −σn+1 → sn+2 −σn+2 → · · · (6)
which accepts the word (5). Composing (4) and (6) therefore gives a path through
the product space. Removing the first transition (labeled αn , σ0 ) from this path
yields θi+1 . Appending that transition to ηi yields ηi+1 .
and the projection onto the property component has the form
s0 −σ0 → s1 −σ0 → s1 −σ1 → s2 −σ2 → · · · .
Removing the first transition from this path yields θi+1 . Appending that tran-
sition to ηi yields ηi+1 . This completes the definitions of ηi+1 and θi+1 .
Let η be the limit of the ηi . Clearly η is an infinite path through the reduced
product space, starting from the initial state. We must show that it passes
through an accepting state infinitely often. To do so, we must examine more
closely the sequence of property states through which each θi passes.
Let i ≥ 0, and s0 the final state of ηi . Say θi passes through states s0 s1 s2 · · · .
Then the final state of ηi+1 will be s1 , and the state sequence of θi+1 is deter-
mined by the three cases as follows:
Case 1: s1 s2 · · ·
Case 2a: s1 s1 s2 · · · sn sn+2 · · · (sn+1 ∈ F =⇒ sn ∈ F ) (7)
Case 2b: s1 s1 s2 · · ·
We first claim that for all i ≥ 0, θi passes through an accepting state infinitely
often. This holds for θ0 , which is an accepting path by assumption. Assume it
holds for θi . In each case of (7), we see that the state sequence of θi+1 has a
suffix which is a suffix of the state sequence of θi , so the claim holds for θi+1 .
Alloy proved useful for reasoning about the algorithms and generating small
counterexamples. A summary of the Alloy experiments and results is given in
Fig. 5. These were run on an 8-core 3.7 GHz Intel Xeon W-2145 and used the
plingeling SAT solver [1].3 In addition to the experiments already discussed, Alloy
found no soundness counterexamples for property automata B3 or B4 , using V0
or V1. In the case of B4 , this is what Theorem 1 predicts. For further confir-
mation of Theorem 1, I constructed a general Alloy model of Büchi automata
in SI normal form, represented by B5 in the table. Alloy confirms that both V0
and V1 are sound for all such automata within small bounds on program and
automata size.
It is possible that the use of the normal form, while correct, cancels out the
benefits of POR. A comprehensive exploration of this issue is beyond the scope
of this paper, but I can provide data on one non-trivial example. I encoded
an n-process version of Peterson’s mutual exclusion algorithm in Promela, and
used Spin to verify starvation-freedom for one process in the case n = 5. If p is
the predicate that holds whenever the process is enabled, a trace violates this
property if p holds only a finite number of times in the trace, i.e., if the trace
is in L(B3 ) = L(B4 ). Figure 6 shows the results of Spin verification using B3
without POR, and using B3 and B4 with POR. The results indicate that POR
significantly improves performance on this problem, and that using the normal
form B4 in place of B3 actually improves performance further by a small amount.
bugs. Once Alloy no longer finds any counterexamples, one could then expend
the considerable effort required to construct a formal mechanized proof.
What’s Wrong with On-the-Fly Partial Order Reduction 495
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Integrating Formal Schedulability
Analysis into a Verified OS Kernel
1 Introduction
The real-time and OS communities have seen recent efforts towards formal proofs,
through techniques such as model checking [16,22] and interactive theorem
proving [7,14,17]. This trend is motivated by the high stakes of critical systems
and by the combinatorial complexity of considering all possible interleavings of
the states of a system, which makes pen-and-paper reasoning too error-prone.
Real-time OSes used in critical areas such as avionics and automotive applications
must ensure not only functional correctness but also timing requirements. Indeed,
a missed deadline may have catastrophic consequences. Schedulability analysis aims
to guarantee the absence of deadline misses, given a scheduling algorithm that
decides which task executes at each instant.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 496–514, 2019.
https://doi.org/10.1007/978-3-030-25543-5_28
In the current state of the art, schedulability analysis is decoupled from
kernel code verification. This is good from a separation-of-concerns perspective,
as both kernel verification and schedulability analysis are already complex
enough without adding in the other. Nevertheless, this gap also means that each
community may lack validation from the other.
On the one hand, schedulability analysis itself is error-prone: for example, a
flaw was found in the original schedulability analysis [26,27,29] for the
Controller Area Network bus, which is widely used in automobiles. To tackle this
issue, the Prosa
library [7] provides mechanized schedulability proofs. This library is developed
with a focus on readable specifications in order to ensure wide acceptance by
the community. It is currently a reference for mechanized schedulability proofs
and was able to verify several existing multicore scheduling policies under a new
setting with jitter. However, some of its design decisions, in particular for task
models and scheduling policies, are highly unusual and their adequacy to reality
has never been justified by connecting them to a concrete OS kernel enforcing a
real-time scheduling policy.
On the other hand, OS kernels are very sensitive and bug-prone pieces of code,
which has inspired a large body of work on using formal methods to prove functional
correctness and other requirements, such as access control policies [17], scheduling
policies [31], timing requirements, etc. One such verified OS kernel is RT-
CertiKOS [21], developed by the Yale FLINT group and built on top of the sequen-
tial CertiKOS [9,13]. Its verification focuses on extensions beyond pure functional
correctness, such as real-time guarantees and isolation between components. How-
ever, any major extension, such as real-time support, adds a lot of proof burden.
In this paper, we solve both problems at once by combining the formal schedu-
lability analysis given by Prosa with the functional correctness guarantees of RT-
CertiKOS. Thus, we get a formal schedulability proof for this kernel: if it accepts a
task set, then formal proofs ensure that there will be no deadline miss during exe-
cution. Furthermore, this work also produces a concrete instance of the definitions
used in Prosa, ensuring their consistency and adequacy to a real system.
Outline of the Paper. Section 2 introduces the Prosa library and its descrip-
tion of scheduling. In Sect. 3, we describe RT-CertiKOS, its scheduler, as well as
the associated verification technique, abstraction layers. Section 4 then highlights
the key differences between the models of Prosa and RT-CertiKOS, and how we
resolve them. Finally, Sects. 5, 6, and 7 evaluate our work and present future
and related work before concluding.
2 Prosa
Prosa [7] is a Coq [25] library of models and analyses for real-time systems.
The library is aimed towards the real-time community and provides models and
analyses found in the literature with a focus on readable specifications.
The library contains four basic layers, which are presented in Fig. 1:
System behavior. The base of the library is a model of discrete time traces
as infinite sequences of events. We consider two such kinds of sequences:
arrival sequences record requests for service, called job activations, and
schedules record which job is able to progress.
System model. In order to reason about system behavior, jobs with simi-
lar properties are grouped into tasks. Based on system behavior, task mod-
els (arrival patterns and cost models) and scheduling policies are defined.
These models are axiomatic in the sense that they are given as predicates on
traces/schedules and not as generating and scheduling functions. In particu-
lar, a “FPP scheduler” (see Sect. 2.2) is modeled as “any trace satisfying the
FPP policy”.
Analysis. The library provides response time and schedulability analyses for
these models.
Implementation. Finally, examples of traces and schedulers are implemented
to validate the specifications axiomatized in the System model layer and to
use the results proven in the Analysis layer. It is this part (more precisely,
the top left dark block of Fig. 1) that is meant to connect with RT-CertiKOS.
Task Model. In order to specify the behavior of the system we are interested
in, Prosa introduces predicates on traces for which the response time analysis
provides guarantees.
We now focus on the definitions related to the sporadic task model and the
fixed priority preemptive (FPP) scheduling policy.
Definition 4 (Sporadic FPP task). A sporadic FPP task τ is defined by a
deadline Dτ ∈ N, a minimal inter-arrival time δτ− ∈ N, a worst-case execution
time (WCET) Cτ , and a priority pτ ∈ N. When Dτ is equal to δτ− , the deadline
is said to be implicit.
FPP Scheduling Policy. The FPP policy is modeled in Prosa as two constraints
on the schedule: it must be work conserving, that is, it cannot be idle when
there are pending jobs; and it must respect the priorities, that is, a scheduled
job always has the highest priority among pending jobs.
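These two constraints can be phrased directly as an executable predicate on finite schedule prefixes. The sketch below mirrors Prosa's axiomatic style in Python; the trace encoding and names are illustrative, not Prosa's Coq definitions:

```python
def respects_fpp(sched, pending, priority):
    """Check the two FPP constraints on a finite schedule prefix.
    sched[t] is the job scheduled at slot t (None means idle);
    pending(t) returns the set of jobs pending at slot t;
    priority(j) gives job j's priority (larger = higher)."""
    for t, job in enumerate(sched):
        waiting = pending(t)
        if job is None:
            # work conservation: never idle while some job is pending
            if waiting:
                return False
        else:
            # priority respect: the scheduled job has maximal priority
            if any(priority(j) > priority(job) for j in waiting):
                return False
    return True
```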
2.3 Analysis
Prosa contains a proof of Bertogna and Cirinei’s [4] response time analysis for
FPP single-core schedules of sporadic tasks, with exact bounds for implicit dead-
lines. The analysis is based on the following property of the maximum workload
for these schedules.
The maximum workload Wτ (Δ) corresponds to the worst-case activation pattern
in which all tasks are simultaneously activated with maximum cost (the WCET of
their task) and minimal inter-arrival distance; it is an upper bound on the
actual workload in any time interval of length Δ.
For instance, the smallest response time bound for a task τ ∈ TaskSet can
be computed by the least positive fixed point of the function Wτ . Using this
response time bound, we can derive a schedulability criterion by requiring this
bound to be smaller than or equal to the deadline of task τ .
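As an illustration, this least-fixed-point computation can be sketched with the standard response-time recurrence for single-core FPP scheduling of implicit-deadline sporadic tasks. This is a hedged Python sketch with hypothetical task parameters, not Prosa's actual Coq code:

```python
from math import ceil

def response_time(task, higher_prio):
    """Least positive fixed point of the response-time recurrence
    R = C + sum over higher-priority tasks i of ceil(R / T_i) * C_i,
    instantiating the worst-case workload bound W_tau. Tasks are (C, T)
    pairs with 0 < C <= T and implicit deadline D = T; returns None if
    the iteration exceeds the deadline."""
    C, T = task
    R = C  # start from the task's own cost
    while True:
        R_next = C + sum(ceil(R / Ti) * Ci for (Ci, Ti) in higher_prio)
        if R_next > T:
            return None      # bound exceeds the deadline: test fails
        if R_next == R:
            return R         # fixed point reached
        R = R_next

def schedulable(taskset):
    """taskset: (C, T) pairs ordered from highest to lowest priority."""
    return all(
        (r := response_time(t, taskset[:i])) is not None and r <= t[1]
        for i, t in enumerate(taskset)
    )
```

The iteration terminates because the bound is nondecreasing, integer-valued, and capped by the deadline.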
The Prosa library includes functions to generate periodic traces and the corre-
sponding FPP schedules, together with proofs of these properties and an instan-
tiation of the schedulability criterion for these traces. This implementation was
initially provided as a way to check that the modeling of the arrival model and
scheduling policy are not contradictory and as such the implementation is as
simple as possible. Although this is a good step in order to make the axiomatic
definition of scheduling policies more acceptable, there is still room for improve-
ment: these implementations are still rather ad-hoc and there is no connection
to an actual system. This is where the link with RT-CertiKOS is beneficial to
the Prosa ecosystem: it justifies that the model is indeed suitable for a concrete
and independently developed real-time OS scheduler.
2 There is a multicore version of CertiKOS [14,15], but RT-CertiKOS is developed on
top of the sequential version.
502 X. Guo et al.
Yield System Call. Tasks do not always use up their budgets. A task can yield
to relinquish any remaining quota, so that lower priority tasks may be scheduled
earlier and more time slots may be dedicated to non real-time tasks.
Based on sequential CertiKOS, RT-CertiKOS [21] follows the idea of deep spec-
ifications3 in which the specification should be rich enough to deduce any prop-
erty of interest: there should never be any need to consider the implementation.
In particular, even though its source code is written in both C and assembly,
the underlay always abstracts the concrete memory states it operates on into
abstract states, and abstracts concrete code into Coq functions that act as exe-
cutable specification. Subsequent layers relying on this underlay will invoke Coq
functions instead of the concrete code, thus hiding implementation details.
In the case of scheduling, there are essentially two functions: the scheduler
and the yield system call. The scheduler relies on two concrete data structures:
a counter tracking the current time (in time slot units) and an array tracking
the current quota for each periodic task. The yield system call simply sets the
remaining quota of the current task to zero. Both functions are verified in RT-
CertiKOS, that is, formal proofs ensure that their C code implementations
indeed simulate the corresponding Coq specifications.
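The shape of these two specifications can be summarized with a loose executable sketch. Everything here is an assumption for illustration — lower index meaning higher priority, the return convention, and the omission of budget replenishment at period boundaries — and none of it is RT-CertiKOS's actual C or Coq code:

```python
def pick_task(quota):
    """One scheduling decision over the quota array: run the
    highest-priority task that still has quota (assumed: lower index =
    higher priority), charging one time slot against its budget.
    Returns the chosen task id, or None when no budgeted task remains."""
    for p, q in enumerate(quota):
        if q > 0:
            quota[p] = q - 1
            return p
    return None

def sys_yield(quota, current):
    """The yield system call: the current task relinquishes its
    remaining quota, letting lower-priority tasks run earlier."""
    quota[current] = 0
```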
Upgrading an OS kernel into a real-time one is not an easy task. When one
further adds formal proofs about functional correctness, isolation, and timing
requirements, the proof burden becomes enormous. In particular, there is still
room for future work on RT-CertiKOS, e.g., a WCET analysis of its system
calls.
In order to reduce the overall proof burden, it is important to try to del-
egate as much as possible to specialized libraries and tools. Thus, from the
RT-CertiKOS perspective, the benefit of using Prosa is precisely to have state-
of-the-art schedulability analyses already mechanized in Coq, without having to
prove all these results.
Furthermore, the schedulability check of Prosa is performed only once, at proof
verification time, so there is no runtime overhead and no loss of performance
for RT-CertiKOS.
3 https://deepspec.org/.
Table 1. Summary of the range of the various data between RT-CertiKOS and Prosa
Key Elements of the Interface. The task model we consider is the one of
RT-CertiKOS, as it is more restrictive than the ones supported by Prosa. Tasks
are defined by a priority level p, a period Tp and a WCET (more accurately a
budget) Cp . Since we only allow one task per priority level, we identify tasks
and priority levels and we write Cp , Dp , and Tp instead of Cτ , Dτ , and Tτ . In
order for this setting to make sense, we assume the following inequality for each
task p: 0 < Cp ≤ Tp . Notice that this is a particular case of Prosa’s FPP task
model (Definition 4). There is no definition of the jobs of a task as they can be
easily defined from a task and a period number.
The second element Prosa needs is an infinite schedule. RT-CertiKOS cannot
provide such an infinite schedule, as only a finite prefix can be known, up to the
current time. Thus, we keep RT-CertiKOS’s finite schedule as is in the interface
and it is up to Prosa to extend it into an infinite one, suitable for its analysis.
Finally, Prosa needs two properties about the schedule: (a) any task receives
no more service than its WCET in any period; (b) the schedule indeed follows
the FPP policy. We refer to schedules satisfying these properties as valid schedule
prefixes. Proving these properties falls to RT-CertiKOS.
It is also more permissive: more transitions are allowed, since it does not perform
the sanity checks on preconditions such as being in kernel mode, host mode,
etc. Nevertheless, we still have a simulation: any step in the full RT-CertiKOS
is also allowed in the simplified version and results in the same scheduling deci-
sion and trace. This simulation is enough for our purposes as we are ultimately
interested in the behavior of the full RT-CertiKOS.
Bridging the Gap Between the Interface and Prosa. The interface pro-
vides Prosa with a task set, service and job cost functions, and a valid schedule
prefix. We first build an arrival sequence from the schedule prefix where the n-th
job (n > 0) for a given task p arrives at time (n − 1) × Tp with the cost given
by the interface. Note that jobs that do not arrive within the prefix cannot have
yielded yet, so their cost is the WCET of their task: we assume the worst
case for the future.
The arrival sequence is then defined by adding all jobs of each task p from
TaskSet; that is, the arrival sequence at time t contains the (t/Tp + 1)-th job
of p iff t is divisible by Tp.
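This construction is easy to phrase as a small function; the (name, C, T) tuples below are hypothetical, and only the period T matters here:

```python
def arrivals_at(t, taskset):
    """Jobs arriving at time t: task p releases its (t / T_p + 1)-th job
    iff t is divisible by its period T_p. Tasks are illustrative
    (name, C, T) tuples."""
    return [(name, t // T + 1) for (name, C, T) in taskset if t % T == 0]
```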
Next, we need to turn the finite schedule prefix into an infinite one. There are
two possibilities: either build a full schedule from the arrival sequence using the
Prosa implementation of FPP, or start from the schedule prefix of the interface
and extend it into an infinite one. The first technique gives for free the fact that the
infinite schedule satisfies the FPP model from Prosa. The difficulty lies in proving
that the schedule prefix from the interface is indeed a prefix of this infinite schedule.
The second technique starts from the schedule prefix and the difficulty is proving
that it satisfies the FPP model as specified on the Prosa side.
In this paper, we use the first strategy and prove that the prefix of the
schedule built by Prosa is equal to the schedule prefix provided in the interface.
To do so, we use the fact that two FPP schedule prefixes with the same arrival
sequence and job costs (only known at runtime) are the same, provided we take
care to properly remember when jobs yield.
Overall, we see the small amount of LoC required to perform this work as a
validation that our method is adequate for the problem considered.
Beyond the particular artifact linking RT-CertiKOS with Prosa, what more gen-
eral lessons can we learn from this connection?
First, using the same proof assistant greatly helps. Indeed, beyond avoiding the
technical hassle of interoperability between different formal tools, it also
avoids the pitfall of a formalization mismatch between the two formal models
and permits sharing common definitions.
Second, the creation of an explicit interface between both tools clearly marks
the flow of information, stays focused on the essential information, and delimits
the “proof responsibility”: which side is responsible for proving which fact. It
also segregates the proof techniques used on each side so as not to pollute the
other one, either on a technical aspect (vanilla Coq for RT-CertiKOS vs the
SSReflect extension for Prosa) or on the verification methods used (invariant-
based properties for RT-CertiKOS vs trace-based properties for Prosa). This
separation makes it unnecessary to have people be experts in both tools at once:
once the interface was clearly defined, experts on each side could work with only
a rough description of the other one, even though this interface required a few
later changes. In particular, it is interesting to notice that half the authors are
experts in RT-CertiKOS whereas the other half are experts in Prosa.
Third, the common part of the models used by both sides must be amenable
to agreement: in our case, this means having the same notion of time (scheduling
slots, or ticks) and a compatible notion of schedule (finite and infinite).
Finally, we expect the interface we designed to be reusable for other verified
kernels wanting to connect to Prosa or for linking RT-CertiKOS to other formal
schedulability analysis tools.
6 Related Work
Schedulability Analysis. Schedulability analysis, a key theory in the real-time
community, has been widely studied over the past decades. Liu and Layland's
seminal work [20] presents a schedulability analysis technique for a simple system
model described as a set of assumptions. Much later work [3,5,11,23,28] aims
to capture more realistic and complex system models by generalizing those
assumptions.
In order to provide formal guarantees for those results, several formal
approaches have been used to formalize schedulability analyses, such as
model checking [8,12,16], temporal logic [32,33], and theorem proving [10,30].
As far as we know, none of the above work has been applied to a formally
verified OS kernel.
7 Conclusion
Formal verification aims at providing stronger guarantees than testing. Real-
time systems are a good target because they are often part of critical systems.
Both the scheduling and OS communities have developed their own formally
verified tools but there is a lack of integration between them. In this paper,
we make a first step toward bridging this gap by integrating a formally proven
schedulability analysis tool, Prosa, with a verified sequential real-time OS kernel,
RT-CertiKOS. This gives two benefits: first, it provides RT-CertiKOS with a
modular, extensible, state-of-the-art formal schedulability analysis proof; second,
it gives a concrete instance of one of the scheduling theories described in Prosa,
thus ensuring that its model is consistent and applicable to actual systems.
We believe this connection can be easily adapted for other verified kernels or
schedulability analyzers.
It also showcases that it is possible and practical to connect two completely
independent medium- to large-scale formal proof developments.
References
1. Andronick, J., Lewis, C., Matichuk, D., Morgan, C., Rizkallah, C.: Proof of
OS scheduling behavior in the presence of interrupt-induced concurrency. In:
Blanchette, J.C., Merz, S. (eds.) ITP 2016. LNCS, vol. 9807, pp. 52–68. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-43144-4_4
2. Andronick, J., Lewis, C., Morgan, C.: Controlled Owicki-Gries concurrency: rea-
soning about the preemptible eChronos embedded operating system. In: Proceed-
ings Workshop on Models for Formal Analysis of Real Systems, MARS, pp. 10–24
(2015). https://doi.org/10.4204/EPTCS.196.2
3. Baruah, S.: Techniques for multiprocessor global schedulability analysis. In: Pro-
ceedings - 28th IEEE International Real-Time Systems Symposium (RTSS), pp.
119–128, December 2007. https://doi.org/10.1109/RTSS.2007.35
4. Bertogna, M., Cirinei, M.: Response-time analysis for globally scheduled symmetric
multiprocessor platforms. In: 28th IEEE International Real-Time Systems Sympo-
sium (RTSS), pp. 149–160, December 2007. https://doi.org/10.1109/RTSS.2007.
31
5. Bini, E., Buttazzo, G.C.: Schedulability analysis of periodic fixed priority systems.
IEEE Trans. Comput. 53(11), 1462–1473 (2004)
6. Blackham, B., Shi, Y., Chattopadhyay, S., Roychoudhury, A., Heiser, G.: Timing
analysis of a protected operating system kernel. In: 2011 IEEE 32nd Real-Time
Systems Symposium (RTSS), pp. 339–348, November 2011. https://doi.org/10.
1109/RTSS.2011.38
7. Cerqueira, F., Stutz, F., Brandenburg, B.B.: PROSA: a case for readable mecha-
nized schedulability analysis. In: 28th Euromicro Conference on Real-Time Systems
(ECRTS), pp. 273–284 (2016). https://doi.org/10.1109/ECRTS.2016.28
8. Cordovilla, M., Boniol, F., Noulard, E., Pagetti, C.: Multiprocessor schedulability
analyser. In: Proceedings of the 2011 ACM Symposium on Applied Computing,
SAC 2011, pp. 735–741 (2011). http://doi.acm.org/10.1145/1982185.1982345
9. Costanzo, D., Shao, Z., Gu, R.: End-to-end verification of information-flow secu-
rity for C and assembly programs. In: Proceedings of the 37th ACM SIGPLAN
Conference on Programming Language Design and Implementation (PLDI), pp.
648–664 (2016). http://doi.acm.org/10.1145/2908080.2908100
10. Dutertre, B.: The priority ceiling protocol: formalization and analysis using PVS.
In: Proceedings of the 21st IEEE Conference on Real-Time Systems Symposium
(RTSS), pp. 151–160 (1999)
11. Feld, T., Biondi, A., Davis, R.I., Buttazzo, G.C., Slomka, F.: A survey of schedu-
lability analysis techniques for rate-dependent tasks. J. Syst. Softw. 138, 100–107
(2018). https://doi.org/10.1016/j.jss.2017.12.033
12. Fersman, E., Mokrushin, L., Pettersson, P., Yi, W.: Schedulability analysis of
fixed-priority systems using timed automata. Theor. Comput. Sci. 354(2), 301–
317 (2006)
13. Gu, R., et al.: Deep specifications and certified abstraction layers. In: Proceedings
of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Pro-
gramming Languages (POPL), pp. 595–608 (2015). http://doi.acm.org/10.1145/
2676726.2676975
14. Gu, R., et al.: CertiKOS: an extensible architecture for building certified concur-
rent OS kernels. In: 12th USENIX Symposium on Operating Systems Design and
Implementation (OSDI), pp. 653–669. USENIX Association (2016). https://www.
usenix.org/conference/osdi16/technical-sessions/presentation/gu
15. Gu, R., et al.: Certified concurrent abstraction layers. In: Proceedings of the 39th
ACM SIGPLAN Conference on Programming Language Design and Implementa-
tion (PLDI), pp. 646–661 (2018). http://doi.acm.org/10.1145/3192366.3192381
16. Guan, N., Gu, Z., Deng, Q., Gao, S., Yu, G.: Exact schedulability analysis for static-
priority global multiprocessor scheduling using model-checking. In: IFIP Interna-
tional Workshop on Software Technolgies for Embedded and Ubiquitous Systems,
pp. 263–272 (2007)
17. Klein, G., et al.: seL4: formal verification of an OS kernel. In: Proceedings of the
ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), pp.
207–220 (2009). https://doi.org/10.1145/1629575.1629596
18. Klein, G., Huuck, R., Schlich, B.: Operating system verification. J. Autom. Rea-
soning 42(2–4), 123–124 (2009). https://doi.org/10.1007/s10817-009-9126-9
19. Labrosse, J.J.: Microc/OS-II, 2nd edn. R&D Books, Gilroy (1998)
20. Liu, C.L., Layland, J.W.: Scheduling algorithms for multiprogramming in a hard-
real-time environment. J. ACM (JACM) 20(1), 46–61 (1973)
21. Liu, M., et al.: Compositional verification of preemptive OS kernels with tempo-
ral and spatial isolation. Technical report, YALEU/DCS/TR-1549. Department of
Computer Science, Yale University (2019)
22. Nelson, L., et al.: Hyperkernel: push-button verification of an OS kernel. In: Pro-
ceedings of the 26th Symposium on Operating Systems Principles (SOSP), Shang-
hai, China, 28–31 October 2017, pp. 252–269 (2017). https://doi.org/10.1145/
3132747.3132748
23. Palencia, J.C., Harbour, M.G.: Schedulability analysis for tasks with static and
dynamic offsets. In: Proceedings 19th IEEE Real-Time Systems Symposium
(RTSS), pp. 26–37. IEEE (1998)
24. Sewell, T., Kam, F., Heiser, G.: High-assurance timing analysis for a high-assurance
real-time operating system. Real-Time Syst. 53(5), 812–853 (2017). https://doi.
org/10.1007/s11241-017-9286-3
25. The Coq Development Team: The Coq Proof Assistant Reference Manual. INRIA,
8.4pl4 edn. (2014). https://coq.inria.fr/distrib/8.4pl4/files/Reference-Manual.pdf
26. Tindell, K., Burns, A.: Guaranteeing message latencies on controller area network
(CAN). In: Proceedings of 1st International CAN Conference, pp. 1–11 (1994)
27. Tindell, K., Burns, A., Wellings, A.: Calculating controller area network (CAN)
message response times. Control Eng. Pract. 3(8), 1163–1169 (1995)
28. Tindell, K., Clark, J.: Holistic schedulability analysis for distributed hard real-time
systems. Microprocessing Microprogramming 40(2–3), 117–134 (1994)
29. Tindell, K., Hanssmon, H., Wellings, A.J.: Analysing real-time communications:
controller area network (CAN). In: Proceedings of the 15th IEEE Real-Time Sys-
tems Symposium (RTSS), San Juan, Puerto Rico, 7–9 December 1994, pp. 259–263
(1994). https://doi.org/10.1109/REAL.1994.342710
30. Wilding, M.: A machine-checked proof of the optimality of a real-time scheduling
policy. In: Proceedings of the 10th International Conference on Computer Aided
Verification (CAV), pp. 369–378 (1998)
31. Xu, F., Fu, M., Feng, X., Zhang, X., Zhang, H., Li, Z.: A practical verification
framework for preemptive OS kernels. In: Chaudhuri, S., Farzan, A. (eds.) CAV
2016. LNCS, vol. 9780, pp. 59–79. Springer, Cham (2016). https://doi.org/10.1007/
978-3-319-41540-6_4
32. Xu, Q., Zhan, N.: Formalising scheduling theories in duration calculus. Nord. J.
Comput. 14(3), 173–201 (2008)
33. Yuhua, Z., Chaochen, Z.: A formal proof of the deadline driven scheduler. In:
International Symposium on Formal Techniques in Real-Time and Fault-Tolerant
Systems, pp. 756–775 (1994)
Rely-Guarantee Reasoning About
Concurrent Memory Management
in Zephyr RTOS
1 Introduction
The operating system (OS) is a fundamental component of critical systems.
Thus, the correctness and reliability of such systems depend heavily on the
underlying OS. As a key functionality of OSes, memory management provides ways
to dynamically allocate portions of memory to programs at their request, and to
free them for reuse when no longer needed. Since program variables and data are
stored in the allocated memory, an incorrect specification or implementation of
the memory management may lead to system crashes or exploitable attacks on the
whole system. RTOSes are frequently deployed in critical systems, making their
formal verification necessary to ensure reliability. One state-of-the-art RTOS
is Zephyr RTOS [1], a Linux Foundation project. Zephyr
is an open source RTOS for connected, resource-constrained devices, built
with security and safety design in mind.
This work has been supported in part by the National Natural Science Foundation
of China (NSFC) under Grant No. 61872016, and by the National Satellite of
Excellence in Trustworthy Software Systems under Award No. NRF2014NCR-NCR001-30,
funded by NRF Singapore under the National Cyber-security R&D (NCR) programme.
© The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 515–533, 2019.
https://doi.org/10.1007/978-3-030-25543-5_29
516 Y. Zhao and D. Sanán
Zephyr uses a buddy memory allocation
algorithm optimized for RTOSes, which allows multiple threads to concurrently
manipulate shared memory pools with fine-grained locking.
Formal verification of the concurrent memory management in Zephyr is a
challenging task. (1) To achieve high performance, data structures and algo-
rithms in Zephyr are laid out in a complex manner. The buddy memory alloca-
tion can split large blocks into smaller ones, allowing blocks of different sizes to
be allocated and released efficiently while limiting memory fragmentation con-
cerns. Seeking performance, Zephyr uses a multi-level structure where each level
has a bitmap and a linked list of free memory blocks. The levels of bitmaps
actually form a forest of quad trees of bits. Memory addresses are used as a
reference to memory blocks, so the algorithm has to deal with address alignment
and computation concerning the block size at each level, increasing the com-
plexity of its verification. (2) A complex algorithm and data structures imply as
well complex invariants that the formal model must preserve. These invariants
have to guarantee the well-shaped bitmaps and their consistency to free lists. To
prevent memory leaks and block overlapping, a precise reasoning shall keep track
of both numerical and shape properties. (3) Thread preemption and fine-grained
locking make the kernel execution of memory services to be concurrent.
In this paper, we apply the rely-guarantee reasoning technique to the con-
current buddy memory management in Zephyr. This work uses π-Core, a rely-
guarantee framework for the specification and verification of concurrent reactive
systems. π-Core introduces a concurrent imperative system specification lan-
guage driven by “events” that supports the reactive semantics of interrupt handlers
(e.g. kernel services, the scheduler) in OSs, and thus makes the formal specification of
Zephyr simpler. The language embeds Isabelle/HOL data types and functions,
and is therefore as rich as Isabelle/HOL itself. The concurrent constructs of π-Core
allow the specification of Zephyr's multi-thread interleaving, fine-grained locking,
and thread preemption. The compositionality of rely-guarantee makes it feasible to
prove the functional correctness of Zephyr and the invariants over its data struc-
tures. The formal specification and proofs are developed in Isabelle/HOL and
are available at https://lvpgroup.github.io/picore/.
We first analyze the structural properties of memory pools in Zephyr (Sect. 3).
The properties clarify the constraints on and consistency of quad trees, free block
lists, memory pool configurations, and waiting threads. All of them are defined as
invariants whose preservation under the execution of services is formally
verified. From the well-shapedness of the quad trees, we derive a critical
property to prevent memory leaks: memory blocks cover the whole memory
address space of the pool but do not overlap each other.
Together with the formal verification of Zephyr, we aim at the highest evalu-
ation assurance level (EAL 7) of the Common Criteria (CC) [2], which was declared
this year as the candidate standard for security certification by the Zephyr
project. Therefore, we develop a fine-grained, low-level formal specification of
the buddy memory management (Sect. 4). The specification has a line-to-line cor-
respondence with the Zephyr C code, and thus supports the code-to-spec
review required by the EAL 7 evaluation, covering all the data structures and
imperative statements present in the implementation.
We carry out the formal verification of functional correctness and invariant
preservation using a rely-guarantee proof system (Sect. 5), which supports
total correctness for loops where fairness does not need to be considered. The
formal verification revealed three bugs in the C code: an incorrect block split, an
incorrect return from the kernel services, and non-termination of a loop (Sect. 6).
Two of them are critical and have been repaired in the latest release of Zephyr.
The third bug causes non-termination of the allocation service when trying to
allocate a block of a larger size than the maximum allowed.
Related Work. (1) Memory models [17] provide the necessary abstraction to
separate the behaviour of a program from the behaviour of the memory it reads
and writes. There are many formalizations of memory models in the literature,
e.g., [10,14,15,19,21], some of which only create an abstract specification
of the services for memory allocation and release [10,15,21]. (2) Formal verifi-
cation of OS memory management has been studied in CertiKOS [11,20], seL4
[12,13], Verisoft [3], and in the hypervisors from [4,5], where only the works
in [4,11] consider concurrency. Compared to buddy memory allocation, the
data structures and algorithms verified in [11] are relatively simple, without
block split/coalescence and multiple levels of free lists and bitmaps. [4] only
considers virtual mapping, not allocation or deallocation of memory areas.
(3) Algorithms and implementations of dynamic memory allocation have been
formally specified and verified in an extensive number of works [7–9,16,18,23].
However, buddy memory allocation is only studied in [9], which does not
consider concrete data structures (e.g. bitmaps) or concurrency. To the best of
our knowledge, this paper presents the first formal specification and mechanized
proof of a concurrent buddy memory allocation in a realistic operating system.
The pool is initially configured with the parameters n_max and max_sz,
together with a third parameter min_sz. min_sz defines the minimum size of
an allocated block and must be at least 4 × X (X > 0) bytes long. Memory pool
blocks are recursively split into quarters until blocks of the minimum size are
obtained, at which point no further split can occur. The depth at which min_sz
blocks are allocated determines the number of levels n_levels, which satisfies
max_sz = min_sz × 4^(n_levels − 1).
Every memory block is composed of a level; a block index within the level,
ranging from 0 to (n_max × 4^level) − 1; and the data representing the block
start address, which is equal to buf + (max_sz/4^level) × block. We use a tuple
(level, block) to uniquely identify a block within a pool p.
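The address arithmetic above can be sketched in plain C. This is a minimal sketch with assumed names (`block_size`, `block_addr`); the real Zephyr code derives these quantities from the fields of `struct k_mem_pool`:

```c
#include <stdint.h>

/* Size of a block at a given level: max_sz / 4^level.
   Levels grow downward, quartering the block size at each step. */
static uint32_t block_size(uint32_t max_sz, unsigned level)
{
    return max_sz >> (2 * level);
}

/* Start address of block (level, block): buf + (max_sz / 4^level) * block. */
static uintptr_t block_addr(uintptr_t buf, uint32_t max_sz,
                            unsigned level, unsigned block)
{
    return buf + (uintptr_t)block_size(max_sz, level) * block;
}
```

For instance, with max_sz = 1024, a level-2 block occupies 64 bytes, so block (2, 3) starts 192 bytes past buf.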
A memory pool keeps track of how its buffer space has been split using
a linked list free_list holding the start addresses of the free blocks at each level.
To improve the performance of coalescing partner blocks, memory pools main-
tain a bitmap at each level to indicate the allocation status of each block in
the level. This structure is represented by a C union of an integer bits and an
array bits_p. The implementation can store the bitmaps of levels smaller than
max_inline_level using only the integer bits. However, the number of blocks
in levels higher than max_inline_level makes it necessary to store the bitmap
information in the array bits_p. In such a design, the levels of bitmaps
actually form a forest of complete quad trees. Bit i in the bitmap of level j
is set to 1 iff block (j, i) is a free block, i.e. it is in the free list at level
j. Otherwise, the bit for that block is set to 0.
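The split bitmap representation can be sketched as follows. The field names, the `WORD_BITS` threshold, and the stored block count are assumptions for illustration; Zephyr packs the two cases into a C union and selects between them by level:

```c
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Sketch of a per-level bitmap: small levels keep their bits inline in an
   integer; larger levels use a separately allocated word array. */
struct level_bitmap {
    const uint32_t *bits_p;  /* used when n_blocks > WORD_BITS */
    uint32_t bits;           /* inline bitmap otherwise */
    unsigned n_blocks;
};

/* Bit i is 1 iff block i of this level is free (i.e., on the free list). */
static bool block_is_free(const struct level_bitmap *l, unsigned i)
{
    if (l->n_blocks <= WORD_BITS)
        return (l->bits >> i) & 1u;
    return (l->bits_p[i / WORD_BITS] >> (i % WORD_BITS)) & 1u;
}

/* Small demo helpers exercising both representations. */
static bool demo_inline_free(unsigned i)
{
    struct level_bitmap l = { 0, 0x5u, 8 };  /* blocks 0 and 2 free */
    return block_is_free(&l, i);
}

static const uint32_t demo_words[2] = { 0x0u, 0x1u };  /* only bit 32 set */

static bool demo_array_free(unsigned i)
{
    struct level_bitmap l = { demo_words, 0, 64 };
    return block_is_free(&l, i);
}
```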
Zephyr provides two kernel services, k_mem_pool_alloc and k_mem_pool_free,
for memory allocation and release respectively. The main part of the C code of
k_mem_pool_alloc is shown in Fig. 1. When an application requests a memory
block, Zephyr first computes alloc_l and free_l. alloc_l is the level with the size of
the smallest block that will satisfy the request, and free_l, with free_l ≤ alloc_l,
is the lowest level where there are free memory blocks. Since the services are
concurrent, when the service tries to allocate a free block blk from level free_l
(Line 8), blocks at that level may be allocated or merged into a bigger block
by other concurrent threads. In such a case the service backs out (Line 9) and
tells the main function k_mem_pool_alloc to retry. If blk is successfully locked for
allocation, it is broken down to level alloc_l (Lines 11–14). The allocation
service k_mem_pool_alloc supports a timeout parameter that allows threads to wait
for that pool for a period of time when the call does not succeed. If the allocation
fails (Line 24) and the timeout is not K_NO_WAIT, the thread is suspended
(Line 30) in a linked list wait_q and the context is switched to another thread
(Line 31).
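The level search described above can be sketched as follows. This is a hypothetical reconstruction from the description, not the actual Zephyr code; `free_count[i]` stands in for a test of Zephyr's per-level free list:

```c
#include <stdint.h>

/* Find alloc_l, the level of the smallest block satisfying `size`, and
   free_l <= alloc_l, a non-empty level from which a block can be taken
   and split.  Returns 0 on success and -1 when either no level fits or
   no free block exists. */
static int find_levels(uint32_t max_sz, int n_levels, uint32_t size,
                       const int *free_count, int *alloc_l, int *free_l)
{
    *alloc_l = -1;
    *free_l = -1;
    for (int i = 0; i < n_levels; i++) {
        if ((max_sz >> (2 * i)) < size)
            break;                 /* blocks at level i are too small */
        *alloc_l = i;              /* deepest level that still fits */
        if (free_count[i] > 0)
            *free_l = i;           /* non-empty level closest to alloc_l */
    }
    return (*alloc_l < 0 || *free_l < 0) ? -1 : 0;
}
```

With max_sz = 1024 and four levels (block sizes 1024, 256, 64, 16), a request of 100 bytes yields alloc_l = 1; if only level 1 has free blocks, free_l = 1. Note how a single failure code here conflates alloc_l < 0 with free_l < 0, which is exactly the distinction that bug (3) in Sect. 6 hinges on.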
Interrupts are always enabled in both services, with the exception of
the code in the functions alloc_block and break_block, which invoke irq_lock
and irq_unlock to respectively disable and enable interrupts. Similar to
k_mem_pool_alloc, the execution of k_mem_pool_free is interruptible too.
the implementation and comprising the same data. There are two exceptions to
this: (1) k_mem_block_id and k_mem_block are merged into a single record; (2)
the union in the struct k_mem_pool_lvl is replaced by a single list representing
the bitmap, and thus max_inline_level is removed.
(Figure: structure of a memory pool over the buffer [buf, buf + max_sz), with levels 0 to n_levels − 1, each holding a free_list and a bitmap; the legend distinguishes the block states DIVIDED, ALLOCATED, FREE, ALLOCATING, FREEING, and NOEXIST.)
Theorem 1 (Memory Partition). For any kernel state s, if the memory pools
in s are consistent in their configuration and their bitmaps are well-shaped, then the
memory pools satisfy the partition property in s:
inv_mempool_info(s) ∧ inv_bitmap(s) ∧ inv_bitmap0(s) ∧ inv_bitmapn(s) =⇒ mem_part(s)
Together with the memory partition property, pools must also satisfy the
following:
No Partner Fragmentation. The memory release algorithm in Zephyr coa-
lesces free partner memory blocks into blocks as large as possible for all the
descendants of the root level, excluding the root itself. Thus, a memory pool does
not contain four FREE partner bits.
Validity of Free Block Lists. The free list at each level keeps the start-
ing addresses of the free memory blocks. The memory management ensures that the
addresses in the list are valid, i.e., they are pairwise distinct and aligned
to the block size, which at level i is max_sz/4^i. Moreover, a memory
block is in the free list iff the corresponding bit of the bitmap is FREE.
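As a sketch, the two list-validity conditions (addresses pairwise distinct and aligned to the level's block size) can be checked like this; the function and its arguments are illustrative, not Zephyr API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A free list at level i is valid when every address is aligned to the
   block size max_sz / 4^i (relative to the pool buffer buf) and no
   address occurs twice. */
static bool free_list_valid(uintptr_t buf, uint32_t max_sz, unsigned i,
                            const uintptr_t *list, size_t n)
{
    uint32_t bsz = max_sz >> (2 * i);        /* block size at level i */
    for (size_t a = 0; a < n; a++) {
        if (list[a] < buf || (list[a] - buf) % bsz != 0)
            return false;                    /* misaligned address */
        for (size_t b = a + 1; b < n; b++)
            if (list[a] == list[b])
                return false;                /* duplicate address */
    }
    return true;
}
```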
Non-overlapping of Memory Pools. The memory spaces of the set of pools
defined in a system must be disjoint, so the memory addresses of a pool do
not belong to the memory space of any other pool.
Other Properties. The state of a suspended thread in wait_q has to be consis-
tent with the threads waiting for a memory pool. Threads can only be blocked
once, and threads waiting for available memory blocks have to be in the
BLOCKED state. During the allocation and release of a memory block, blocks of the
tree may be temporarily manipulated by the coalescing and division process.
A block can only be manipulated by one thread at a time, and the state bit of a
block being temporarily manipulated has to be FREEING or ALLOCATING.
is atomic since kernel services cannot interrupt it. But kernel services can be
interrupted via the scheduler, i.e., the execution of a memory service invoked by
a thread ti may be interrupted by the kernel scheduler to execute a thread tj.
Figure 3 illustrates Zephyr's execution model, where solid lines represent execution
steps of the threads/kernel services and dotted lines represent the suspension of a
thread/code. For instance, the execution of k_mem_pool_free in thread t1 is inter-
rupted by the scheduler, and the context is switched to thread t2, which invokes
k_mem_pool_alloc. During the execution of t2, the kernel service may suspend the
thread and switch to another thread tn by calling rescheduling. Later, the exe-
cution is switched back to t1, and k_mem_pool_free continues
in a different state from the one in which it was interrupted.
The event systems of Zephyr are illustrated in the right part of Fig. 3. A
user thread ti invokes allocation/release services, so the event system for ti is
esys_ti, a set composed of the events alloc and free. The input parameters of these
events correspond to the arguments of the service implementation and are
constrained by the guard of each service. Together with the user threads, we model
the event system for the scheduler, esys_sched, consisting of a unique event sched
whose argument is a thread t to be scheduled when t is in the READY state. The
formal specification of the memory management is the parallel composition of the
event systems of the threads and the scheduler: esys_t1 ‖ ... ‖ esys_tn ‖ esys_sched.
Thread Context and Preemption. Events are parametrized by a thread iden-
tifier used to access the execution context of the thread invoking them. As shown
in Fig. 3, the execution of an event by a thread can be stopped by
the scheduler and resumed later. This behaviour is modelled using a global
variable cur that indicates the thread currently scheduled and
being executed, and by conditioning the execution of events parametrized by t
so that they progress only when t is scheduled. This is achieved with the expression t ▷ p ≡
AWAIT cur = t THEN p END, so an event invoked by a thread t only pro-
gresses when t is scheduled. This scheme makes it possible to use rely-guarantee for concur-
rent execution of threads on mono-core architectures, where only the scheduled
thread is able to modify the memory.
the events we represent access to a state component c using ´c, and the value
of a local component c for the thread t is represented as ´c t. The local variables
allocating_node and freeing_node are relevant to the memory services, storing
the temporary blocks being split/coalesced in the alloc/release services respectively.
Memory Pool Initialization. Zephyr defines and initializes memory pools
at compile time by constructing a static variable of type struct k_mem_pool.
The implementation initializes each pool with n_max level-0 blocks of size
max_sz bytes. The bitmaps of level 0 are set to 1 and its free_list contains all level-0
blocks. The bitmaps and free lists of the other levels are initialized to 0 and to the empty
list respectively. In the formal model, we specify a state corresponding to the
initial state of the implementation and show that it belongs to the set of states
satisfying the invariant.
P sat ⟨pre ∩ b ∩ {V}, Id, UNIV, {s | (V, s) ∈ G} ∩ pst⟩
stable(pre, R)    stable(pst, R)
─────────────────────────────────────────────────────
(AWAIT b THEN P END) sat ⟨pre, R, G, pst⟩

body(α) sat ⟨pre ∩ guard(α), R, G, pst⟩
stable(pre, R)    ∀s. (s, s) ∈ G
─────────────────────────────────────────────────────
α sat ⟨pre, R, G, pst⟩

P sat ⟨loopinv ∩ b, R, G, loopinv⟩    loopinv ∩ −b ⊆ pst
∀s. (s, s) ∈ G    stable(loopinv, R)    stable(pst, R)
─────────────────────────────────────────────────────
(WHILE b DO P END) sat ⟨loopinv, R, G, pst⟩

(1) ∀κ. PS(κ) sat ⟨pres_κ, Rs_κ, Gs_κ, psts_κ⟩    (2) ∀κ. pre ⊆ pres_κ
(3) ∀κ. psts_κ ⊆ pst    (4) ∀κ. Gs_κ ⊆ G    (5) ∀κ. R ⊆ Rs_κ
(6) ∀κ, κ′. κ ≠ κ′ −→ Gs_κ ⊆ Rs_κ′
─────────────────────────────────────────────────────
PS sat ⟨pre, R, G, pst⟩
specification ⟨pres_κ, Rs_κ, Gs_κ, psts_κ⟩ (Premise 1); the pre-condition of the par-
allel composition implies all the event systems' pre-conditions (Premise 2); the
overall post-condition must be a logical consequence of all post-conditions of the
event systems (Premise 3); since an action transition of the concurrent system
is performed by one of its event systems, the guarantee condition Gs_κ of each
event system must be a subset of the overall guarantee condition G (Premise 4);
an environment transition Rs_κ of the event system κ corresponds to a transi-
tion of the overall environment R (Premise 5); and an action transition of an
event system κ must be contained in the rely condition of every other event system
κ′, where κ ≠ κ′ (Premise 6).
To prove loop termination, loop invariants are parametrized with a logical
variable α. It suffices to show total correctness of a loop statement by a
proposition in which loopinv(α) is the parametrized invariant and the
logical variable is used to find a convergent relation showing that the number of
iterations of the loop is finite.
This relation states that the alloc and free services may not change the state
(1), e.g., at a blocked await or when selecting a branch of a conditional statement. If a
step changes the state, then: (2) the static configuration of the memory pools in the
model does not change; (3.1) if the scheduled thread is not the thread invoking
the event, then the variables of that thread do not change (since it is blocked in
an Await as explained in Sect. 3); (3.2) if it is, then the relation preserves the
memory invariant, and consequently each step of the event needs to preserve the
invariant; (4) a thread does not change the local variables of other threads.
Using the π-Core proof rules, we verify that the invariant introduced in Sect. 4
is preserved by all the events. Additionally, we prove that, when starting in a valid
memory configuration given by the invariant, if the service does not return
an error code then it returns a valid memory block of size greater than or equal to
the requested capacity. The property is specified by the following postcondition:
Mem-pool-alloc-pre t ≡ {s. inv s ∧ allocating-node s t = None ∧ freeing-node s t = None}
Mem-pool-alloc-post t p sz timeout ≡
  {s. inv s ∧ allocating-node s t = None ∧ freeing-node s t = None
     ∧ (timeout = FOREVER −→
          (ret s t = ESIZEERR ∧ mempoolalloc-ret s t = None ∨
           ret s t = OK ∧ (∃ mblk. mempoolalloc-ret s t = Some mblk ∧ mblk-valid s p sz mblk)))
     ∧ (timeout = NOWAIT −→
          ((ret s t = ENOMEM ∨ ret s t = ESIZEERR) ∧ mempoolalloc-ret s t = None) ∨
          (ret s t = OK ∧ (∃ mblk. mempoolalloc-ret s t = Some mblk ∧ mblk-valid s p sz mblk)))
     ∧ (timeout > 0 −→
          ((ret s t = ETIMEOUT ∨ ret s t = ESIZEERR) ∧ mempoolalloc-ret s t = None) ∨
          (ret s t = OK ∧ (∃ mblk. mempoolalloc-ret s t = Some mblk ∧ mblk-valid s p sz mblk)))}
If a thread requests a memory block in FOREVER mode, it may successfully
allocate a valid memory block, or fail (ESIZEERR) if the requested size is larger
than the size of the memory pool. If the thread requests a memory block in
NOWAIT mode, it may also get the result ENOMEM if there are no available
blocks. If the thread requests with a finite timeout, it will get the result
ETIMEOUT if no block becomes available within timeout milliseconds.
The property is indeed weak: even if the memory has a block able to fit
the requested size before the allocation service is invoked, another thread
running concurrently may take the block first during the execution of
the service. For the same reason, a released block may be taken by another
concurrent thread before the end of the release service.
Bugs in Zephyr. During the formal verification, we found three bugs in the C code
of Zephyr. The first two bugs are critical and have been repaired in the latest
release of Zephyr. To avoid the third one, callers of k_mem_pool_alloc have to
constrain the size argument.
(1) Incorrect block split: this bug is located in the loop at Line 11 of the
k_mem_pool_alloc service, shown in Fig. 1. The level_empty function checks if a
pool p has blocks in the free list at level alloc_l. Concurrent threads may release
a memory block at that level, making the call level_empty(p, alloc_l) return
false and stopping the loop. In that case, the service allocates a memory block of a bigger
capacity, at a level i, but still sets the level number of the block to alloc_l at
Line 15. The service thus hands a larger block to the requesting thread, causing an
internal fragmentation of max_sz/4^i − max_sz/4^alloc_l bytes. When this block
is released, it will be inserted into the free list at level alloc_l rather than level
i, causing an external fragmentation of max_sz/4^i − max_sz/4^alloc_l bytes. The bug is
fixed by removing the condition level_empty(p, alloc_l) in our specification.
(2) Incorrect return from k_mem_pool_alloc: this bug is found at Line
26 in Fig. 1. When a suitable free block is allocated by another thread, the
pool_alloc function returns EAGAIN at Line 9 to ask the thread to retry the
allocation. When a thread invokes k_mem_pool_alloc in FOREVER mode and
this case happens, the service returns EAGAIN immediately. However, a thread
invoking k_mem_pool_alloc in FOREVER mode should keep retrying when it does
not succeed. We repair the bug by removing the condition ret == EAGAIN
at Line 26. As explained in the comments of the C code, EAGAIN should not
be returned to threads invoking the service. Moreover, the EAGAIN returned at
Line 34 actually corresponds to a timeout. Thus, we introduce a new return code
ETIMEOUT in our specification.
(3) Non-termination of k_mem_pool_alloc: we have discussed that the
loop statement at Lines 23–33 in Fig. 1 does not terminate. However, it should
terminate in certain cases, which the C code actually violates. When a
thread requests a memory block in FOREVER mode and the requested size
is larger than max_sz, the maximum size of blocks, the loop at Lines 23–33 in
Fig. 1 never finishes, since pool_alloc always returns ENOMEM. The reason is that
the “return ENOMEM” at Line 6 does not distinguish two cases: alloc_l < 0
and free_l < 0. In the first case, the requested size is larger than max_sz and
the kernel service should return immediately. In the second case, there are no
free blocks larger than the requested size and the service tries forever until
some free block becomes available. We repair the bug by splitting the if statement at
Lines 4–7 into these two cases and introducing a new return code ESIZEERR
in our specification. Then, we change the condition at Lines 25–26 to check that
the returned value is ESIZEERR instead of ENOMEM.
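The repair for bug (3) can be sketched as follows. This is a minimal sketch using return-code names modeled on the specification's; the actual fix splits the if statement at Lines 4–7 of Fig. 1:

```c
/* Illustrative return codes mirroring the specification's names. */
enum alloc_status { ALLOC_OK = 0, ALLOC_ENOMEM, ALLOC_ESIZEERR };

/* alloc_l < 0: the request exceeds max_sz, so fail immediately (ESIZEERR);
   free_l < 0: no block is free right now, so the caller may wait and
   retry (ENOMEM).  The original code returned ENOMEM in both cases,
   making a FOREVER-mode caller loop forever on oversized requests. */
static enum alloc_status classify_levels(int alloc_l, int free_l)
{
    if (alloc_l < 0)
        return ALLOC_ESIZEERR;
    if (free_l < 0)
        return ALLOC_ENOMEM;
    return ALLOC_OK;
}
```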
Our work explores the challenges and cost of certifying concurrent OSs at the
highest assurance level. The definition of the properties and rely-guarantee relations
is complex, and the verification task becomes expensive: our low-level design required
roughly 40 times as many lines of specification and proof (LOS/LOP) as the C code.
Next, we plan to verify other modules of Zephyr, which may be easier due to
simpler data structures and algorithms. For the purpose of fully formal verification
of OSs at the source-code level, we will replace the imperative language in π-Core
by a more expressive one and add a verification condition generator (VCG) to
reduce the cost of the verification.
References
1. The Zephyr Project. https://www.zephyrproject.org/. Accessed Dec 2018
2. Common Criteria for Information Technology Security Evaluation (v3.1, Release
5). https://www.commoncriteriaportal.org/. Accessed Apr 2017
3. Alkassar, E., Schirmer, N., Starostin, A.: Formal pervasive verification of a paging
mechanism. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol.
4963, pp. 109–123. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-
540-78800-3 9
4. Blanchard, A., Kosmatov, N., Lemerre, M., Loulergue, F.: A case study on formal
verification of the anaxagoros hypervisor paging system with Frama-C. In: Núñez,
M., Güdemann, M. (eds.) FMICS 2015. LNCS, vol. 9128, pp. 15–30. Springer,
Cham (2015). https://doi.org/10.1007/978-3-319-19458-5 2
5. Bolignano, P., Jensen, T., Siles, V.: Modeling and abstraction of memory man-
agement in a hypervisor. In: Stevens, P., Wąsowski, A. (eds.) FASE 2016. LNCS,
vol. 9633, pp. 214–230. Springer, Heidelberg (2016). https://doi.org/10.1007/978-
3-662-49665-7 13
6. Chen, H., Wu, X., Shao, Z., Lockerman, J., Gu, R.: Toward compositional ver-
ification of interruptible OS kernels and device drivers. In: Proceedings of 37th
ACM SIGPLAN Conference on Programming Language Design and Implementa-
tion (PLDI), pp. 431–447. ACM (2016)
7. Fang, B., Sighireanu, M.: Hierarchical shape abstraction for analysis of free list
memory allocators. In: Hermenegildo, M.V., Lopez-Garcia, P. (eds.) LOPSTR
2016. LNCS, vol. 10184, pp. 151–167. Springer, Cham (2017). https://doi.org/
10.1007/978-3-319-63139-4 9
8. Fang, B., Sighireanu, M.: A refinement hierarchy for free list memory allocators.
In: Proceedings of ACM SIGPLAN International Symposium on Memory Manage-
ment, pp. 104–114. ACM (2017)
9. Fang, B., et al.: Formal modelling of list based dynamic memory allocators. Sci.
China Inf. Sci. 61(12), 103–122 (2018)
10. Gallardo, M.D.M., Merino, P., Sanán, D.: Model checking dynamic memory allo-
cation in operating systems. J. Autom. Reasoning 42(2), 229–264 (2009)
11. Gu, R., et al.: CertiKOS: an extensible architecture for building certified concurrent
OS kernels. In: Proceedings of 12th USENIX Symposium on Operating Systems
Design and Implementation (OSDI), pp. 653–669. USENIX Association, Savannah,
GA (2016)
12. Klein, G., et al.: seL4: formal verification of an OS kernel. In: Proceedings of 22nd
ACM SIGOPS Symposium on Operating Systems Principles (SOSP), pp. 207–220.
ACM Press (2009)
Rely-Guarantee Reasoning About Concurrent Memory Management 533
13. Klein, G., Tuch, H.: Towards verified virtual memory in L4. In: Proceedings of
TPHOLs Emerging Trends, p. 16. Park City, Utah, USA, September 2004
14. Leroy, X., Blazy, S.: Formal verification of a C-like memory model and its uses for
verifying program transformations. J. Autom. Reasoning 41(1), 1–31 (2008)
15. Mansky, W., Garbuzov, D., Zdancewic, S.: An axiomatic specification for sequen-
tial memory models. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS,
vol. 9207, pp. 413–428. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-
21668-3 24
16. Marti, N., Affeldt, R., Yonezawa, A.: Formal verification of the heap manager of
an operating system using separation logic. In: Liu, Z., He, J. (eds.) ICFEM 2006.
LNCS, vol. 4260, pp. 400–419. Springer, Heidelberg (2006). https://doi.org/10.
1007/11901433 22
17. Saraswat, V.A., Jagadeesan, R., Michael, M., von Praun, C.: A theory of memory
models. In: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming (PPoPP), pp. 161–172. ACM (2007)
18. Su, W., Abrial, J.R., Pu, G., Fang, B.: Formal development of a real-time oper-
ating system memory manager. In: Proceedings of International Conference on
Engineering of Complex Computer Systems (ICECCS), pp. 130–139 (2016)
19. Tews, H., Völp, M., Weber, T.: Formal memory models for the verification of low-
level operating-system code. J. Autom. Reasoning 42(2), 189–227 (2009)
20. Vaynberg, A., Shao, Z.: Compositional verification of a baby virtual memory man-
ager. In: Hawblitzel, C., Miller, D. (eds.) CPP 2012. LNCS, vol. 7679, pp. 143–159.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35308-6 13
21. Ševčík, J., Vafeiadis, V., Nardelli, F.Z., Jagannathan, S., Sewell, P.: Com-
pCertTSO: a verified compiler for relaxed-memory concurrency. J. ACM 60(3),
22:1–22:50 (2013)
22. Xu, F., Fu, M., Feng, X., Zhang, X., Zhang, H., Li, Z.: A practical verification
framework for preemptive OS kernels. In: Chaudhuri, S., Farzan, A. (eds.) CAV
2016. LNCS, vol. 9780, pp. 59–79. Springer, Cham (2016). https://doi.org/10.1007/
978-3-319-41540-6 4
23. Yu, D., Hamid, N.A., Shao, Z.: Building certified libraries for PCC: dynamic stor-
age allocation. In: Degano, P. (ed.) ESOP 2003. LNCS, vol. 2618, pp. 363–379.
Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36575-3 25
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
Violat: Generating Tests of Observational
Refinement for Concurrent Objects
1 Introduction
Many mainstream software platforms, including Java and .NET, support mul-
tithreading to enable parallelism and reactivity. Programming multithreaded
code effectively is notoriously hard and prone to data races on shared memory
accesses, or deadlocks on the synchronization used to protect accesses. Rather
than confronting these difficulties, programmers generally prefer to leverage
libraries providing concurrent objects [19,29], i.e., optimized thread-safe imple-
mentations of common abstract data types (ADTs) like counters, key-value
stores, and queues. For instance, Java’s concurrent collections include implemen-
tations which eschew the synchronization bottlenecks associated with lock-based
c The Author(s) 2019
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11562, pp. 534–546, 2019.
https://doi.org/10.1007/978-3-030-25543-5_30
Fig. 1. Violat generates tests by enumerating program schemas invoking a given con-
current object, annotating those schemas with the expected outcomes of invocations
according to ADT specifications, and translating annotated schemas to executable
tests.
3 Test Enumeration
To enumerate test programs effectively, Violat considers a simple representation
of program schemas, as depicted in Fig. 2. We write schemas with a familiar nota-
tion, as parallel compositions {...}||{...} of method-invocation sequences.
Intuitively, schemas capture parallel threads invoking sequences of methods of
a given concurrent object. Besides the parallelism, these schemas include only
trivial control and data flow. For instance, we exclude conditional statements
and loops, as well as passing return values as arguments, in favor of straight-line
code with literal argument values. Nevertheless, this simple notion is expressive
enough to capture any possible outcome, i.e., combination of invocation return
values, of programs with arbitrarily complex control flow, data flow, and syn-
chronization. To see this, consider any outcome y admitted by some execution of
a program with arbitrarily complex control and data flow in which methods are
invoked with argument values x, collectively. The schema in which each thread
invokes the same methods as a thread of the original program, with literal values
x, collectively, is guaranteed to admit the same outcome y.
Fig. 3. Code generated for the containsValue schema of Fig. 2 for Java Pathfinder.
Code generation for jcstress is similar, but conforms to the tool's idiomatic test format
using decorators, and its built-in thread and outcome management.
Fig. 4. Observed outcomes for the size method, recorded by Java Pathfinder and
jcstress. Outcomes list return values in program-text order, e.g., get’s return value
is listed first.
6 Usage
Violat is implemented as a Node.js command-line application, available from
GitHub and npm.2 Its basic functionality is provided by the command:
$ violat-validator ConcurrentHashMap.json
...
violation discovered
---
{ put(0,1); size(); contains(1) } || { put(0,0); put(1,1) }
---
outcome OK frequency
----------------------- -- ---------
0, 0, true, null, null X 7
0, 1, true, null, null 703
0, 2, true, null, null 94,636
null, 1, false, 1, null 2,263
null, 1, true, 1, null 59,917
null, 2, true, 1, null 4
...
reporting violations among 100 generated programs. User-provided classes, indi-
vidual schemas, program limits, and particular back-ends can also be specified:
$ violat-validator MyConcurrentHashMap.json \
--jar MyCollections.jar \
--schema "{get(1); containsValue(1)} || {put(1,1); put(0,1); put(1,0)}" \
--max-programs 1000 \
--tester "Java Pathfinder"
7 Related Work
Terragni and Pezzè survey several works on test generation for concurrent
objects [45]. Like Violat, Ballerina [31] and ConTeGe [33] enumerate tests
randomly, while ConSuite [43], AutoConTest [44], and CovCon [6] exploit
static analysis to compute potential shared-memory access conflicts to reduce
redundancy among generated tests. Similarly, Omen [35–38], Narada [40],
Intruder [39], and Minion [41] reduce redundancy by anticipating potential con-
currency faults during sequential execution. Ballerina [31] and ConTeGe [33]
compute linearizations, but only identify generic faults like data races, dead-
locks, and exceptions, being neither sound nor complete for testing observational
refinement: fault-free executions with un-admitted return-value combinations are
false negatives, while faulting executions with admitted return-value combina-
tions are generally false positives – many non-blocking concurrent objects exhibit
2 https://github.com/michael-emmi/violat.
542 M. Emmi and C. Enea
data races by design. We consider the key innovations of these works, i.e., redun-
dancy elimination, orthogonal and complementary to ours. While Pradel and
Gross do consider subclass substitutability [34], they only consider programs
with two concurrent invocations, and require exhaustive enumeration of the
superclass’s thread interleavings to calculate admitted outcomes. In contrast,
Violat computes expected outcomes without interleaving method implementa-
tions, i.e., considering them atomic.
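With methods treated as atomic, the outcomes a schema admits are exactly those produced by some linearization of its threads against a sequential model of the object. The following Python sketch illustrates this computation for the schema `{get(1); containsValue(1)} || {put(1,1); put(0,1); put(1,0)}` shown earlier; the map model and all names are assumptions for illustration, not Violat’s actual code.

```python
# Sketch: enumerate all linearizations of a two-thread schema and run
# each against a sequential model of the object, treating every method
# invocation as atomic. Names and the map model are illustrative.

def linearizations(t1, t2):
    # All interleavings of the two invocation sequences that preserve
    # each thread's program order.
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in linearizations(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in linearizations(t1, t2[1:]):
        yield [t2[0]] + rest

def execute(seq):
    # Sequential model of a java.util.Map-like object: put returns the
    # previous mapping (None if absent). Results are keyed by invocation
    # id so outcomes list return values in program-text order.
    state, rets = {}, {}
    for uid, method, args in seq:
        if method == "put":
            k, v = args
            rets[uid] = state.get(k)
            state[k] = v
        elif method == "get":
            rets[uid] = state.get(args[0])
        elif method == "containsValue":
            rets[uid] = args[0] in state.values()
    return tuple(rets[u] for u in sorted(rets))

# {get(1); containsValue(1)} || {put(1,1); put(0,1); put(1,0)}
t1 = [(0, "get", (1,)), (1, "containsValue", (1,))]
t2 = [(2, "put", (1, 1)), (3, "put", (0, 1)), (4, "put", (1, 0))]

admitted = {execute(s) for s in linearizations(t1, t2)}
```

The exponential enumeration is paid once per program; per-execution checkers repeat it for every observed execution.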
Others generate tests for memory consistency. TSOtool [17] generates ran-
dom tests against the total-store order (TSO) model, while LCHECK [5] employs
genetic algorithms. Mador-Haim et al. [26,27] generate litmus tests to distin-
guish several memory models, including TSO, partial-store order (PSO), relaxed-
memory order (RMO), and sequential consistency (SC). CppMem [2] considers
the C++ memory model, while Herd [1] considers release-acquire (RA) and
Power in addition to the aforementioned models. McVerSi [8] employs genetic
algorithms to enhance test coverage, while Wickerson et al. [48] leverage the
Alloy model finder [22]. In some sense, these works generate tests of observa-
tional refinement for platforms implementing memory-system ADTs, i.e., with
read and write operations, whereas Violat targets arbitrary ADTs, including
collections with arbitrarily-rich sets of operations.
Violat more closely follows work on linearizability checking. Herlihy and
Wing [20] established the soundness of linearizability for observational refine-
ment, and Filipovic et al. [14] established completeness. Wing and Gong [49]
developed a linearizability-checking algorithm, which was later adopted by Line-
Up [4] and optimized by Lowe [24]; while Violat pays the exponential cost of
enumerating linearizations once per program, these approaches pay that cost per
execution – an exponential quantity itself. Gibbons and Korach [15] established
NP-hardness of per-execution linearizability checking for arbitrary objects, while
Emmi and Enea [11] demonstrate tractability for collections. Bouajjani et al. [3]
propose polynomial-time approximations, and Emmi et al. [13] demonstrate effi-
cient symbolic algorithms. Finally, Emmi and Enea [9,10,12] apply Violat to
checking atomicity and weak-consistency of Java concurrent objects.
References
1. Alglave, J., Maranget, L., Tautschnig, M.: Herding cats: modelling, simulation,
testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst.
36(2), 7:1–7:74 (2014). https://doi.org/10.1145/2627752
2. Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ con-
currency. In: Ball, T., Sagiv, M. (eds.) Proceedings of the 38th ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, POPL 2011,
Austin, TX, USA, 26–28 January 2011, pp. 55–66. ACM (2011). https://doi.org/
10.1145/1926385.1926394
Violat: Generating Tests of Observational Refinement 543
3. Bouajjani, A., Emmi, M., Enea, C., Hamza, J.: Tractable refinement checking
for concurrent objects. In: Rajamani, S.K., Walker, D. (eds.) Proceedings of the
42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 651–662. ACM
(2015). https://doi.org/10.1145/2676726.2677002
4. Burckhardt, S., Dern, C., Musuvathi, M., Tan, R.: Line-up: a complete and auto-
matic linearizability checker. In: Zorn, B.G., Aiken, A. (eds.) Proceedings of the
2010 ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation, PLDI 2010, Toronto, Ontario, Canada, 5–10 June 2010, pp. 330–340.
ACM (2010). https://doi.org/10.1145/1806596.1806634
5. Chen, Y., et al.: Fast complete memory consistency verification. In: 15th Interna-
tional Conference on High-Performance Computer Architecture (HPCA-15 2009),
14–18 February 2009, Raleigh, North Carolina, USA, pp. 381–392. IEEE Computer
Society (2009). https://doi.org/10.1109/HPCA.2009.4798276
6. Choudhary, A., Lu, S., Pradel, M.: Efficient detection of thread safety violations
via coverage-guided generation of concurrent tests. In: Uchitel, S., Orso, A., Robil-
lard, M.P. (eds.) Proceedings of the 39th International Conference on Software
Engineering, ICSE 2017, Buenos Aires, Argentina, 20–28 May 2017, pp. 266–277.
IEEE/ACM (2017). https://doi.org/10.1109/ICSE.2017.32
7. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press (2001).
http://books.google.de/books?id=Nmc4wEaLXFEC
8. Elver, M., Nagarajan, V.: McVerSi: a test generation framework for fast memory
consistency verification in simulation. In: 2016 IEEE International Symposium
on High Performance Computer Architecture, HPCA 2016, Barcelona, Spain, 12–
16 March 2016, pp. 618–630. IEEE Computer Society (2016). https://doi.org/10.
1109/HPCA.2016.7446099
9. Emmi, M., Enea, C.: Exposing non-atomic methods of concurrent objects. CoRR
abs/1706.09305 (2017). http://arxiv.org/abs/1706.09305
10. Emmi, M., Enea, C.: Monitoring weak consistency. In: Chockler, H., Weissenbacher,
G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 487–506. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-96145-3_26
11. Emmi, M., Enea, C.: Sound, complete, and tractable linearizability monitoring for
concurrent collections. PACMPL 2(POPL), 25:1–25:27 (2018). https://doi.org/10.
1145/3158113
12. Emmi, M., Enea, C.: Weak-consistency specification via visibility relaxation.
PACMPL 3(POPL), 60:1–60:28 (2019). https://dl.acm.org/citation.cfm?id=
3290373
13. Emmi, M., Enea, C., Hamza, J.: Monitoring refinement via symbolic reasoning. In:
Grove, D., Blackburn, S. (eds.) Proceedings of the 36th ACM SIGPLAN Confer-
ence on Programming Language Design and Implementation, Portland, OR, USA,
15–17 June 2015, pp. 260–269. ACM (2015). https://doi.org/10.1145/2737924.
2737983
14. Filipovic, I., O’Hearn, P.W., Rinetzky, N., Yang, H.: Abstraction for concurrent
objects. Theor. Comput. Sci. 411(51–52), 4379–4398 (2010). https://doi.org/10.
1016/j.tcs.2010.09.021
15. Gibbons, P.B., Korach, E.: Testing shared memories. SIAM J. Comput. 26(4),
1208–1244 (1997). https://doi.org/10.1137/S0097539794279614
16. Godefroid, P. (ed.): Partial-Order Methods for the Verification of Concurrent Sys-
tems. LNCS, vol. 1032. Springer, Heidelberg (1996). https://doi.org/10.1007/3-
540-60761-7
17. Hangal, S., Vahia, D., Manovit, C., Lu, J.J., Narayanan, S.: TSOtool: a program
for verifying memory systems using the memory consistency model. In: 31st Inter-
national Symposium on Computer Architecture (ISCA 2004), 19–23 June 2004,
Munich, Germany, pp. 114–123. IEEE Computer Society (2004). https://doi.org/
10.1109/ISCA.2004.1310768
18. He, J., Hoare, C.A.R., Sanders, J.W.: Data refinement refined resume. In: Robi-
net, B., Wilhelm, R. (eds.) ESOP 1986. LNCS, vol. 213, pp. 187–196. Springer,
Heidelberg (1986). https://doi.org/10.1007/3-540-16442-1_14
19. Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kauf-
mann, San Mateo (2008)
20. Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent
objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990). https://doi.
org/10.1145/78969.78972
21. Hoare, C.A.R., He, J., Sanders, J.W.: Prespecification in data refinement. Inf. Pro-
cess. Lett. 25(2), 71–76 (1987). https://doi.org/10.1016/0020-0190(87)90224-9
22. Jackson, D.: Alloy: a lightweight object modelling notation. ACM Trans. Softw.
Eng. Methodol. 11(2), 256–290 (2002). https://doi.org/10.1145/505145.505149
23. Liskov, B., Wing, J.M.: A behavioral notion of subtyping. ACM Trans. Program.
Lang. Syst. 16(6), 1811–1841 (1994). https://doi.org/10.1145/197320.197383
24. Lowe, G.: Testing for linearizability. Concurrency Comput. Pract. Exp. 29(4)
(2017). https://doi.org/10.1002/cpe.3928
25. Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakes: a comprehensive
study on real world concurrency bug characteristics. In: Eggers, S.J., Larus, J.R.
(eds.) Proceedings of the 13th International Conference on Architectural Sup-
port for Programming Languages and Operating Systems, ASPLOS 2008, Seattle,
WA, USA, 1–5 March 2008, pp. 329–339. ACM (2008). https://doi.org/10.1145/
1346281.1346323
26. Mador-Haim, S., Alur, R., Martin, M.M.K.: Generating litmus tests for contrasting
memory consistency models. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010.
LNCS, vol. 6174, pp. 273–287. Springer, Heidelberg (2010). https://doi.org/10.
1007/978-3-642-14295-6_26
27. Mador-Haim, S., Alur, R., Martin, M.M.K.: Litmus tests for comparing memory
consistency models: how long do they need to be? In: Stok, L., Dutt, N.D., Hassoun,
S. (eds.) Proceedings of the 48th Design Automation Conference, DAC 2011, San
Diego, California, USA, 5–10 June 2011, pp. 504–509. ACM (2011). https://doi.
org/10.1145/2024724.2024842
28. Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and block-
ing concurrent queue algorithms. In: Burns, J.E., Moses, Y. (eds.) Proceedings
of the Fifteenth Annual ACM Symposium on Principles of Distributed Comput-
ing, Philadelphia, Pennsylvania, USA, 23–26 May 1996, pp. 267–275. ACM (1996).
https://doi.org/10.1145/248052.248106
29. Moir, M., Shavit, N.: Concurrent data structures. In: Mehta, D.P., Sahni, S. (eds.)
Handbook of Data Structures and Applications. Chapman and Hall/CRC (2004).
https://doi.org/10.1201/9781420035179.ch47
30. Musuvathi, M., Qadeer, S.: CHESS: systematic stress testing of concurrent soft-
ware. In: Puebla, G. (ed.) LOPSTR 2006. LNCS, vol. 4407, pp. 15–16. Springer,
Heidelberg (2007). https://doi.org/10.1007/978-3-540-71410-1_2
31. Nistor, A., Luo, Q., Pradel, M., Gross, T.R., Marinov, D.: Ballerina: automatic
generation and clustering of efficient random unit tests for multithreaded code.
In: Glinz, M., Murphy, G.C., Pezzè, M. (eds.) 34th International Conference on
Software Engineering, ICSE 2012, 2–9 June 2012, Zurich, Switzerland, pp. 727–737.
IEEE Computer Society (2012). https://doi.org/10.1109/ICSE.2012.6227145
32. Plotkin, G.D.: LCF considered as a programming language. Theor. Comput. Sci.
5(3), 223–255 (1977). https://doi.org/10.1016/0304-3975(77)90044-5
33. Pradel, M., Gross, T.R.: Fully automatic and precise detection of thread safety
violations. In: Vitek, J., Lin, H., Tip, F. (eds.) ACM SIGPLAN Conference on Pro-
gramming Language Design and Implementation, PLDI 2012, Beijing, China, 11–
16 June 2012, pp. 521–530. ACM (2012). https://doi.org/10.1145/2254064.2254126
34. Pradel, M., Gross, T.R.: Automatic testing of sequential and concurrent substi-
tutability. In: Notkin, D., Cheng, B.H.C., Pohl, K. (eds.) 35th International Con-
ference on Software Engineering, ICSE 2013, San Francisco, CA, USA, 18–26 May
2013, pp. 282–291. IEEE Computer Society (2013). https://doi.org/10.1109/ICSE.
2013.6606574
35. Samak, M., Ramanathan, M.K.: Multithreaded test synthesis for deadlock detec-
tion. In: Black, A.P., Millstein, T.D. (eds.) Proceedings of the 2014 ACM Interna-
tional Conference on Object Oriented Programming Systems Languages & Appli-
cations, OOPSLA 2014, Part of SPLASH 2014, Portland, OR, USA, 20–24 October
2014, pp. 473–489. ACM (2014). https://doi.org/10.1145/2660193.2660238
36. Samak, M., Ramanathan, M.K.: Omen+: a precise dynamic deadlock detector for
multithreaded Java libraries. In: Cheung, S., Orso, A., Storey, M.D. (eds.) Pro-
ceedings of the 22nd ACM SIGSOFT International Symposium on Foundations
of Software Engineering (FSE-22), Hong Kong, China, 16–22 November 2014, pp.
735–738. ACM (2014). https://doi.org/10.1145/2635868.2661670
37. Samak, M., Ramanathan, M.K.: Omen: a tool for synthesizing tests for dead-
lock detection. In: Black, A.P. (ed.) Conference on Systems, Programming, and
Applications: Software for Humanity, SPLASH 2014, Portland, OR, USA, 20–24
October 2014, Companion Volume, pp. 37–38. ACM (2014). https://doi.org/10.
1145/2660252.2664663
38. Samak, M., Ramanathan, M.K.: Trace driven dynamic deadlock detection and
reproduction. In: Moreira, J.E., Larus, J.R. (eds.) ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, PPoPP 2014, Orlando, FL, USA,
15–19 February 2014, pp. 29–42. ACM (2014). https://doi.org/10.1145/2555243.
2555262
39. Samak, M., Ramanathan, M.K.: Synthesizing tests for detecting atomicity viola-
tions. In: Nitto, E.D., Harman, M., Heymans, P. (eds.) Proceedings of the 2015
10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015,
Bergamo, Italy, 30 August–4 September 2015, pp. 131–142. ACM (2015). https://
doi.org/10.1145/2786805.2786874
40. Samak, M., Ramanathan, M.K., Jagannathan, S.: Synthesizing racy tests. In:
Grove, D., Blackburn, S. (eds.) Proceedings of the 36th ACM SIGPLAN Confer-
ence on Programming Language Design and Implementation, Portland, OR, USA,
15–17 June 2015, pp. 175–185. ACM (2015). https://doi.org/10.1145/2737924.
2737998
41. Samak, M., Tripp, O., Ramanathan, M.K.: Directed synthesis of failing concur-
rent executions. In: Visser, E., Smaragdakis, Y. (eds.) Proceedings of the 2016
ACM SIGPLAN International Conference on Object-Oriented Programming, Sys-
tems, Languages, and Applications, OOPSLA 2016, Part of SPLASH 2016, Ams-
terdam, The Netherlands, 30 October–4 November 2016, pp. 430–446. ACM (2016).
https://doi.org/10.1145/2983990.2984040
42. Shipilev, A.: The Java concurrency stress tests (2018). https://wiki.openjdk.java.
net/display/CodeTools/jcstress
43. Steenbuck, S., Fraser, G.: Generating unit tests for concurrent classes. In: Sixth
IEEE International Conference on Software Testing, Verification and Validation,
ICST 2013, Luxembourg, Luxembourg, 18–22 March 2013, pp. 144–153. IEEE
Computer Society (2013). https://doi.org/10.1109/ICST.2013.33
44. Terragni, V., Cheung, S.: Coverage-driven test code generation for concurrent
classes. In: Dillon, L.K., Visser, W., Williams, L. (eds.) Proceedings of the 38th
International Conference on Software Engineering, ICSE 2016, Austin, TX, USA,
14–22 May 2016, pp. 1121–1132. ACM (2016). https://doi.org/10.1145/2884781.
2884876
45. Terragni, V., Pezzè, M.: Effectiveness and challenges in generating concurrent tests
for thread-safe classes. In: Huchard, M., Kästner, C., Fraser, G. (eds.) Proceedings
of the 33rd ACM/IEEE International Conference on Automated Software Engi-
neering, ASE 2018, Montpellier, France, 3–7 September 2018, pp. 64–75. ACM
(2018). https://doi.org/10.1145/3238147.3238224
46. Vafeiadis, V.: Automatically proving linearizability. In: Touili, T., Cook, B., Jack-
son, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 450–464. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-14295-6_40
47. Visser, W., Pasareanu, C.S., Khurshid, S.: Test input generation with Java
PathFinder. In: Avrunin, G.S., Rothermel, G. (eds.) Proceedings of the ACM/SIG-
SOFT International Symposium on Software Testing and Analysis, ISSTA 2004,
Boston, Massachusetts, USA, 11–14 July 2004, pp. 97–107. ACM (2004). https://
doi.org/10.1145/1007512.1007526
48. Wickerson, J., Batty, M., Sorensen, T., Constantinides, G.A.: Automatically com-
paring memory consistency models. In: Castagna, G., Gordon, A.D. (eds.) Pro-
ceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming
Languages, POPL 2017, Paris, France, 18–20 January 2017, pp. 190–204. ACM
(2017). http://dl.acm.org/citation.cfm?id=3009838
49. Wing, J.M., Gong, C.: Testing and verifying concurrent objects. J. Parallel Distrib.
Comput. 17(1–2), 164–182 (1993). https://doi.org/10.1006/jpdc.1993.1015
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.