
Weakest-Precondition of Unstructured Programs

Mike Barnett and K. Rustan M. Leino, Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA, {mbarnett,leino}@microsoft.com

Abstract
Program verification systems typically transform a program into a logical expression which is then fed to a theorem prover. The logical expression represents the weakest precondition of the program relative to its specification; when (and if!) the theorem prover is able to prove the expression, then the program is considered correct. Computing such a logical expression for an imperative, structured program is straightforward, although there are issues having to do with loops and with the efficiency of the computation and the complexity of the resulting formula with respect to the theorem prover. This paper presents a novel approach for computing the weakest precondition of an unstructured program that is sound even in the presence of loops. The computation is efficient and the resulting logical expression provides more leeway for the theorem prover to attack the proof efficiently.

0 Introduction
A technique for precisely checking that a computer program meets specified correctness criteria is static program verification. The typical architecture of a static program verifier takes as input a program and its specification, generates from these a verification condition (a first-order logical formula whose validity implies that the program meets the specification), and then passes the verification condition to a theorem prover. The engineering of the verification condition has a large impact on the proving task presented to the theorem prover [11]. The primary goal is to prevent redundancy in the verification condition, which lets the prover complete its task more efficiently. Although the exact nature of what constitutes redundancy may depend on the operation of the theorem prover, one general desideratum is that the formula not be dramatically larger than it needs to be.

In this paper, we describe the verification condition generation in the Spec# [2] static program verifier. It produces verification conditions that are decidedly smaller than those produced by ESC/Java [11, 13], the leading automatic program checker of its kind. Moreover, our verification condition generation is more general, because it applies to general control-flow graphs, not just to structured programs. Another, smaller, contribution of this paper is the data structure used when computing single-assignment incarnations, which can reduce the number of incarnations produced.

Like the verification condition generation in ESC/Java [10, 14, 11], we proceed in stages. Our starting point is a general control-flow graph. For us, this was a natural choice, because the Spec# static program verifier uses as its input language the intermediate language of the .NET virtual machine, whose branch instructions can give rise to any control flow. Using standard compilation techniques that duplicate instructions to eliminate multiple entry points into loops [0], we transform the general control-flow graph into a reducible one. (In fact, being a superset of C#, Spec# inherits statements that enable irreducible control flow already at the source level.) We then eliminate loops, producing an acyclic control-flow graph that is correct only if the original program is correct. We apply a single-assignment transformation to the acyclic program and then turn it into a passive program by changing assignment statements into assume statements. Finally, we apply weakest preconditions to the unstructured, acyclic, passive program to generate the verification condition.

In this paper, we describe the stages of this pipeline in reverse order. But before we do, we present the unstructured language under consideration and describe its executions and correctness criteria.

1 Programs and Correctness


Throughout this paper, we think of a program as a chunk of code that is to be verified. This may correspond to the implementation of a method in the source program, for example.

The language we consider in this paper follows this grammar:

  Program ::= Block+
  Block   ::= Label : Stmt  goto Label, …, Label
  Stmt    ::= Stmt ; Stmt | skip
            | VarId := Expr
            | havoc VarId
            | assert Expr
            | assume Expr

A program consists of a number of basic blocks. Each basic block has a label, a body, and a possibly empty set of successors. We assume the program's first block is labeled Start.

A program gives rise to a set of execution traces. An execution trace consists of a sequence of program states, each a valuation of the program variables. A trace is either infinite or it ends in termination, ends in error, or ends in infeasibility. Intuitively, each trace of a program consists of the execution of successive blocks starting from Start, at the end of each block arbitrarily choosing one of the declared successor blocks, if any; the trace ends in termination if there are no successors to choose from, ends in error if an assert statement evaluates to false, and ends in infeasibility if an assume statement evaluates to false. In the next two paragraphs, we make this definition more precise.

A statement gives rise to a set of finite execution traces. The assignment statement x := E gives rise to the set of terminating traces ⟨s, t⟩, where state t is like state s except that it evaluates x to the value of E in s. The details of expressions are not important here, but we assume an expression always evaluates to some value in each state. The havoc x statement sets x to an arbitrary value, thus giving rise to the set of all terminating traces ⟨s, t⟩, where s and t agree on their valuation of all variables except possibly x. The assert E statement gives rise to the terminating (single-state) traces ⟨s⟩ where E evaluates to true in s, and to the erroneous traces ⟨s⟩ where E evaluates to false. The assume E statement gives rise to the terminating traces ⟨s⟩ where E evaluates to true, and to the infeasible traces ⟨s⟩ where E evaluates to false. Sequential composition S ; T gives rise to the non-terminating traces of S, and to the terminating traces of S continued (via a matching intermediate state) by the traces of T. Finally, skip is just a shorthand for assume true.

The set of traces of a block A is the smallest set of traces that includes (a) the set of non-terminating traces of the body of A, (b) the set of terminating traces of the body of A, if A has no successors, and (c) the set of terminating traces of the body of A continued by the traces of the successors of A, if A has successors. The set of traces of a program is the set of traces of block Start.

A program is correct if none of its traces ends in error. Note that this definition of correctness does not say anything about the final state of terminating executions, but one can encode given postconditions by putting an appropriate assert statement at the end of blocks with no successors. Note also that correct programs can have traces that end in infeasibility; such traces can be thought of as the execution making mistakes in the arbitrary choices inherent in havoc and assume statements and in the arbitrary choice of the initial state. Any given preconditions of the chunk of code to be checked can be encoded by putting an appropriate assume statement at the beginning of block Start. Finally, note that a correct program can include never-ending executions.

Our little language may seem impoverished at first, but it suffices for verification purposes (cf. [14]). In fact, it closely resembles the statements in BoogiePL [5], the intermediate language used by the Spec# static program verifier. For example, conditional control flow, as in a common if statement

  if (P) { S } else { T }

can be encoded in our little language as:

  Start: skip;       goto Then, Else
  Then:  assume P; S; goto Join
  Else:  assume ¬P; T; goto Join
  Join:  …

Iteration is supported via the goto statements. A procedure call is replaced by an encoding of the callee's pre/post specification, which can be done using assert, assume, and havoc statements [14]. Finally, exceptions are encoded using a couple of additional variables (cf. [14]) and conditional control flow that threads through exception handlers. We are now ready to describe the details of the verification condition generation.
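To make the trace semantics above concrete, here is a minimal sketch, our own illustration rather than anything from the paper or from BoogiePL, of how executing a straight-line body yields the three trace outcomes. The tuple encoding of statements and the `run` helper are assumptions of this sketch; `havoc` is omitted because its nondeterminism does not fit a single-trace evaluator.

```python
# States are dicts from variable names to values; statements are tagged tuples.
# An `assert` that evaluates to false ends the trace in error; an `assume`
# that evaluates to false ends it in infeasibility; otherwise we terminate.

def run(stmts, state):
    """Execute a straight-line body; return ('ok'|'error'|'infeasible', state)."""
    for stmt in stmts:
        tag = stmt[0]
        if tag == 'assign':                 # ('assign', x, f) with f: state -> value
            _, x, f = stmt
            state = dict(state, **{x: f(state)})
        elif tag == 'assert':               # erroneous trace when condition is false
            if not stmt[1](state):
                return ('error', state)
        elif tag == 'assume':               # infeasible trace when condition is false
            if not stmt[1](state):
                return ('infeasible', state)
    return ('ok', state)                    # terminating trace

body = [('assume', lambda s: s['x'] > 0),
        ('assign', 'x', lambda s: s['x'] - 1),
        ('assert', lambda s: s['x'] >= 0)]
print(run(body, {'x': 1})[0])   # -> ok
print(run(body, {'x': 0})[0])   # -> infeasible
```

Starting the same body in a state where the assumption fails simply discards the trace as infeasible, mirroring the remark that infeasible traces do not make a program incorrect.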

2 Weakest Preconditions
In this section, we define weakest preconditions of unstructured programs. This is the last stage in our verification-condition generation pipeline. We assume programs to be passive (there are no assignment statements). It is this computation of weakest preconditions that is at the heart of making the verification condition palatable to the theorem prover. In fact, our technique produces a verification condition that is linear in the size of the passive program.

For any statement S and predicate Q on the post-state of S, the weakest precondition of S with respect to Q, written wp(S, Q), is a predicate that characterizes all pre-states of S from which no execution will go wrong and from which every terminating execution ends in a state satisfying Q [8]. The weakest preconditions of the passive statements are defined as follows, for any Q:

  wp(assert P, Q) = P ∧ Q
  wp(assume P, Q) = P ⇒ Q
  wp(S ; T, Q)    = wp(S, wp(T, Q))

Note that wp is monotonic in its second argument.

In a structured program, the central problem to be overcome in computing weakest preconditions is that of the choice statement, S [] T, which arbitrarily chooses one of S and T to execute. Its weakest precondition is defined by

  wp(S [] T, Q) = wp(S, Q) ∧ wp(T, Q)

The problem is that the duplication of Q in the right-hand side of this equation introduces redundancy. Q represents the proof obligations downstream of the choice statement, and this naive formulation suggests that the theorem prover would need to process Q twice. In general, Q may need to be processed twice, but in practice, large parts of Q are often independent of which choice is taken [11]. Luckily, passive programs satisfy a property that lets this equation be formulated in a way that significantly reduces redundancy [13]. The alternate form uses wp and so-called weakest liberal preconditions (wlp) and produces verification conditions whose size is quadratic in the size of the passive program [11]. This alternate form applies to structured programs only, so applying it to unstructured programs would require some preprocessing step.

Unstructured programs do not have the structured choice statement. Instead, they have goto statements, which at first seem even more disastrous; certainly, we would not like to explode the control-flow graph into a tree, which would lose all the sharing that a control-flow graph representation affords (not to mention that we don't actually assume acyclicity in this section of the paper, even though in our application the passive programs are all acyclic). Here is our solution. For every block

  A: S  goto B1, …, Bn

we introduce an auxiliary variable A_ok. Intuitively, A_ok is true if the program is in a state from which all executions beginning from block A are correct. Formally, we postulate the following block equation:

  A_ok ≡ wp(S, B1_ok ∧ … ∧ Bn_ok)

where B1, …, Bn are the blocks in succ(A), the set of successors of A, so that the second argument to wp is the conjunction of B_ok for each block B in that set (the empty conjunction being true). For example, the block equation for block Start in the previous section is:

  Start_ok ≡ wp(skip, Then_ok ∧ Else_ok)

Each block contributes one block equation; call their conjunction BlockEqs. The program's verification condition is then:

  BlockEqs ⇒ Start_ok

The verification condition and block equations are in terms of the program's variables and the auxiliary variables. In the rest of this section, it will be convenient to include the auxiliary variables in states and traces. When we do so, we'll refer to the states as augmented states.

Lemma 0. For any program state s, there is an augmented state that agrees with s on the values of the program variables and that satisfies all the program's block equations.

Proof. The right-hand side of each block equation is a monotonic function of the auxiliary variables (since wp is monotonic in its second argument). Thus, the conjunction of block equations can be put into the form v = F(v), where v denotes the tuple of auxiliary variables and F is the tuple of block-equation right-hand sides. Since F is a monotonic function on a complete lattice, v = F(v) has a solution in v (by Tarski's Theorem [17]).

Lemma 1. Let P be a passive program, A be a basic block in P, and s be an augmented state that satisfies all block equations of P. If A_ok is true in s, then every execution from A starting in s is either correct or has a correct prefix that returns to block A.

Proof. By induction over the set of blocks not yet visited in an execution prefix. If B_ok holds at the beginning of the execution from a block B, then the fact that s satisfies the block equation for B means that the execution of B's body is correct and that, for every successor C of B, C_ok holds upon termination of the body. For any successor block that is already visited in the execution trace, we are done. Moreover, since the program is passive, all block equations still hold, so the antecedent for applying the induction hypothesis on any successor holds, and applying the induction hypothesis leads to a well-founded induction because there is one fewer block still to visit.

Lemma 2. Let P be a passive program, A be a basic block in P, and s be an augmented state that satisfies all block equations of P. If A_ok is true in s, then every execution from A starting in s is correct.

Proof. Take any execution trace and chop it up into the longest possible segments that do not repeat any blocks. Since this is a passive program, the first and last states of any terminating segment are the same. Then each of these segments is correct, by Lemma 1, which implies that the whole execution is correct.

Theorem 3. For any passive program P, if the verification condition for P is a valid formula, then P is correct.

Proof. By Lemma 0, we can augment the initial state with values for the auxiliary variables to form an augmented state that satisfies the conjunction of block equations. From the validity of the verification condition, we then conclude that Start_ok holds in the augmented state. By Lemma 2, every execution of the program is correct.
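As a rough illustration of why the block equations keep the verification condition linear, here is a sketch, our own and not the Spec# implementation, that generates the equations and the final formula as strings. The tuple encoding of passive statements and the ASCII spellings `==>` and `<==>` are assumptions of this sketch.

```python
# Passive statements are ('assert', P) or ('assume', P) with P a formula string.
# blocks maps a block name to (list of passive statements, list of successors).

def wp(stmt, q):
    kind, p = stmt
    if kind == 'assert':
        return f'({p} && {q})'      # wp(assert P, Q) = P && Q
    if kind == 'assume':
        return f'({p} ==> {q})'     # wp(assume P, Q) = P ==> Q
    raise ValueError(kind)

def verification_condition(blocks, start):
    eqs = []
    for name, (stmts, succs) in blocks.items():
        # successors contribute only their auxiliary variables, never their wp
        post = ' && '.join(f'{b}_ok' for b in succs) or 'true'
        pre = post
        for s in reversed(stmts):   # wp of the body, back to front
            pre = wp(s, pre)
        eqs.append(f'{name}_ok <==> {pre}')
    return f"({' && '.join(eqs)}) ==> {start}_ok"

blocks = {
    'Start': ([], ['Then', 'Else']),
    'Then':  ([('assume', 'P')], ['Join']),
    'Else':  ([('assume', '!P')], ['Join']),
    'Join':  ([('assert', 'Q')], []),
}
vc = verification_condition(blocks, 'Start')
print(vc)
```

Each equation mentions a successor only through its `_ok` variable, so no downstream obligation is ever duplicated: the output grows linearly with the number of blocks and statements, even when many paths share a join point.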

3 Passification
We convert a loop-free program into a passive program by first rewriting it in a single-assignment form and then removing all of the assignment statements.

3.0 Single Assignment

Dynamic single-assignment (DSA) [9] is similar to the standard static single-assignment (SSA) form [4], in which even statically in the program text there is at most one definition for each variable. In DSA form, there may be more than one definition, but in any program execution, at most one of them will be executed. We convert the loop-free program into DSA form by noting that after each update to a variable, its value must be understood relative to the newly updated state; we identify each updated value as a new incarnation of the variable. For instance, we replace the assignment statement:

  x := x + 1

with the assignment statement:

  x1 := x0 + 1

where x1 is a fresh incarnation and x0 is the current incarnation of x. In general, all variables read by a statement are replaced by their current incarnations. After a variable update (assignment or havoc statement), a fresh incarnation becomes the new current incarnation for the updated variable. At the beginning of the program, an initial incarnation is created for each program variable. We call the last incarnation of a variable in a block the block's incarnation for that variable.

The algorithm for performing these replacements processes the graph in a topologically sorted order. For straight-line code, it is simple to iterate over the sequence of statements, replacing all of the variables with their current incarnations. But at join points (nodes in the control-flow graph with more than one predecessor), a node may be inheriting conflicting current incarnations from its predecessors. For instance, in the program in Section 1, let Start's incarnation for x be x0, Then's incarnation be x1, and Else's incarnation be x2. Consider block Join: which incarnation should a reference to x (on the right-hand side of an assignment statement) be taken to be, x1 or x2? To model the joining of the values, we introduce a fresh incarnation, x3, and introduce new assignment statements at the end of blocks Then and Else: x3 := x1 and x3 := x2, respectively. We also update each block's incarnation (for x) to be x3. (This reflects a choice; we could leave their incarnations to be x1 and x2, respectively, but we next discuss how either choice leads to the excessive creation of incarnations.) This has the effect that during any particular execution, each incarnation is assigned to at most once. In the current example, either block Then or block Else will execute, and x3 will be equal to the corresponding incarnation from that block.

This procedure for converting the program to DSA form means that a new incarnation is potentially created for each variable at every join point. However, this may lead to the introduction of more incarnations than strictly necessary. Consider the program in Fig. 0.

Figure 0. A program that does not need a new incarnation at every join point.

The algorithm sketched above would create a fresh incarnation at each of the join points, resulting in the program in Fig. 1.

Figure 1. The program from Fig. 0 in DSA form.

But it is clear that a minimal renaming would result in the DSA form shown in Fig. 2. We achieve this reduction by keeping a set of incarnations as each block's incarnation. All of the incarnations in such a set have the same value, so when a join point is reached, any one of the incarnations in its predecessors' incarnation sets can be used.

Figure 2. The program from Fig. 0 with minimal renaming. Note that there is no need to create a new incarnation at the join point, since the block's incarnation is a set and any of its members can be used, instead of having to choose either one of them.

3.1 Passive Programs

Once the program has been converted to DSA form, we replace all assignment statements by assume statements. We replace the assignment statement:

  x1 := E

with the statement:

  assume x1 = E

We are able to replace the assignment with an assume statement since the value of x1 is not used prior to its definition: in effect, we thus assume that x1 had the desired value all along. Using an assume statement in this way expresses what some languages use a let binding for: giving a name to a particular value.

4 Loops
In this section, we describe the transformation from a reducible control-flow graph into an acyclic control-flow graph. (We use the standard techniques for converting an irreducible graph into an equivalent, although possibly far larger, reducible graph. We are looking into ways to deal with irreducible graphs that avoid this problem, but so far it has not been an issue.) A reducible control-flow graph is one where it is possible to identify a unique loop head for each loop (throughout this section, we use standard terminology from compilers [0]).

In order to identify the loops, we begin by finding all of the back edges. It is the existence of a back edge that uniquely identifies a loop. A back edge is an edge in the control-flow graph whose tail (target of the edge) dominates its head (source of the edge). One node dominates another node when all paths to the latter pass through the former. The loop header for a back edge is the target of the edge. A loop header may have more than one loop associated with it: each natural loop is identified by the pair of a back edge and its loop header.

We remove all back edges to cut the loops, thus transforming the graph into an acyclic one. But in order for the loop body to represent an arbitrary loop iteration, we must make sure that the values of any variables modified within the loop have a value that they might hold on any iteration of the loop. For each natural loop, we collect into a set the variables that are updated by any statement in any block in the loop. These variables are called loop targets. For each loop target v, we introduce a havoc v statement and insert it at the beginning of the loop header, before any of the existing statements in that block.

Wiping out all knowledge of the values the loop targets might hold may cause the theorem prover to be unable to prove the verification condition. That is, it induces an over-approximation of the original program and loses too much precision. To address this, we allow each loop to have an invariant: a condition that must be met on each iteration of the loop. A loop invariant may be written by a user or it may be one inferred by another component of the Spec# static program verifier. (Inferring invariants is important to spare the programmer from an undue annotation burden.)

Loop invariants are encoded as a prefix of assert statements at the beginning of the loop header's code block. These assert statements cannot be validated if any of the variables they mention are among the havoced loop targets. Instead, we introduce a copy of this sequence of assert statements into each predecessor node of the loop header (including the node that is the source of the back edge). Since the assertions are now checked just before the jump to the loop header, we change the assert statements into assume statements in the header itself. We process loop invariants in this way before adding the havoc statements and cutting the back edges. The resulting havoc statements followed by the assume statements have the effect of retaining, about the loop targets, the information in the loop invariant.

We claim that this transformation does not affect the correctness of the program. It may, however, increase the size of the code, since it introduces a copy of some code at the source of each edge instead of having a single copy at its target. When a loop header's predecessor has additional edges to nodes other than the header, this adds an assertion to control-flow paths that it had not been on previously. However, this is a conservative approximation: if the transformed program executes correctly, then so would the original program. Note that even after removing all back edges, the source node of the back edge is still reached from the loop header along forward edges.
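The back-edge identification described above can be sketched with the textbook iterative dominator computation [0]. The adjacency-dict encoding and the function name are this sketch's own assumptions, not the paper's implementation.

```python
# succ maps each node to its list of successors; start is the entry block.
# An edge (u, v) is a back edge when its target v dominates its source u.

def back_edges(succ, start):
    nodes = set(succ)
    # standard iterative dataflow: dom[n] starts at "all nodes" and shrinks
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    pred = {n: [] for n in nodes}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            ps = [dom[p] for p in pred[n]]
            new = ({n} | set.intersection(*ps)) if ps else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return sorted((u, v) for u, vs in succ.items() for v in vs if v in dom[u])

print(back_edges({'Start': ['Head'], 'Head': ['Body', 'After'],
                  'Body': ['Head'], 'After': []}, 'Start'))  # -> [('Body', 'Head')]
```

On a reducible graph, each reported edge's target is the unique loop header of a natural loop, which is exactly what the loop-cutting step needs.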

5 Example

We illustrate our technique with a simple example. Consider the following Spec# source program:

  [Spec# method listing, annotated with its // precondition, // postcondition, and // loop invariant]

The control-flow graph corresponding to this method is encoded as follows, where we have used a dedicated variable to denote the result value:

  [BoogiePL encoding of the control-flow graph, annotated // precondition, // loop invariant, // loop guard, // negation of guard, // statement, and // postcondition]

After cutting back edges, the loop-free program is:

  [loop-free program, annotated // check inv., // havoc loop targets, // assume inv., and // removed back edge]

The passive form of the program is then:

  [passive form of the program]

After computing the weakest preconditions, the set of block equations is then:

  [block equations of the passive program]

where we use ≡ as a right-associative operator whose binding power lies between that of ∧ and ⇒. Finally, the verification condition is:

  [the conjunction of the block equations ⇒ the auxiliary variable of the start block]

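The loop transformation of Section 4, which the example above illustrates (copy the invariant's asserts into the predecessors, havoc the loop targets, assume the invariant at the header, cut the back edge), can be sketched for a single natural loop. The `name -> (statements, successors)` block encoding and the `cut_loop` helper are hypothetical names introduced for this sketch only.

```python
# Statements are tagged tuples: ('assert' | 'assume', formula) or ('havoc', var).
# The loop invariant is the prefix of 'assert' statements in the header's body.

def cut_loop(blocks, header, back_src, targets):
    """Eliminate the natural loop (back_src -> header) in place."""
    stmts, succs = blocks[header]
    inv = []
    while stmts and stmts[0][0] == 'assert':
        inv.append(stmts.pop(0))
    # check the invariant just before every jump to the header
    for name, (b_stmts, b_succs) in blocks.items():
        if header in b_succs:
            b_stmts.extend(inv)
    # wipe knowledge of the loop targets, then assume the invariant
    new_prefix = [('havoc', v) for v in targets] + [('assume', p) for _, p in inv]
    blocks[header] = (new_prefix + stmts, succs)
    # remove the back edge, making the graph acyclic
    src_stmts, src_succs = blocks[back_src]
    blocks[back_src] = (src_stmts, [s for s in src_succs if s != header])

blocks = {
    'Entry': ([], ['Head']),
    'Head':  ([('assert', 'inv')], ['Body', 'After']),
    'Body':  ([('assume', 'guard')], ['Head']),
    'After': ([], []),
}
cut_loop(blocks, 'Head', 'Body', ['x'])
```

After the call, `Entry` and `Body` end by asserting the invariant, `Head` begins with `havoc x` followed by `assume inv`, and `Body` no longer jumps back to `Head`, matching the `// check inv.`, `// havoc loop targets`, `// assume inv.`, and `// removed back edge` annotations of the example.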
6 Related Work
The use of single-assignment form for program analysis has a long history; the canonical reference is Cytron et al. [4]. Feautrier [9] introduced dynamic single-assignment, using it in the analysis of nested-loop programs. Since then, it has been used extensively in the context of nested-loop programs, e.g., [1, 15, 16]. The ESC/Java checker [10] used DSA in its generation of verification conditions [11]. ESC/Java also converts programs to be loop-free in order to compute verification conditions, either by unrolling loops a certain number of times (which misses some execution traces) or by a sound treatment [14]. We have not seen descriptions of single-assignment that map variables to sets of incarnations, as we do to reduce the number of incarnations needed.


Despite the significant advantages of redundancy-reducing techniques, many other verification tools, including LOOP [18] and JACK [3], do not make use of them when generating verification conditions, thus producing voluminous verification conditions. Weakest preconditions for unstructured programs have been defined in a similar way before [12]. However, in that work, the definitions were applied directly to programs that were neither passive nor loop-free, so the block equations used auxiliary functions instead of auxiliary variables, and the program semantics (that is, the antecedent of the verification condition) was defined to be a fixpoint of these functions. The rewriting of these formulas into formulas without quantifiers, functions, and fixpoints can give rise to a doubly exponential increase in size.

We are also investigating whether our use of sets of incarnations achieves minimality, because there may be blocks considered even later in the algorithm that force new incarnations to be created.

Acknowledgments
We'd like to thank the Spec# team for various discussions about this design. Manuel Fähndrich contributed to the design of a previous scheme to generate verification conditions by first transforming unstructured programs into structured ones, and to the design of where to place declared loop invariants in BoogiePL programs. Bart Jacobs implemented loop invariants in Spec#, taking measures to make sure these end up in the right place in the BoogiePL programs. Simon Ou coded up the block equations. We also thank Dave Naumann and the referees for their careful readings of this paper.

7 Conclusions
We have presented a detailed account of our procedure for computing a verification condition from a program (and its specification) in order to use an automatic theorem prover for program verification. Our input does not need to be a structured program; we deal efficiently with unstructured control-flow graphs. As a special case, our technique can be applied to structured programs, which yields formulas with less redundancy than previously reported (formulas linear in the size of the passive program, compared to the previous quadratic formulas).

In the end, it is the time and space needed to generate verification conditions and the resulting theorem-prover performance that matter. We had first implemented a transformation of the unstructured program into a structured one, from which we could then use previous techniques. We found that the transformation, which is exponential in the general case, caused our machines to run out of memory for some methods in the programs we applied our tool to. Good heuristics could probably have improved the situation, but since our new technique can be applied directly to unstructured programs, we abandoned the transformation in favor of it.

The theorem prover we currently use, Simplify [6], was developed along with redundancy-reduction techniques for the ESC/Modula-3 checker [7, 11], similar to those of the later ESC/Java checker. We were uncertain that Simplify would perform well on our new verification conditions, since their flat structure does not provide any guidance about a good order in which to do case splits, and Simplify performs case splits only as a last resort. So far, we have not detected any such problems, though. We are in the process of switching to a theorem prover whose case splits are performed by a SAT solver, and we are hopeful that our verification conditions will be an especially good match for such a theorem prover.

References
[0] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1987.
[1] Zena M. Ariola, Barton C. Massey, M. Sami, and Evan Tick. A common intermediate language and its use in partitioning concurrent declarative programs. New Generation Computing, 14(3):281-315, 1996.
[2] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# programming system: An overview. In Gilles Barthe, Lilian Burdy, Marieke Huisman, Jean-Louis Lanet, and Traian Muntean, editors, CASSIS 2004, Construction and Analysis of Safe, Secure and Interoperable Smart Devices, volume 3362 of LNCS, pages 49-69. Springer, 2005.
[3] L. Burdy, A. Requet, and J.-L. Lanet. Java applet correctness: a developer-oriented approach. In Keijiro Araki, Stefania Gnesi, and Dino Mandrioli, editors, FME 2003: Formal Methods, International Symposium of Formal Methods Europe, volume 2805 of LNCS, pages 422-439. Springer, September 2003.
[4] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451-490, October 1991.
[5] Robert DeLine and K. Rustan M. Leino. BoogiePL: A typed procedural language for checking object-oriented programs. Technical Report 2005-70, Microsoft Research, May 2005.
[6] David Detlefs, Greg Nelson, and James B. Saxe. Simplify: A theorem prover for program checking. Technical Report HPL-2003-148, HP Labs, July 2003.
[7] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended static checking. Research Report 159, Compaq Systems Research Center, December 1998.
[8] Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Englewood Cliffs, NJ, 1976.
[9] Paul Feautrier. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20(1):23-53, 1991.
[10] Cormac Flanagan, K. Rustan M. Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended static checking for Java. In Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), volume 37, number 5 in SIGPLAN Notices, pages 234-245. ACM, May 2002.
[11] Cormac Flanagan and James B. Saxe. Avoiding exponential explosion: Generating compact verification conditions. In Conference Record of the 28th Annual ACM Symposium on Principles of Programming Languages, pages 193-205. ACM, January 2001.
[12] K. Rustan M. Leino. A SAT characterization of boolean-program correctness. In Thomas Ball and Sriram K. Rajamani, editors, Model Checking Software: SPIN 2003, volume 2648 of LNCS, pages 104-120. Springer, May 2003.
[13] K. Rustan M. Leino. Efficient weakest preconditions. Information Processing Letters, 93(6):281-288, March 2005.
[14] K. Rustan M. Leino, James B. Saxe, and Raymie Stata. Checking Java programs via guarded commands. In Bart Jacobs, Gary T. Leavens, Peter Müller, and Arnd Poetzsch-Heffter, editors, Formal Techniques for Java Programs, Technical Report 251. Fernuniversität Hagen, May 1999. Also available as Technical Note 1999-002, Compaq Systems Research Center.
[15] Carl Offner and Kathleen Knobe. Weak dynamic single assignment form. Technical Report HPL-2003-169, HP Laboratories, 2003.
[16] K. C. Shashidhar, Maurice Bruynooghe, Francky Catthoor, and Gerda Janssens. Geometric model checking: An automatic verification technique for loop and data reuse transformations. Electronic Notes in Theoretical Computer Science, 65(2), 2002.
[17] Alfred Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285-309, 1955.
[18] Joachim van den Berg and Bart Jacobs. The LOOP compiler for Java and JML. In Tiziana Margaria and Wang Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems, 7th International Conference, TACAS 2001, volume 2031 of LNCS, pages 299-312. Springer, April 2001.
