Defining Software Faults: Why It Matters

This document proposes a formal, semantics-based definition of software faults that involves the program, the faulty feature at an appropriate level of granularity, and the specification against which correctness is defined. This definition enables reasoning about fault removal and has practical applications, including characterizing valid program repairs, distinguishing single and multi-site faults, and separating debugging from testing. The definition is based on relational mathematics and refinement of specifications.
Defining Software Faults

Why It Matters

Besma Khaireddine
University of Tunis El Manar, Tunis, Tunisia
[email protected]

Aleksandr Zakharchenko, Ali Mili
New Jersey Institute of Technology, Newark, NJ, USA
{az68,mili}@njit.edu

Abstract: Because faults are at the center of software quality concerns, they ought to be defined formally, by semantics-based criteria that enable us to reason about them. In this paper, we consider a semantics-based definition of a fault, which involves the program, the faulty feature (at the appropriate level of granularity) and the specification against which correctness and incorrectness are defined. We explore the implications of this definition for various aspects of software testing, software reliability, and software repair; and we argue that providing a formal, verifiable definition of faults is not a mere intellectual exercise, but has important practical applications.

Keywords: fault, fault removal, relative correctness, correctness enhancement, software testing, software reliability, software repair.

I. INTRODUCTION

In (Avizienis, Laprie, Randell, & Landwehr, 2004), Avizienis et al. define a fault as the adjudged or hypothesized cause of an error; an error, in turn, is defined as a deviation of the system state from the correct state. The IEEE Standard IEEE Std 7-4.3.2-2003 defines a software fault as an incorrect step, process or data definition in a computer program. Whereas these definitions fulfill their purpose as part of a broader ontology, we argue that they do little to support the engineering processes of identifying, inventorying, diagnosing, and removing faults in a program. Indeed, the definition of Avizienis et al. relies on adjudging or hypothesizing, two highly subjective criteria, and assumes, through its definition of an error, that we have means to judge the correctness of arbitrary system states (vs. initial or final states). Also, the IEEE definition is vague as to what constitutes a correct or an incorrect step, and who gets to decide what is or is not correct.

In this paper, we consider a semantics-based definition of a software fault, and discuss, through analytical and empirical arguments, how this definition can be deployed to support the engineering processes that we use routinely to deal with software faults. Specifically, the definition we use has the following attributes:

• It is based on an implicit level of granularity of the source code, which is determined according to the level of precision at which we want to localize faults; we use the term feature to refer to a segment of source code at the chosen level of granularity (e.g. statement, expression, variable reference, lexical token, etc). For the sake of generality, we admit that a feature need not be contiguous, hence may involve two or more lexemes at different locations in the program.
• Our definition of a fault f in a program P with respect to specification R involves nothing other than f, P and R; and it is totally formal, modulo the definition of a feature. It does not involve any subjective value judgement (adjudging or hypothesizing), nor does it require that we know the expected state of the program at intermediate steps in its execution.

We have found, and we argue in this paper, that a formal definition of program faults enables us to achieve a number of capabilities, with concrete practical implications:

• A formal characterization of fault removal. Given a fault f in a program P, we formulate the condition under which a substitute f' of f constitutes a certifiable fault removal.
• A distinction between a single multi-site fault and multiple single-site faults. This distinction is important in practice because it enables us to control combinatorial explosion when we attempt to generate patches: the only time we ever need to combine patches is if we are looking for a multi-site single fault; other than that, we ought to remove faults one at a time, to avoid combinatorial explosion.
• A definition of a unitary increment of correctness enhancement. We introduce the concept of elementary fault removal, which represents an atomic/minimal program transformation that enhances the correctness of a program, for a given level of granularity.
• Insights into oracle design. The definition of fault removal enables us to design precise test oracles that characterize valid program repairs; we find in this paper that while non-regression is a sufficient condition for correctness enhancement, it is not a necessary condition, so that traditional regression testing is prone to cause a loss of recall.
• The distinction between removing a fault and remedying a failure. There is no one-to-one mapping between faults and failures; the same fault may cause several failures and
the same failure may be traced back to more than one fault. Hence focusing on faults and focusing on failures yield vastly different policies.
• Letting programs dictate the fault removal schedule. Programs do not expose all their faults at once; rather, some faults may have to be removed before others can be discovered; the order in which the faults of a program come to light depends on the test data we use, and on how each discovered fault is fixed. A given observed failure may be due to a combination of faults (some of which may be visible earlier than others), hence cannot be remedied until we have derived the set of patches that simultaneously remove all the relevant faults; this carries a significant risk of combinatorial explosion. By focusing on fault removal rather than failure remediation, we let the program expose its faults in the order it determines, and we remove them in sequence as they appear.
• Separating debugging from testing. Fault removal (aka debugging) is so inconceivable without testing that these two terms are used almost interchangeably. Yet, with a definition of fault removal, we can identify, remove and prove the removal of a fault by static analysis, without recourse to any testing; this is shown in (Ghardallou, Diallo, Frias, & Mili, 2016).
• Fault Density vs Fault Depth. Once we define what a fault is, we discover that there is a difference between the statement "Program P has N faults" and the statement "Program P requires N fault removals". This difference stems from the interdependence between faults; we discuss in this paper why we find that the latter is a more meaningful measure of faultiness than the former.

The definitions and propositions we discuss in this paper rely on relational mathematics; while we assume the reader is familiar with elementary relational mathematics, we briefly introduce some relevant notations in section II. In section III, we discuss the definition and properties of faults, and in section IV we explore some consequences of this definition. In section V we show the results of an experiment that illustrates some of our discussions, and we conclude in section VI by offering some tentative insights and prospects.

II. RELATIONAL MATHEMATICS

We assume the reader is familiar with simple relational mathematics, and we use this section to introduce some definitions and notations, inspired by (Brink, Wolfram, & Schmidt, 1997). We represent specifications and programs with sets and relations. Sets are represented by programming language-like variable declarations; we refer to such a set as the space of the program of interest, and we refer to its elements as the states of the program. A relation on set $S$ is a subset of the Cartesian product $S \times S$. Special relations on $S$ include the universal relation $L = S \times S$, the identity relation $I$, and the empty relation $\emptyset$. Operations on relations include the usual set theoretic operations of union ($R \cup R'$), intersection ($R \cap R'$) and complement ($\overline{R}$), as well as the relational product ($R \circ R'$, or $RR'$ for short), the converse ($\hat{R}$), the domain ($dom(R)$) and the pre-restriction (of relation $R$ to set $T$: ${}_{T\backslash}R$). It is easy to see that the product $RL$ (of relation $R$ by the universal relation $L$) is: $RL = \{(s, s') \mid s \in dom(R)\}$; we may, when this does not lead to confusion, use $RL$ and $dom(R)$ interchangeably. A relation $R$ is said to be deterministic if and only if $\hat{R}R \subseteq I$, reflexive if and only if $I \subseteq R$, symmetric if and only if $R \subseteq \hat{R}$, and transitive if and only if $RR \subseteq R$. A relation is said to be a partial ordering if and only if it is reflexive, antisymmetric and transitive.

III. RELATIVE CORRECTNESS AND FAULTS

In order to define faults, we need to discuss relative correctness, which in turn requires that we discuss (absolute) correctness.

A. Programs and Specifications

Given a program $P$ on space $S$, we let the function of program $P$ be the set of pairs $(s, s')$ such that if execution of $P$ starts in state $s$ then it terminates in state $s'$. We may, when this does not lead to confusion, use a program and its function interchangeably. Given a space $S$, we let a specification $R$ on space $S$ be a relation on $S$.

B. Refinement and Absolute Correctness

Given two relations $R$ and $R'$, we say that $R'$ refines $R$, or that $R$ is refined by $R'$ (denoted by: $R' \geq R$, or $R \leq R'$) if and only if $RL \cap R'L \cap (R \cup R') = R$. This relation (between relations) is a partial ordering. Interpretation: $R'$ refines $R$ if and only if $R'$ has a larger domain and assigns fewer images than $R$ to each element of the domain of $R$. See Figure 1.

Figure 1: $R' \geq R$.

Given a program $P$ and a specification $R$, we say that $P$ is correct with respect to $R$ if and only if $P$ refines $R$. Though it looks different, this is the exact same definition as traditional definitions of (total) correctness, found in (Gries, 1981), (Hehner, 1993) and others. The following Proposition, due to Mills et al. (Mills, Basili, Gannon, & Hamlet, 1986), gives a necessary and sufficient condition of correctness: Program $P$ is correct with respect to specification $R$ if and only if $dom(R \cap P) = dom(R)$. The set $dom(R \cap P)$ is the set of states for which execution of $P$ terminates normally and satisfies specification $R$. We refer to this set as the competence domain of $P$ with respect to $R$.

C. Relative Correctness

In order to define faults (as we do in the next section), we must define relative correctness, i.e. the property of a program $P'$ to be more-correct than a program $P$ with respect to a specification $R$. For the sake of simplicity, we limit our
discussions in this paper to deterministic programs, and we present the following definition, due to (Mili, Frias, & Jaoua, 2014): Program $P'$ is said to be more-correct (respectively: strictly more-correct) than program $P$ with respect to specification $R$ if and only if: $(R \cap P')L \supseteq (R \cap P)L$ (respectively: $(R \cap P')L \supset (R \cap P)L$). We abbreviate this property by: $P' \geq_R P$ (respectively: $P' >_R P$). Interpretation: Program $P'$ is (strictly) more-correct than program $P$ with respect to specification $R$ if and only if $P'$ has a (strictly) larger competence domain with respect to $R$ than $P$. See Figure 2. To contrast correctness (defined in the previous section) with relative correctness, we may refer to the former as absolute correctness.

Notice that to be more-correct, a program $P'$ does not need to imitate the correct behavior of program $P$; it may have a wholly distinct correct behavior. The ovals in the domains of $P$ and $P'$ show the competence domains of $P$ and $P'$ with respect to $R$.

Figure 2: $P' \geq_R P$

How do we know whether our definition is any good? We check it against some properties that we expect a definition of relative correctness to satisfy.

Relative Correctness is reflexive and transitive but not antisymmetric. It is clear why we want relative correctness to be reflexive and transitive. Figure 3 shows why we do not want it to be antisymmetric: we want to allow two programs to be equally correct yet distinct.

Figure 3: $P' \geq_R P$, $P \geq_R P'$, $P \neq P'$

Relative Correctness culminates in absolute correctness. Of course, we want an (absolutely) correct program $P'$ to be more-correct than (or as correct as) any candidate program. According to Mills' Proposition (previous section), if $P'$ is (absolutely) correct with respect to $R$, then $dom(R \cap P') = dom(R)$; on the other hand, $dom(R)$ is an upper bound of $dom(R \cap P)$ for any candidate program $P$. QED.

Relative Correctness and Reliability. The reliability of a program $P$ is defined by means of two parameters: a specification $R$ and a probability distribution $\theta$ on the domain of $R$; for the sake of simplicity we assume that the domain of $R$ is finite and that $\theta$ is a discrete probability distribution. We define the reliability of program $P$ with respect to specification $R$ and probability distribution $\theta$ on the domain of $R$ as the probability that a random state in $dom(R)$ selected according to $\theta$ yields a correct execution of $P$ with respect to $R$. Given the definition of competence domain, this is merely the probability that a randomly selected element of $dom(R)$ falls in the competence domain of $P$ with respect to $R$. We write:

$$\rho^{\theta}_{R}(P) = \sum_{s \in CD} \theta(s),$$

where $CD = dom(R \cap P)$ is the competence domain of $P$ with respect to $R$.

Figure 4: Reliability of $P$ with respect to $R$ and $\theta$.

We infer from this definition that for any probability distribution $\theta$, if $P'$ is more-correct than $P$ with respect to $R$, then $P'$ is also more reliable than $P$ with respect to $R$ and $\theta$. More interestingly, we find that if $P'$ is more reliable than $P$ with respect to $R$ for any probability distribution $\theta$, then $P'$ is more-correct than $P$ with respect to $R$. To prove this, we assume that $P'$ is not more-correct than $P$, and we find a probability distribution on $dom(R)$ for which $P$ is more reliable than $P'$. If $P'$ is not more-correct than $P$, then there exists an element $s_0$ of the competence domain of $P$ that is not in the competence domain of $P'$. We let $\theta(s_0) = 1$ and $\theta(s) = 0$ for all $s \neq s_0$, and we find that $\rho^{\theta}_{R}(P) = 1$ and $\rho^{\theta}_{R}(P') = 0$, whence $P$ is more reliable than $P'$. QED.

Whence we write:

$$(P' \geq_R P) \Leftrightarrow (\forall \theta: \rho^{\theta}_{R}(P') \geq \rho^{\theta}_{R}(P)).$$

This equation links relative correctness with reliability: to be more-correct means to be more reliable with respect to any probability distribution of inputs.

Relative Correctness and Refinement. If program $P'$ refines program $P$, it means that whatever $P$ does, $P'$ can do better. We would expect this to imply that $P'$ refines $P$ if and only if $P'$ is more-correct than $P$ with respect to any specification. The proof of necessity is trivial: if $P'$ refines $P$, then $P'$ is a
superset of $P$, hence it has a larger competence domain with respect to any specification, by monotonicity. The proof of sufficiency relies on a lemma to the effect that if two functions $F$ and $F'$ are such that $F \subseteq F'$ and $dom(F') \subseteq dom(F)$ then $F = F'$. If $P'$ is more-correct than $P$ for all $R$, then it is more-correct than $P$ with respect to $P$, which can be written $dom(P \cap P') \supseteq dom(P)$. This, in conjunction with the set theoretic identity $P \cap P' \subseteq P$, yields $P' \cap P = P$, from which we infer $P \subseteq P'$. QED.

Whence we write:

$$(P' \geq P) \Leftrightarrow (\forall R: P' \geq_R P).$$

Figure 5 shows relative correctness as an intermediate property between being more reliable (when we quantify with respect to $\theta$), and being more-refined (when we quantify with respect to $R$). For the sake of completeness, we also show the quantifications in reverse order ($R$ then $\theta$).

Figure 5: Reliability, Relative Correctness, Refinement

D. Faults and Fault Removal

Now that we have a vetted definition of relative correctness, we are ready to define faults and fault removals.

Given a program $P$, a specification $R$, and a feature $f$ in $P$, we say that $f$ is a fault in $P$ with respect to $R$ if and only if there exists a substitute $f'$ of $f$ such that program $P'$ obtained from $P$ by replacing $f$ by $f'$ is strictly more-correct than $P$. The pair $(f, f')$ is then called a fault removal of $f$ in $P$ with respect to $R$.

Note that this definition highlights an issue with the traditional practice of regression testing, which checks for fault removal by ensuring that correct behavior is preserved; as we can see from our definition of relative correctness (and from Figure 2), the preservation of correct behavior is a sufficient condition, but not a necessary condition, of fault removal; using a condition that is not necessary causes a loss of recall (missing programs that are actually more-correct). Whereas regression testing checks the following condition:

$$s \in dom(R \cap P) \Rightarrow P'(s) = P(s),$$

relative correctness mandates the (weaker) condition:

$$s \in dom(R \cap P) \Rightarrow s \in dom(R \cap P').$$

Another issue that our definition highlights is the use of fitness functions in program repair (LeGoues, Dewey-Voigt, Forrest, & Weimer, 2012). Fitness functions are usually computed as the sum of weights associated to the test data on which the candidate repair runs successfully, where weights are assigned to test data according to their preponderance in some usage pattern; as such, the fitness function is an approximation of the candidate program's reliability. Yet, we saw in the previous section that for a given probability distribution ($\theta$), enhanced reliability is a necessary but not a sufficient condition of relative correctness; hence the use of fitness functions may cause a loss of precision (retrieving candidate repairs that are not actually more-correct than the original).

IV. IMPLICATIONS AND APPLICATIONS

A. Elementary Faults

We consider a program $P$ and we let $f_1$ and $f_2$ be two features in the source code of $P$ that admit substitutes, say $f_1'$ and $f_2'$, such that the program $P'$ obtained from $P$ by replacing $f_1$ by $f_1'$ and $f_2$ by $f_2'$ is strictly more-correct than $P$ with respect to some specification $R$. According to our definition of a fault, $(f_1, f_2)$ is a fault in $P$ with respect to $R$. The question we wish to ponder is whether we are looking at a single two-site fault $(f_1, f_2)$ or two single-site faults ($f_1$ and $f_2$). The answer to this question depends, of course, on whether $f_1$ alone is a fault and whether $f_2$ alone is a fault. If neither $f_1$ nor $f_2$ is a fault, but $(f_1, f_2)$ is a fault, then we say that $(f_1, f_2)$ is an elementary fault; we also designate as an elementary fault any single-site fault.

Why is it important to characterize elementary faults? Because in a fault removal process, it is advantageous to remove faults one elementary fault at a time. Suppose, for the sake of argument, that each feature of the program admits $N$ possible patches, and that we want to rectify (repair) $k$ features in the program; if we want to repair them all at once then we need to analyze $N^k$ repair candidates, an $O(N^k)$ operation; but if we want to repair them one at a time, assuming each is an individual fault, then we need to analyze $k \times N$ repair candidates, an $O(N)$ operation. Of course, not all elementary faults are single-site faults; we may have to consider two-site faults, at a cost of an $O(N^2)$ operation, or three-site faults, at a cost of an $O(N^3)$ operation, or higher multiplicities. But these are probably a very small fraction of faults, in practice.

To illustrate the difference between a single multi-site fault and multiple single-site faults, we consider two examples, with a multiplicity of 2. We let $S$ be the space
defined by a real array $a[0..N]$, for $N \geq 1$, an index variable $k$, and a real variable $x$, and we consider the following specification ($R$) and program ($P$), where $s$ stands for the aggregate $a, k, x$ and $s'$ for $a', k', x'$:

$$R = \{(s, s') \mid x' = \sum_{i=1}^{N} a[i]\},$$

P: {x=0; k=0; while (k!=N) {x=x+a[k]; k=k+1}}.

The function of program P can be written as:

$$P = \{(s, s') \mid a' = a \wedge k' = N \wedge x' = \sum_{i=0}^{N-1} a[i]\}.$$

To compute the competence domain of program P with respect to specification R, we compute the intersection of R and P, then take its domain:

$$R \cap P$$
= {substitutions}
$$\{(s, s') \mid a' = a \wedge k' = N \wedge x' = \sum_{i=0}^{N-1} a[i] \wedge x' = \sum_{i=1}^{N} a[i]\}$$
= {rewriting}
$$\{(s, s') \mid \sum_{i=0}^{N-1} a[i] = \sum_{i=1}^{N} a[i] \wedge a' = a \wedge k' = N \wedge x' = \sum_{i=0}^{N-1} a[i]\}$$
= {subtracting the sum from 1 to N-1 from both sides}
$$\{(s, s') \mid a[0] = a[N] \wedge a' = a \wedge k' = N \wedge x' = \sum_{i=0}^{N-1} a[i]\}.$$

The domain of this relation, the competence domain of P with respect to R, is:

$$CD = \{s \mid a[0] = a[N]\}.$$

Indeed, if the specification mandates to compute the sum from 1 to N and the program mistakenly computes the sum from 0 to N-1, then the program behaves according to the specification for those arrays that satisfy the condition $a[0] = a[N]$. One way to fix this program is to change (k=0) into (k=1) and (k!=N) into (k!=N+1). This raises the question: are we dealing with a single two-site fault or two single-site faults? To answer this question, we consider programs $P0$, where we make the first substitution, and $P1$, where we make the second substitution.

P0: {x=0; k=1; while (k!=N) {x=x+a[k]; k=k+1}}.
P1: {x=0; k=0; while (k!=N+1) {x=x+a[k]; k=k+1}}.

By computing the functions of these programs then their competence domains the same way we did above for P, we find:

$$CD0 = \{s \mid a[N] = 0\}.$$
$$CD1 = \{s \mid a[0] = 0\}.$$

Because there is no inclusion relation between CD and CD0, nor between CD and CD1, neither the substitution of (k=0) into (k=1) nor the substitution of (k!=N) into (k!=N+1) is a fault removal. But performing both substitutions simultaneously yields

P': {x=0; k=1; while (k!=N+1) {x=x+a[k]; k=k+1}},

whose competence domain is the domain of $R$ (as the interested reader can easily check); hence $P'$ is (absolutely) correct with respect to $R$, hence more-correct than $P$ with respect to $R$. This makes the aggregate (k=0, k!=N) a single two-site fault. Figure 6 shows the pattern of relative correctness relations (represented by thick black arrows) that characterize such situations; $\sigma_1$ and $\sigma_2$ represent, respectively, the substitution of (k=0) by (k=1) and the substitution of (k!=N) by (k!=N+1); thin blue lines represent substitutions that do not engender relative correctness relationships.

Figure 6: Pattern of a Single Two-Site Fault

For an example of two single-site faults, we consider the same space (without variable $x$) and we consider the following specification $R$ and program $P$ (where $a[i..j] = 0$ is a shorthand for saying that $a[h] = 0$ for all $h$ between $i$ and $j$):

$$R = \{(s, s') \mid a[0] = a'[0] \wedge a'[1..N] = 0\},$$

P: { k=0; while (k!=N) {a[k]=0; k=k+1}}.

The function of program P can be written as:

$$P = \{(s, s') \mid k' = N \wedge a'[0..N-1] = 0 \wedge a'[N] = a[N]\}.$$

Indeed, $P$ assigns $N$ to $k$, puts 0 in $a[\,]$ between indices 0 and $N-1$, and preserves (does not modify) $a[N]$. Taking the intersection of $R$ and $P$, we find:

$$R \cap P$$
= {substitutions}
$$\{(s, s') \mid a'[0] = a[0] \wedge a'[N] = a[N] \wedge k' = N \wedge a'[0..N] = 0\}.$$

Taking the domain of this relation, we find:

$$CD = \{s \mid a[0] = 0 \wedge a[N] = 0\}.$$

We let $P0$ and $P1$ be the programs obtained from $P$ by changing (k=0) into (k=1) and (k!=N) into (k!=N+1),

P0: { k=1; while (k!=N) {a[k]=0; k=k+1}}
P1: { k=0; while (k!=N+1) {a[k]=0; k=k+1}}

We find the following functions of $P0$ and $P1$:

$$P0 = \{(s, s') \mid a[0] = a'[0] \wedge a[N] = a'[N] \wedge k' = N \wedge a'[1..N-1] = 0\},$$
$$P1 = \{(s, s') \mid a'[0..N] = 0 \wedge k' = N + 1\}.$$

Taking the intersection with $R$, we find:

$$R \cap P0 = \{(s, s') \mid k' = N \wedge a[0] = a'[0] \wedge a[N] = a'[N] \wedge a'[1..N] = 0\}$$
$$R \cap P1 = \{(s, s') \mid k' = N + 1 \wedge a[0] = a'[0] \wedge a'[0..N] = 0\}$$

From which we derive the competence domains of $P0$ and $P1$:

$$CD0 = dom(R \cap P0) = \{s \mid a[N] = 0\},$$
$$CD1 = dom(R \cap P1) = \{s \mid a[0] = 0\}.$$

By comparing these competence domains against that of $P$, we find that they are both supersets of $CD$, hence each individual transformation has improved the correctness of $P$. If we let $P'$ be the program obtained from $P$ by performing both transformations, we find that it is correct with respect to $R$, hence its competence domain is the domain of $R$, which is all of $S$. Figure 7 shows the pattern of relative correctness relations that characterize a situation where we have two single-site faults; we use the same abbreviations $\sigma_1$ and $\sigma_2$ to represent program substitutions.

Figure 7: Pattern of Two Single-Site Faults

The contrast between Figure 6 and Figure 7 illustrates the difference between a single two-site fault and two single-site faults. In Figure 6 the two substitutions need to take place before we can observe correctness enhancement, whereas in Figure 7 each individual substitution enhances correctness.

B. Fault Density and Fault Depth

We consider the array sum program presented earlier, along with the specification it is supposed to satisfy:

$$R = \{(s, s') \mid x' = \sum_{i=1}^{N} a[i]\},$$

P: {x=0; k=0; while (k!=N) {x=x+a[k]; k=k+1}}.

We let the level of granularity at which we want to analyze faults be the expression; in other words, we restrict our attention to faults that stem from using the wrong expression (in an assignment statement, an array reference, a function call, etc). Given this restriction, we see two faults in this program, i.e. two features that admit substitutions that would make the program strictly more-correct:

• The fault made up of the aggregate (k=0, k!=N), which we had discussed above; we refer to this fault as $f_{12}$ and we refer to its corresponding substitution (into (k=1, k!=N+1)) as $\sigma_{12}$.
• The fault $f_3$ = (a[k]), which admits a substitution $\sigma_3$ from (a[k]) to (a[k+1]); indeed substitution $\sigma_3$ would also produce a correct program, hence enhance correctness.

Even though we have two faults, $f_{12}$ and $f_3$, we are only one fault removal away from a correct program, because when we apply substitution $\sigma_{12}$ to remove fault $f_{12}$, we find that $f_3$ is no longer a fault; and when we apply substitution $\sigma_3$ to remove fault $f_3$, we find that $f_{12}$ is no longer a fault. But if we apply both substitutions, we find a program $Q$ that has two faults, just like $P$. See Figure 8, where thick black arrows represent relative correctness relations.

Figure 8: Density and Depth

We use the term fault density to refer to the number of faults in a program, and we use the term fault depth to refer to the minimal number of elementary fault removals that separate a program from a correct program. Usually, when we remove a fault from a program $P$ and obtain a program $P'$, we want to think that fault density has decreased by 1; the foregoing discussion provides a counter-example to this premise, since $P$ has a density of 2 whereas $P'$ and $P''$, which are derived by removing one fault from $P$, have a density of 0. By contrast, the fault depth satisfies the following relation between $P$ (a faulty program) and $P'$ (obtained from $P$ by removing one fault):

$$depth(P') \geq depth(P) - 1.$$
Equality holds if the transition from P to P' is on a minimal path from P to a correct program.

Whereas fault density measures the number of faults in a program, fault depth measures the (minimal) number of (elementary) fault removals that separate a program from a correct program; we argue that fault depth is a more meaningful measure of program faultiness than fault density. In the absence of a definition of faults, we are prone to confuse/equate these two numbers; we illustrate in section V.B to what extent they are in fact orthogonal/independent.

C. An Infrastructure of Test Oracles

We consider a space $S$, a specification $R$ on $S$, and a test data set $T$ that is a subset of $S$. We use relational formulae given in (Khaireddine, Zakharchenko, & Mili, 2017) to generate oracles that check the following properties of a candidate program $P'$:

• The absolute correctness of $P'$ with respect to ${}_{T\backslash}R$, the pre-restriction of $R$ to $T$.
• The relative correctness of $P'$ over some other program $P$ with respect to ${}_{T\backslash}R$.
• The strict relative correctness of $P'$ over $P$ with respect to ${}_{T\backslash}R$.

Of course, in general we are interested in claims of correctness and relative correctness with respect to $R$, not ${}_{T\backslash}R$. But if our analysis is based on testing program $P'$ on test data $T$, then the only certifiable claims we can make involve ${}_{T\backslash}R$, not $R$. Whether a claim on ${}_{T\backslash}R$ can be extended to $R$ depends on whether test data set $T$ is adequate; we are not discussing the adequacy of $T$, hence we limit our claims to ${}_{T\backslash}R$.

Absolute Correctness. To fix our ideas, we adopt the following framework in which the oracle of absolute correctness is invoked:

    {bool abscor; abscor=true;
     forall (s in T)
       {statetype inits; inits=s;
        Pprime(); // modifies s, preserves inits
        abscor = abscor && absoracle(inits,s);}
     return abscor;}

Now we must write code for the Boolean predicate absoracle(inits,s). To this effect, we assume that we have two Boolean functions:

• A binary Boolean function R(s,sprime), which represents specification $R$.
• A unary (in S) Boolean function domR(s), which represents the domain of $R$.

Whence we write:

    bool absoracle(s, sprime)
      {return !domR(s) || R(s,sprime);}

The following proposition justifies the design of this oracle:

Proposition. If for all $s$ in $T$ program $P'$ terminates and satisfies oracle absoracle(), then $P'$ is (absolutely) correct with respect to ${}_{T\backslash}R$.

We do not prove this proposition, but we give an intuitive argument to justify it. To prove the correctness of a candidate program with respect to $R$, we must check the oracle $(s \in dom(R) \Rightarrow (s, s') \in R)$, rather than the oracle $((s, s') \in R)$, for the following reason: if we execute the program on an element of $T$ that is outside the domain of $R$, then regardless of the final state that the program produces, the predicate $(s, s') \in R$ will return false (by definition, if $s$ is not in the domain of $R$, then no $s'$ will satisfy the condition $(s, s') \in R$). Yet in fact the oracle should return true in such cases, since candidate programs are not responsible for initial states outside the domain of $R$.

Relative Correctness. To fix our ideas, we adopt the following framework in which the oracle of relative correctness over some base program $P$ is invoked:

    {bool relcor; relcor=true;
     forall (s in T)
       {statetype inits; inits=s;
        Pprime(); // modifies s, preserves inits
        relcor = relcor && reloracle(inits,s);}
     return relcor;}

The Boolean function reloracle(inits,s) is then defined as follows:

    bool reloracle (s, sprime)
      {statetype inits; inits=s;
       P(); // modifies s, preserves inits
       return !(absoracle(inits,s)) || absoracle(inits, sprime);}

In other words, predicate reloracle(s, sprime) runs program $P$ and checks whether $P$ passes the test for absolute correctness; if it does, then $P'$ passes the test for relative correctness over $P$ exactly when it passes the test for absolute correctness; if $P$ fails, then $P'$ is off the hook, and is considered (vacuously) to pass relative correctness over $P$.

Strict Relative Correctness. To fix our ideas, we adopt the following framework to test for strict relative correctness over some base program $P$:

    {bool relcor, strict; relcor=true; strict=false;
     forall (s in T)
       {statetype inits; inits=s;
        Pprime(); // modifies s, preserves inits
        relcor = relcor && reloracle(inits,s);
        strict = strict || strictpredicate(inits, s);}
     return relcor && strict;}


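To make the absolute and relative correctness oracles concrete, the following Python sketch recasts them under simplifying assumptions that are ours and not part of the framework: programs are modeled as pure functions from initial states to final states (rather than procedures that modify a global state s), a state is a single integer, and the specification (dom_r, spec_r) as well as the two toy programs base and candidate are hypothetical.

```python
from typing import Callable, Iterable

State = int  # assumption: a state is a single integer, for illustration only
Program = Callable[[State], State]


def dom_r(s: State) -> bool:
    """Domain of the hypothetical specification R: non-negative initial states."""
    return s >= 0


def spec_r(s: State, s_prime: State) -> bool:
    """Hypothetical specification R: the final state is twice the initial state."""
    return s_prime == 2 * s


def abs_oracle(s: State, s_prime: State) -> bool:
    """Passes if s lies outside dom(R) or if (s, s_prime) satisfies R."""
    return (not dom_r(s)) or spec_r(s, s_prime)


def rel_oracle(base: Program, s: State, s_prime: State) -> bool:
    """Passes unless the base program succeeds on s while the candidate fails."""
    return (not abs_oracle(s, base(s))) or abs_oracle(s, s_prime)


def absolutely_correct(candidate: Program, tests: Iterable[State]) -> bool:
    """Driver for absolute correctness with respect to T\\R."""
    return all(abs_oracle(s, candidate(s)) for s in tests)


def relatively_correct(candidate: Program, base: Program,
                       tests: Iterable[State]) -> bool:
    """Driver for relative correctness of candidate over base with respect to T\\R."""
    return all(rel_oracle(base, s, candidate(s)) for s in tests)


def base(s: State) -> State:
    """Toy base program: correct only for s < 3."""
    return 2 * s if s < 3 else 0


def candidate(s: State) -> State:
    """Toy candidate program: correct only for s < 5."""
    return 2 * s if s < 5 else 0


tests = range(-2, 8)  # stands in for the test data set T
```

In this toy setting, candidate is not absolutely correct on the test set, yet it is relatively correct over base, since it succeeds wherever base does; the converse does not hold.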
The Boolean function reloracle(inits,s) is defined the same way as above, whereas the Boolean function strictpredicate(inits, s) is defined as follows:

bool strictpredicate (s, sprime)
  {statetype inits; inits=s;
   P(); // modifies s, preserves inits
   return !(absoracle(inits,s)) && absoracle(inits, sprime);}

This function returns true whenever the base program has failed and the program under test (Pprime, in the oracle code) succeeds. Program Pprime() is deemed to be strictly more-correct than P with respect to 𝑇\𝑅 if and only if Pprime() never fails whenever P() succeeds, and it succeeds at least once where P() fails.

D. Deterministic and Non-Deterministic Specifications
In the previous subsection, we discussed how to derive an oracle for absolute correctness from a specification, then how to derive an oracle for relative correctness from an oracle for absolute correctness, then how to derive an oracle for strict relative correctness from an oracle for relative correctness. The question we consider now is: How do we derive the specification 𝑅 to begin with? Of course, ideally the specification 𝑅 stems from the product requirements; we have no doubt that it is usually very difficult to capture software requirements in a cohesive, complete form, and probably even more difficult to do so in the form of a simple Boolean function in some programming language. At the same time, we argue that this study does not depend critically on the availability of a complete specification. We can use the results of this paper even if we have a partial specification, for example a specification capturing the most error-prone aspects of the requirements, or the most critical aspects, or the aspects that are easiest to capture. The downside of using an incomplete specification is that it exposes fewer faults in the program.

For the sake of our study, however, we usually use a known correct version of the program under investigation as a basis for producing a specification. Given a program 𝑃 that we know to be correct, we use it as follows to generate a deterministic relation 𝑅.

bool R (s, sprime)
  {statetype inits; inits=s;
   P(); // modifies s, preserves inits
   return (s==sprime);}

If, for the sake of experimentation, we want to generate a non-deterministic specification, then we replace the equality (==) in the function above by an equivalence relation:

bool R (s, sprime)
  {statetype inits; inits=s;
   P(); // modifies s, preserves inits
   return EQ(s,sprime);}

for some equivalence relation 𝐸𝑄. How do we define 𝐸𝑄? If the space 𝑆 includes several variables, say 𝑥, 𝑦, 𝑧, then we exclude some of them from the equality condition. For example:

bool EQ (s, sprime)
  {return (x==xprime && y==yprime);}

V. ILLUSTRATION
In this section, we run an experiment in which we take a program of the Siemens Benchmark (Georgia Tech, 2007), make a number of modifications (aka faults, but we reserve that name for situations that meet our definition) to its source code, then use our infrastructure of oracles to locate and remove faults from it one at a time, until we find an absolutely correct program. In addition to illustrating the use of our test oracles, this experiment corroborates many of the premises that we put forth in this paper, as we discuss in section V.C.

A. Premise and Set-Up
We consider the program named replace in the Siemens Benchmark, and we enter six modifications that are listed for it in the benchmark; the size of this program is 563 LOC and the size of the test data set that is provided for it is 5542 elements. We let 𝑃’ be the correct version of replace, and 𝑃 be the version to which we have applied the six proposed modifications; we refer to these as modifications rather than faults because now that we have a definition of faults, we want to use the term only if the conditions of the definition are met.

Our goal in this experiment is not only to remove all the faults of the program, but in fact to show all the fault removal sequences that lead from the original faulty program to a correct program. Because all the claims we make in this experiment are based on testing candidate programs on test data 𝑇, these claims (of absolute correctness, relative correctness, and strict relative correctness) pertain to specification 𝑇\𝑅, rather than specification 𝑅. To carry out this experiment, we use two devices:
• Patch Generation. A means to generate possible patches that are of the same nature and the same scale as the modifications applied to the original program. To this effect, we have used a mutant generator, which we have fine-tuned to the task at hand (Delamaro, Maldonado, & Vincenzi, 2001). Given the size of the program and the mutation parameters that we have selected, the mutant generator produces 90 mutants at each invocation.
• Patch Validation. We use our infrastructure of oracles, as we describe below.

The following algorithm aims to build a graph 𝐺 that shows all the fault removal sequences from the faulty program 𝑃 to a (absolutely) correct program; the nodes of this graph represent mutants of 𝑃 and its arcs represent strict relative correctness relations.
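As a rough illustration of building a graph whose nodes are mutants and whose arcs are strict relative correctness relations, the following Python sketch runs such a construction on a toy example. Everything in it is a hypothetical stand-in rather than the experimental set-up of this paper: a program is a tuple of offsets (one per "site"), the specification requires the final state to be s + 1, the mutant generator replaces one offset by a value in 0..3, and only single mutation is attempted (nodes that would call for double mutation are merely reported as stuck).

```python
from typing import List, Set, Tuple

Candidate = Tuple[int, ...]  # hypothetical program representation: one offset per site
TESTS = range(6)             # stands in for the test data set T


def run(c: Candidate, s: int) -> int:
    """Execute candidate c on initial state s (toy semantics, an assumption)."""
    return s + c[s % len(c)]


def succeeds(c: Candidate, s: int) -> bool:
    """Absolute oracle for the hypothetical total specification s' == s + 1."""
    return run(c, s) == s + 1


def absolutely_correct(c: Candidate) -> bool:
    return all(succeeds(c, s) for s in TESTS)


def strictly_more_correct(cand: Candidate, base: Candidate) -> bool:
    """cand never fails where base succeeds, and succeeds at least once where base fails."""
    relcor = all((not succeeds(base, s)) or succeeds(cand, s) for s in TESTS)
    strict = any((not succeeds(base, s)) and succeeds(cand, s) for s in TESTS)
    return relcor and strict


def mutants(c: Candidate) -> List[Candidate]:
    """Hypothetical single-site mutant generator: replace one offset by a value in 0..3."""
    return [c[:i] + (v,) + c[i + 1:]
            for i in range(len(c)) for v in range(4) if v != c[i]]


def fault_removal_graph(start: Candidate):
    """Return (arcs, stuck): strict-relative-correctness arcs and dead-end nodes."""
    arcs: Set[Tuple[Candidate, Candidate]] = set()
    stuck: Set[Candidate] = set()
    seen = {start}
    frontier = [start]
    while frontier:
        base = frontier.pop()
        if absolutely_correct(base):
            continue  # nothing left to remove from this node
        better = [m for m in mutants(base) if strictly_more_correct(m, base)]
        if not better:
            stuck.add(base)  # would require double (or higher-order) mutation
        for m in better:
            arcs.add((base, m))
            if m not in seen:
                seen.add(m)
                frontier.append(m)
    return arcs, stuck


arcs, stuck = fault_removal_graph((0, 1, 3))
```

In this toy run the start node (0, 1, 3) has two outgoing arcs, to (1, 1, 3) and to (0, 1, 1), and each of these has a single arc to the absolutely correct program (1, 1, 1); no node is stuck.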
The algorithm starts with 𝐺 initialized to be a single node that contains program 𝑃; it proceeds by inspecting the maximal elements of 𝐺 that are not strictly correct (hence lend themselves to more fault removals), generating their mutants, and checking if any of the mutants are strictly more-correct than the base; if a mutant 𝑀 is found to be strictly more-correct than a maximal program 𝑄 then the arc (𝑄, 𝑀) is added to the graph, and now 𝑀 becomes a maximal node of the graph. This process concludes whenever all the maximal elements of the graph are absolutely correct; if we find maximal elements of the graph that are not absolutely correct but admit no mutant that is strictly more-correct, then we deploy double mutation, or perhaps higher order mutations. If despite deploying higher order mutations we still have maximal nodes that are not absolutely correct, we conclude that either the program has faults of a higher order (highly unlikely) or (more plausible) that the patch generation method is inadequate (wrong type, or wrong scale, etc). In any case, we feel that the patch validation method is sound, and that such deadlocks arise only if patch generation is flawed.

B. Empirical Observations
Figure 9 shows the graph that results from applying the above algorithm to the replace sample of the Siemens Benchmark. At the conclusion of the first four iterations, the algorithm produces m79.3.42.47 as the only maximal element of the graph; this node is not absolutely correct, and none of its simple mutants turned out to be strictly more-correct. When we deploy double mutation, however, we find two double-mutants that are strictly more-correct than it, and they are both absolutely correct. One of them (m79.3.42.47.37.85) is actually the original replace program; the other maximal mutant is different from the original, but is absolutely correct with respect to 𝑇\𝑅 all the same.

For the sake of argument, we assume that the mutant generation method used in this experiment is complete, in the sense that if a program 𝑃 has a fault 𝑓 and (𝑓, 𝑓’) is a fault removal for 𝑃 then the generator will produce a mutant of 𝑃 that has 𝑓’ in lieu of 𝑓 in 𝑃; considering the modifications we have entered in replace, and the way we have parameterized the mutant generator, this appears to be a legitimate assumption. Under this assumption, the density of each program in this graph is the outgoing degree of the corresponding node. Also, the depth of a program is the minimal distance from the node to the top of the graph. We find, from Figure 9, that even though we have applied six modifications to produce 𝑃, 𝑃 has only one fault, since the node of 𝑃 has a single outgoing arc. One may ask: how can 𝑃 have only one fault if we seeded six? The answer is that the other faults may be hidden by the first one, and can only be seen once the first one is removed. So in fact the fault density of 𝑃 is one but its fault depth is five, which again shows that fault depth is a more meaningful measure than fault density. In this graph, fault depth decreases by one with each fault removal but fault density does not: for example, 𝑑𝑒𝑛𝑠𝑖𝑡𝑦(𝑃) = 1, but 𝑑𝑒𝑛𝑠𝑖𝑡𝑦(𝑚79) = 3, rather than 0; as another example, 𝑑𝑒𝑛𝑠𝑖𝑡𝑦(𝑚79.42.47) = 1, but 𝑑𝑒𝑛𝑠𝑖𝑡𝑦(𝑚79.3.42.47) = 2. Note that even though program 𝑃 was seeded with six modifications, its fault depth is five, not six, because the last fault removals were done through double mutation; still, we do not claim a numeric relation between the number of modifications and fault depth, as some modifications could be immaterial (i.e. they do not affect the function of the program); we also suspect that the faults of the Siemens benchmark were chosen in such a way as to alter the function of the program.

Figure 9: Fault Removal Graph of replace

C. Analysis and Lessons Learned
When we say that program 𝑃 has six faults, we implicitly assume that these faults are fixed features of the source code of 𝑃, that they are all visible in 𝑃, that we can fix any one of the six we choose, that there is a unique way to fix each fault, and that the result would be a program with five faults. If this were the case, then the graph we would find by applying the algorithm of section V.A to the replace program would be the graph shown in Figure 10; note that in this ideal graph the density of each program (node) equals its depth. The vast difference between Figure 9 and Figure 10 reflects the extent to which this vision of faults is unrealistic. For a given test data set 𝑇, program 𝑃 does not expose all its faults at once, and whenever it exposes more than one fault at a time, the choice of which fault to fix first and how to fix it does matter.

Another observation we can infer from this example is that when we try to repair a program, it is necessary to focus on removing faults rather than remedying failures. When we select a specific failure of the program, defined by an initial state that leads to an execution that violates the specification,
we have no way to tell whether the fault that causes it is low or high in the fault removal graph (Figure 9). If the fault is high, say 𝑘 arcs away from program 𝑃, and each invocation of the mutant generator produces 𝑁 mutants, then we need to search a space of size 𝑁^𝑘 to find an adequate repair. By contrast, if we remove elementary faults one at a time, in the order in which the program exposes them, the search space is never larger than 𝑁^𝑚, where 𝑚 is the multiplicity of the highest-order multi-site fault (𝑚 = 2 in our example) rather than the fault depth of the program (which is usually unknown and unbounded).

Another important lesson offered by this example is the distinction between fault density and fault depth. If programs behaved as shown in Figure 10, then density and depth would be identical: if we have six faults in a program, it takes six fault removals before we can turn it into a correct program. But in reality, density and depth are unrelated: in Figure 9, 𝑃 has a fault density of one and a fault depth of five; also, as we climb the graph, fault depth decreases by one at each step, but fault density evolves in an unpredictable manner.

Finally, we point out how a purely semantic test, the test of strict relative correctness, was, with the help of adequate mutant generation, able to detect and remove all the faults of the program, one at a time. Note that if 𝑃’ is absolutely correct with respect to 𝑇\𝑅, this does not of course mean that 𝑃’ is absolutely correct with respect to 𝑅, though we can prove under some conditions that 𝑃’ is then more-correct than 𝑃 with respect to 𝑅 (not merely to 𝑇\𝑅), which is usually what we want to achieve in a program repair operation; we do not show the proof, due to lack of space.

Figure 10: Idealized Fault Removal Graph (if faults were independent attributes of the source code)

VI. CONCLUSION
In this paper we revisit a definition of software faults (given in earlier work), and discuss its impact on routine software engineering processes such as software testing, software quality analysis, and program repair. Our definition of a fault assumes an implicit level of granularity (at which we want to isolate faults) and involves only the faulty feature, the program in which this feature appears, and the specification against which correctness of the program is judged. Our definition of a software fault is based on a formal definition of relative correctness for deterministic programs, which we have validated by analyzing its intrinsic properties and its relation to (traditional) absolute correctness, reliability, and refinement.

Several other authors have introduced and discussed some approximations of absolute correctness that may be construed as definitions of relative correctness (Zhao, Littlewood, Povyakalo, Strigini, & Wright, 2016) (Littlewood & Rushby, 2012) (von Essen & Jobstman, 2013) (Lahiri S. , McMillan, Sharma, & Hawbiltzel, 2013) (Logozzo, Lahiri, Faehndrich, & Blackshear, 2014) (Logozzo & Ball, Modular and Verifiable Program Repair, 2014). Our approach can be characterized by the following distinguishing premises: we model programs as simple input/output mappings (rather than finite state automata); we model specifications as relations (rather than frames of assertions); we model relative correctness as a semantic property between two programs and a specification (rather than an empirical operational property about program executions); we make provisions for the fact that correct program behavior is not unique (by virtue of the fact that specifications are usually vastly non-deterministic).

As far as prospects are concerned, we consider that while so far we have used relative correctness for the sole purpose of validating program repairs, it is possible to envision using it for the purpose of generating program repairs. In the same way that (Gries, 1981) (Hehner, 1993) (Morgan, 1998) (Dijkstra, 1976) and others have used mathematics of absolute correctness to produce methods for deriving programs that are correct by design, we envision the possibility of using the mathematics of relative correctness to produce methods for deriving programs that are, by design, more-correct than a base program. This is clearly a long-term goal, but one that appears to be worthwhile, for the impact it would have on the practice of software engineering.

Another avenue of research we are considering is to explore the implications of generalizing the concept of relative correctness to non-deterministic programs, due to (Desharnais, Diallo, Ghardallou, Frias, Jaoua, & Mili, 2015), as this concept may be the key to scaling up. By modeling programs with non-deterministic relations, we can analyze them for relative correctness without capturing their functionality in all its detail; this is currently under investigation.
VII. BIBLIOGRAPHY
Avizienis, A., Laprie, J.-C., Randell, B., & Landwehr, C. E. (2004). Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, 11-33.
Brink, C., Wolfram, K., & Schmidt, G. (1997). Relational Methods in Computer Science. Berlin, Germany: Springer Verlag.
Delamaro, M. E., Maldonado, J. C., & Vincenzi, A. M. (2001). Proteum/IM 2.0: An Integrated Mutation Testing Environment. In E. W. Wong, Mutation Testing for the New Century (pp. 91-101). Berlin, Germany: Springer Verlag.
Desharnais, J., Diallo, N., Ghardallou, W., Frias, M. F., Jaoua, A., & Mili, A. (2015). Relational Mathematics for Relative Correctness. RAMICS (pp. 191-208). Braga, Portugal: Springer Verlag.
Dijkstra, E. W. (1976). A Discipline of Programming. Prentice Hall: Upper Saddle River, NJ.
Georgia Tech. (2007). Siemens Suite. Atlanta, GA: Georgia Institute of Technology.
Ghardallou, W., Diallo, N., Frias, M. F., & Mili, A. (2016). Debugging Without Testing. ICST (pp. 113-123). Chicago, IL: IEEE CS.
Gries, D. (1981). The Science of Programming. Heidelberg, Germany: Springer Verlag.
Hehner, E. C. (1993). A Practical Theory of Programming. New York, NY: 1993.
Khaireddine, B., Zakharchenko, A., & Mili, A. (2017). A Generic Algorithm for Program Repair. FormaliSE. Buenos Aires, Argentina: IEEE Computer Society.
Lahiri, S. K., McMillan, K. L., Sharma, R., & Hawblitzel, C. (2013). Differential Assertion Checking. Computer Aided Verification (pp. 345-355). St Petersburg, Russia: Springer Verlag.
Lahiri, S., McMillan, K. L., Sharma, R., & Hawbiltzel, C. (2013). Differential Assertion Checking. ESEC/FSE, (pp. 345-355).
LeGoues, C., Dewey-Voigt, M., Forrest, S., & Weimer, W. (2012). A Systematic Study of Automated Program Repair: Fixing 55 Bugs out of 105 for $8 each. Proceedings, ICSE. Zurich, Switzerland: IEEE Computer Society.
Littlewood, B., & Rushby, J. (2012). Reasoning about the Reliability of Diverse Two-Channel Systems in which One Channel is "Possibly Perfect". IEEE-TSE, 1178-1194.
Logozzo, F., & Ball, T. (2014). Modular and Verifiable Program Repair. OOPSLA, (pp. 133-146).
Logozzo, F., Lahiri, S., Faehndrich, M., & Blackshear, S. (2014). Verification Modulo Versions: Towards Usable Verification. PLDI, (pp. 294-304).
Mili, A., Frias, M. F., & Jaoua, A. (2014). On Faults and Faulty Programs. RAMICS (pp. 191-207). Marienstatt, Germany: Springer Verlag.
Mills, H. D., Basili, V. R., Gannon, J. D., & Hamlet, D. R. (1986). Structured Programming: A Mathematical Approach. Boston, MA: Allyn and Bacon.
Morgan, C. (1998). Programming from Specifications, Second Edition. London, UK: Prentice Hall.
von Essen, C., & Jobstman, B. (2013). Program Repair Without Regret. CAV: Computer Assisted Verification. St Petersburg, Russia: Springer Verlag.
Zhao, X., Littlewood, B., Povyakalo, A. A., Strigini, L., & Wright, D. (2016). Modeling the probability of failure on demand (pfd) of a 1-out-of-2 system in which one channel is “quasi-perfect”. Reliability Engineering & System Safety, 230-245.