A Conceptual Model of Software Testing
A. C. Marshall
Centre for Mathematical Software Research, University of Liverpool,
PO Box 147, Liverpool L69 3BX, U.K.
Abstract
As code is executed correctly under test, confidence in the correctness of
the code increases. In this context, an intuitive conceptual model of the
process of software testing which draws upon experience gained with
mutation analysis is presented. The model is used to explain how the testing
of one path can influence confidence in other (possibly unexecuted) paths. It
is also discussed in relation to software reliability and systematic structural
testing, and is shown to be consistent with observations made during these
forms of testing.
Keywords/Phrases: Mutation Testing, Software Reliability, Structural
Testing.
1.0 Introduction
This paper addresses the question: "what knowledge is gained, in terms of confidence of correctness,
when a program is successfully executed?" An attempt is made to answer, or at least provoke
discussion regarding, this question by presenting a conceptual model which uses the notions of basic
blocks and mutation analysis [1,2] in order to explain the process of software testing and the subsequent
reliability growth. This model is described and then shown to be applicable in a number of situations.
It is assumed that a program contains a number of paths each consisting of a connected set of basic
blocks, and that each basic block resides on a set of distinct program paths. A basic block contains one
or more statements which, if untested, could be totally incorrect. The process of testing is therefore, in
effect, a way of demonstrating that the possible errors are not actually present, and that the code which
is present is correct. The idea that untested statements contain potential errors was the inspiration
behind mutation testing.
Mutation analysis can generally be performed in two distinct ways; the first, strong mutation [1],
involves inserting one specific error into a program, compiling, linking and then executing the code
with test data designed to expose the recently inserted error. If test data can be designed which exposes
the error and makes the program behave incorrectly, then it can be stated that this particular error
cannot possibly be present in a correct program and therefore may be ruled out from any further
consideration. In theory, an exhaustive set of potential errors could be defined, individually introduced
into the code and then exposed, thus showing that the program does not contain any errors (except
errors of omission which, this author believes, can only be discovered by analysis under actual, or
simulated, operating conditions, for example, by random testing from an operational profile [3,4]).
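A minimal sketch of this kill-or-survive loop may make the idea concrete (illustrative Python, with mutants represented as alternative functions rather than recompiled programs; the example program and its mutants are invented purely for demonstration):

    def program(x, y):
        return x + y            # the original (assumed correct) code

    # each mutant contains one deliberately inserted error
    mutants = [
        lambda x, y: x - y,     # operator replaced
        lambda x, y: x + y + 1, # constant perturbed
        lambda x, y: x,         # operand omitted
    ]

    tests = [(0, 0), (1, 2), (-3, 5)]

    for mutant in mutants:
        # a mutant is 'killed' if some test case exposes the inserted error
        killed = any(mutant(x, y) != program(x, y) for (x, y) in tests)
        print('killed' if killed else 'survived')

Each inserted error is exposed by at least one test case, so all three mutants are killed and can be ruled out from further consideration.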
The above form of mutation analysis is obviously impractical for any life-sized program since the
amount of CPU time required would be prohibitively large. However, Howden [2] attempted to avoid
this overwhelming deficiency by the introduction of weak mutation which is best described as a cross
between strong mutation and dynamic structural testing.
Weak mutation testing relies on defining a series of test data criteria which must be satisfied in order to
discount the existence of possible errors. As an example, consider the statement if (i.GT.3) then . . .; if i
is integral then the test data set which removes the possibility of an error in the relational operator
would be i ∈ {2, 3, 4}, corresponding to just before, on, and just after the border between true and
false. This test data is such that if a different relational operator were used in place of .GT. then the
logical result of the predicate would differ on at least one of the test cases.
can be defined for other classes of errors [2,5].
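To make this criterion concrete, the following sketch (illustrative Python, not part of the original formulation) checks that the set {2, 3, 4} really does distinguish .GT. from every alternative relational operator:

    import operator

    original = lambda i: i > 3    # the predicate i .GT. 3

    # mutants: the same predicate with the relational operator replaced
    mutants = [operator.ge, operator.lt, operator.le, operator.eq, operator.ne]

    tests = [2, 3, 4]             # just before, on, and just after the border

    for op in mutants:
        # reliable test data: every relational-operator mutant must disagree
        # with the original predicate on at least one test case
        assert any(op(i, 3) != original(i) for i in tests)

Every assertion holds, confirming that this three-point set is reliable, in the sense used below, for relational-operator errors.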
The test data set given above is said to be reliable for exposing a relational operator error in the above
statement. Use of the word 'reliable' to describe 'probing' test data is not ideal because confusion easily
arises with areas of software reliability. Indeed, in this paper, the author does not refer to individual
basic blocks being 'reliable' but instead writes of degrees of 'confidence' of particular blocks depending
upon the amount of testing which the block has received.
Figure 1 illustrates this view for a single basic block: shaded areas correspond to the confidence
obtained by testing that block on that path; areas which remain unshaded still correspond to potential
errors which have not been shown to be definitely absent. It can be seen that most of the shading is on
path 1, the path which control passed down, but that there is also a degree of implied confidence in the
other paths. This is because a successful execution of the block on one path implies confidence that the
block will also perform correctly on other paths. The different types of shading correspond to the
individual executions of that block along path 1.
[Figure: confidence in basic block i (vertical axis, 0 to 1) against the paths passing through it; the
shaded regions lie mostly on path 1.]
Figure 1: Execution of a single basic block many times on one path (Note: not to scale)
On average, the first execution of a previously unexecuted basic block, BBj, gives the largest single
step increase in the confidence of correctness of that particular block, with this increase being, in some
way, proportional to the number of mutants killed or, more specifically, mutant complexity. Assuming
that all paths can be viewed in the same way, then the first successful execution on path i will cause the
confidence to increase from zero to Cji (which will be fairly close to one). (Arguably, the initial
confidence would be greater than zero as, presumably, the code would have been written by a
competent programmer [1], and would have already been subjected to a number of static readings,
walkthroughs, analyses and, presumably, compilations; however, this will be ignored as the first
'proper' execution of the code will instantly absorb this small amount of confidence.) Subsequent
executions of this basic block on the same path will, on average, give smaller and smaller increases in
the confidence of correctness as more, but proportionally fewer, mutants die - an example of the law of
diminishing returns.
An intuitive and convenient model to apply here is one in which, on average, the increase in confidence
in the correctness of a basic block on a given path between successive executions of the path, is
proportional to the remaining amount of uncoverage of the block on that particular path. (This should be
compared to a similar assumption made, and justified, in [7].) In other words, at each successful
execution of a block BBj along path i, a constant proportion, 0 < λji < 1, of the residual uncertainty,
(1 − κji), is removed. To formalise this, consider BBj executed once on path i, then κji = Cji; after n
successful executions along that path:

    κji = 1 + (Cji − 1)(1 − λji)^(n−1)    (1)

The growth curve of κji during the testing of one block on one path is depicted in Figure 2.
[Figure: confidence κji (vertical axis, 0 to 1) against number of executions 1 to 5.]
Figure 2: Growth of κji for the testing of one block on one path (not to scale).
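The geometric form of equation (1) is easy to tabulate. The short sketch below (illustrative Python; the values Cji = 0.9 and λji = 0.5 are assumptions chosen purely for demonstration) prints the diminishing confidence increments:

    # confidence in block j on path i after n successful executions, eq. (1)
    def kappa(n, C=0.9, lam=0.5):
        return 1 + (C - 1) * (1 - lam) ** (n - 1)

    for n in range(1, 6):
        gain = kappa(n) - (kappa(n - 1) if n > 1 else 0.0)
        print(n, round(kappa(n), 4), round(gain, 4))

The first execution contributes 0.9; each later one removes half of the remaining uncertainty, reproducing the law of diminishing returns noted above.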
A.C. Marshall A Conceptual Model ofSoftware Testing 9
If, as is usually the case, basic block BBj lies not just on path i but on some other path, k say, then, as
has been argued above, the relationship κji > 0 implies that κji > κjk > 0.
This implies that it will not be necessary to test every path in a program, because when enough paths
through the block have been executed, the remaining unexecuted ones will already have gained a
sufficiently large amount of implied confidence to make any further testing superfluous.
Therefore, in theory, only a carefully selected critical subset of paths needs to be tested; however,
explicit determination of this subset is almost certainly not possible! The number of executions needed
to achieve a confidence of approximately unity in block BBj will be related to the number of theoretical
mutants of BBj. Given a comprehensively defined set of mutant productions (for example, see [5]), the
theoretical number of possible mutants could be estimated statically, and specific factors which govern
the mutant death rate could also be assessed, such as the degrees of polynomials and multinomials [1,6]; code
complexity; the number of variables active in the block; the number and values of boolean variables
needed to kill all mutant boolean expressions [5]; the maximum dimension of arrays, etc.
As an illustrative example of how the confidence can vary between two statements of different
computational complexity, assume there are two single basic block programs Pr and Ps. Pr is a one-line
computation:

    y = a0 + a1x

and Ps is another one-line computation:

    y = a0 + a1x + a2x^2 + ... + a9x^9

It can be shown, see [6], that a polynomial of degree d requires the successful execution of (d+1)
unique test data instances in order to verify the correctness of its operators, coefficients and exponents.
Clearly, Pr requires two distinct test data cases, whereas Ps needs ten.
From [6], assuming that the roots lie between 1 and X, one successful execution of a polynomial of
degree 1 gives a (1 − 1/X) probability that the computation is correct, and one successful execution of a
polynomial of degree 9 gives a (1 − 9/X) chance of correctness. This illustrates how the values of λji and
Cji can vary from block to block.
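The (d+1)-point requirement can be checked directly. The sketch below (illustrative Python; the single-coefficient perturbations are arbitrary choices for demonstration) confirms that any such mutant of a degree-9 polynomial disagrees with the original on at least one of the ten distinct test points:

    import random

    def poly(coeffs, x):
        # evaluate a0 + a1*x + a2*x^2 + ...
        return sum(c * x**k for k, c in enumerate(coeffs))

    d = 9
    original = [1] * (d + 1)          # a polynomial of degree 9
    tests = list(range(d + 1))        # d + 1 distinct test points

    for _ in range(1000):
        mutant = list(original)
        mutant[random.randrange(d + 1)] += random.choice([-2, -1, 1, 2])
        # the difference is a nonzero polynomial of degree <= d, so it has
        # at most d roots and cannot vanish on all d + 1 test points
        assert any(poly(mutant, x) != poly(original, x) for x in tests)

The same root-counting argument underlies the probabilistic claim above: an incorrect polynomial can agree with the correct one on at most d of the X candidate test values.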
[Figure: confidence in basic block j (vertical axis, 0 to 1) against path number 1, 2, ..., m−1, m.]
Figure 3: Execution of a basic block once on each path passing through it (not to scale)
An expression for ρj can be formulated in much the same way as before: when BBj is executed once on
path i, then

    ρj = Cji.

If, subsequently, BBj is executed on path i+1,

    ρj = Cji + (1 − Cji)μj(i+1)

Here, μj(i+1) is used to denote the proportional increase in confidence in BBj as a result of executing it
along a previously untested path. It should be noted that μj(i+1) > λj(i+1) since the proportional increase
achieved by executing a different path is greater than that achieved by executing the same path again:
executing each of n > 1 different paths in a program once only results in a greater confidence than
executing one path n times.
If BBj is next executed on path i+2, and so on, then after executions on n distinct paths:

    ρj = 1 + (Cji − 1) Π[m=2..n] (1 − μj(i+m−1))    (2)
Equation (2) involves several undetermined constants: Cji and the μji. Clearly, the more of these that
need to be estimated, the less practicable the formula becomes. However, equation (2) can be
simplified. Consider executing BBj on path i and then subsequently on path k ≠ i. This yields ρj = ρ*,
where:

    ρ* = Cji + (1 − Cji)μjk

If, on the other hand, BBj is executed on the same two paths but in reverse order, this gives ρj = ρ+,
where:

    ρ+ = Cjk + (1 − Cjk)μji

Clearly, the order of execution is unimportant to the resulting confidence in BBj, that is, ρ* = ρ+,
whence:

    (1 − Cji)(1 − μjk) = (1 − Cjk)(1 − μji)

implying:

    (1 − Cji)/(1 − Cjk) = (1 − μji)/(1 − μjk)

By assuming, not unreasonably, that the rise in confidence due to the first execution is path
independent, then Cji = Cjk = Cj (say), which implies μji = μjk = μj (say).
Using the above simplification, equation (2) now becomes:
    ρj = 1 + (Cj − 1)(1 − μj)^(n−1)    (3)
As a result, the confidence growth curve for basic block BBj will look like Figure 2 except that the
vertical axis will denote ρj.
A similar simplification can also be made to equation (1). For, consider BBj executed n times on path i,
whence:

    κji = 1 + (Cji − 1)(1 − λji)^(n−1)
If it is now assumed that κji = κjk, which is to say that equal confidence is achieved for n executions
along the same path, regardless of which path is executed, it is found that:

    (Cji − 1)/(Cjk − 1) = (1 − λjk)^(n−1)/(1 − λji)^(n−1)

and by assuming, as above, that Cji = Cjk = Cj, then λji = λjk = λj (say), and κji = κjk = κj (say).
Whence, combining this simplified form of equation (1) with equation (3), the overall confidence in BBj
becomes:

    ρj = 1 + (Cj − 1)(1 − μj)^(p−1)(1 − λj)^q    (4)
Here, p is the number of different paths through BBj which have been executed at least once, and q is
the total number of executions of paths through BBj minus p, that is, the total number of executions
which are not the first executions of a path.
ρj reflects both the number of times the testing of any path is duplicated as well as the number of
different paths that have been tested. (It should be noted that a graph similar to that in Figure 2 can also
be drawn for ρj.) The author anticipates that after a small number of executions, the value of ρj would
be in excess of 0.99. If all the blocks in a program were to have ρj = 0.99, then paths involving, for
example, 70 basic blocks, would be predicted to fail at least 50% of the time, since 0.99^70 ≈ 0.495.
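A short calculation makes equation (4) and the 70-block observation concrete. The sketch below (illustrative Python; the parameter values Cj = 0.9, μj = 0.6 and λj = 0.4 are assumptions chosen only for demonstration) evaluates the per-block confidence and the resulting path confidence:

    # equation (4): confidence in block j after p distinct paths and q repeats
    def rho(p, q, C=0.9, mu=0.6, lam=0.4):
        return 1 + (C - 1) * (1 - mu) ** (p - 1) * (1 - lam) ** q

    print(rho(1, 0))     # first execution: 0.9
    print(rho(3, 2))     # three paths plus two repeats: about 0.994
    print(rho(1, 4))     # one path executed five times: about 0.987

    # a path through 70 blocks, each tested to confidence 0.99
    print(0.99 ** 70)    # about 0.495, i.e. under an even chance of success

Note that rho(3, 2) exceeds rho(1, 4) even though both involve five executions, reflecting the earlier claim that spreading executions over distinct paths yields greater confidence than repeating one path.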
3.0 Implications
How does the above model of software testing relate to current testing / reliability estimation techniques
such as software reliability modelling and structural testing?
Traditionally, after having executed all statements, the tester will attempt to perform exhaustive branch
testing, which means that:

    TER2 = (number of branches exercised at least once) / (total number of branches)

should be forced to unity. Note that TER2 = 1 implies TER1 = 1, but TER1 = 1 does not imply TER2 = 1.
All feasible LCSAJs (Linear Code Sequence And Jumps, or 'jump to jump' paths [9]) may be the next
testing goal, where:

    TER3 = (number of LCSAJs exercised at least once) / (total number of LCSAJs)
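As a small illustration (hypothetical Python; the coverage counts are invented), these ratios are straightforward to compute from instrumented test runs:

    # coverage counts gathered from instrumented test runs (invented values)
    statements_hit, statements_total = 120, 120
    branches_hit, branches_total = 58, 64
    lcsajs_hit, lcsajs_total = 71, 112

    ter1 = statements_hit / statements_total   # 1.00: all statements executed
    ter2 = branches_hit / branches_total       # about 0.91: branch testing incomplete
    ter3 = lcsajs_hit / lcsajs_total           # about 0.63: LCSAJ coverage lower still

    print(ter1, round(ter2, 2), round(ter3, 2))

The ordering ter1 ≥ ter2 ≥ ter3 seen here is typical: each successive metric is harder to force to unity, which is why they are usually pursued in this sequence.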
If the confidence-growth behaviour described above is applied to all the blocks on a particular path
then, by following the arguments given in the previous section, the probability of locating a bug (or the
size of the 'area' of code which could possibly contain a bug) decreases with each execution, and by a
successively smaller amount each time; see Figure 4. Therefore, according to the model, the probability
of an error occurring traces an exponential decay with respect to the number of executions. Again, this
can be seen to fit in with expected behaviour [10].
[Figure: probability of finding errors (vertical axis, falling towards 0%) against number of executions
0 to 5.]
Figure 4: The decrease in probability of finding errors in a program as the number of executions
increases
4.0 Summary
An attempt has been made to introduce a conceptual model of the process of software testing which
explains phenomena which are experienced in practice. It is intended that this model should help the
reader to visualise what happens during testing by presenting the process in a much less abstract way
than usual. The two popular, yet separate, disciplines of structural and random testing are related by this
model and the proposed relationship between reliability, or confidence in the context of this paper, and
coverage, given in [7] and [8], can also be explained in terms of potential-error removal by testing.
5.0 Acknowledgements
The author would like to thank all the partners of the ESPRIT TRUST project (Software Engineering
Services GmbH, City University, Liverpool Data Research Associates, John Bell Technical Systems
and Liverpool University) and, in particular, Dr. Alan Veevers and Dr. Derek Yates for their useful and
constructive comments.
6.0 References
[1] DeMillo R.A., Lipton R.J. and Sayward F.G. Hints on Test Data Selection: Help for the Practicing
Programmer. Computer, vol. 11, 1978.
[2] Howden W. Weak Mutation and the Completeness of Test Data Sets. IEEE Trans. on Soft. Eng., vol.
SE-8, 2, 1982.
[3] Fergus E., Marshall A.C., Veevers A., Hennell M.A. and Hedley D. The Quantification of Software
Reliability. Proc. Second IEE/BCS Conference on Software Engineering, Liverpool, 1988.
[4] Veevers A., Petrova E. and Marshall A.C. Statistical Methods for Software Reliability Assessment,
Past, Present and Future. In Achieving Safety and Reliability with Computer Systems, Daniels B.K.,
ed., Elsevier Applied Science, London, 1987.
[5] Wu D., Hennell M.A., Hedley D. and Riddell I.J. A Practical Method for Software Quality Control
via Program Mutation. Proc. Second IEEE Workshop on Software Testing, Verification and Analysis,
Banff, Canada, 1988.
[6] DeMillo R.A. and Lipton R.J. A Probabilistic Remark on Algebraic Program Testing. Inf. Proc.
Lett., vol. 7, 4, 1978.
[7] Veevers A. and Marshall A.C. A Relationship Between Software Coverage Metrics and Reliability.
Submitted to IEEE Trans. on Reliability.
[8] Veevers A. Software Coverage Metrics and Operational Reliability. In Safety of Computer Control
Systems, Daniels B.K., ed., Pergamon Press, London, 1990.
[9] Hennell M.A., Hedley D. and Riddell I.J. Assessing a Class of Software Tools. Proc. Seventh ICSE,
Orlando, Florida, 1984.
[10] Littlewood B. A Bayesian Differential Debugging Model for Software Reliability. Proc. COMPSAC
1980, Chicago, 1980.
[11] Marshall A.C., Beattie B. and Veevers A. An Investigation of Alternatives to Time-Based Software
Reliability Measures. Technical Report, University of Liverpool, 1991.