
Lecture notes on Software Testing as a supplement to the

lecture “Dependable Systems”

Katinka Wolter

5 January 2006

Purpose of software testing: evaluate and increase software reliability, or MTTF (mean time to failure).


Different kinds of testing and different classifications exist, e.g.

• functional testing evaluates the functionality of a software system without looking
into its structure (modules, classes). For a given input the output is examined. (A small
sketch of such a test follows this list.)

• structural testing (coverage) examines the structure of the software: assignments, loops,
conditions, etc. (This is white-box testing.)

• user-oriented testing is similar to functional testing, only that it uses the software as a
whole. While functional testing executes single functions, user-oriented testing evaluates
the software as seen by a user. (This is a form of black-box testing.)
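A minimal sketch of such a functional test, assuming a hypothetical function isLeapYear that is
not part of these notes; only the input/output relation is checked, the implementation is never
inspected:

#include <cassert>

// Hypothetical function under test; its internal structure is irrelevant here.
bool isLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

int main() {
    // Functional test: feed selected inputs and examine only the outputs.
    assert(isLeapYear(2000) == true);    // divisible by 400
    assert(isLeapYear(1900) == false);   // divisible by 100 but not by 400
    assert(isLeapYear(2024) == true);    // divisible by 4
    assert(isLeapYear(2023) == false);   // not divisible by 4
    return 0;
}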

Code coverage analysis is a structural testing technique. Structural testing compares test
program behavior against the apparent intention of the source code. Structural testing ex-
amines how the program works, taking into account possible pitfalls in the structure and
logic. Functional testing examines what the program accomplishes, without regard to how it
works internally.
Structural testing is also called path testing since you choose test cases that cause paths to
be taken through the structure of the program. Do not confuse path testing with the path
coverage measure, explained later.
At first glance, structural testing seems unsafe. Structural testing cannot find errors of omis-
sion. However, requirements specifications sometimes do not exist, and are rarely complete.
This is especially true near the end of the product development time line when the require-
ments specification is updated less frequently and the product itself begins to take over the
role of the specification. The difference between functional and structural testing blurs near
release time.
We can distinguish testing as based on the software phase in which testing is used:

• unit (small software components)

• integration (larger software components)

• product (the whole system)

• regression (re-release)

White-box testing is commonly used to achieve structural coverage. It consists of

• data and control flow testing

• mutation testing.

In particular we distinguish

1. Statement coverage. Every statement is executed at least once. Does statement
coverage = 1.0 provide a guarantee for a fault-free program P?
The chief disadvantage of statement coverage is that it is insensitive to some control
structures. For example, consider the following C/C++ code fragment:

int* p = NULL;
if (condition)
    p = &variable;
*p = 123;   // null pointer dereference whenever condition is false

Without a test case that causes condition to evaluate false, statement coverage rates
this code fully covered. In fact, if condition ever evaluates false, this code fails. This is
the most serious shortcoming of statement coverage. If-statements are very common.
Statement coverage does not report whether loops reach their termination condition
- only whether the loop body was executed. With C, C++, and Java, this limitation
affects loops that contain break statements.
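A small illustrative sketch of this limitation (the function below is invented for these notes): a
single test in which the key is found immediately executes every statement, yet the loop
condition i < n never evaluates to false.

// Returns the position of key in a[0..n-1], or -1 if it is not present.
int indexOf(const int* a, int n, int key) {
    int pos = -1;
    for (int i = 0; i < n; ++i) {
        if (a[i] == key) {
            pos = i;
            break;           // leaves the loop before i reaches n
        }
    }
    return pos;              // reached via break or via normal loop exit
}
// A test such as indexOf(a, 3, a[0]) covers every statement, but the loop's
// termination condition never becomes false.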
Since do-while loops always execute at least once, statement coverage ranks them the
same as non-branching statements.
Statement coverage is completely insensitive to the logical operators (|| and &&).
Statement coverage cannot distinguish consecutive switch labels.
Test cases generally correlate more to decisions than to statements. You probably would
not have 10 separate test cases for a sequence of 10 non-branching statements; you
would have only one test case. For example, consider an if-else statement containing
one statement in the then-clause and 99 statements in the else-clause. After exercising
one of the two possible paths, statement coverage gives extreme results: either 1% or
99% coverage. Basic block coverage eliminates this problem.
Block coverage works like statement coverage, except that its unit of coverage is a
sequence of statements (a basic block) rather than a single statement.

2. Decision coverage is a measure indicating whether every decision in the code has evaluated
to both true and false.
This measure has the advantage of simplicity without the problems of statement cover-
age.

A disadvantage is that this measure ignores branches within boolean expressions which
occur due to short-circuit operators. For example, consider the following C/C++/Java
code fragment:

if (condition1 && (condition2 || function1()))
    statement1;
else
    statement2;

This measure could consider the control structure completely exercised without a call to
function1. The test expression is true when condition1 is true and condition2 is true, and
the test expression is false when condition1 is false. In this instance, the short-circuit
operators preclude a call to function1.
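A self-contained sketch of this effect; the helper decision, the flag function1_called and the
test inputs are invented for illustration and are not part of the original fragment:

#include <cassert>

static bool function1_called = false;

bool function1() {
    function1_called = true;   // record that the call happened
    return true;
}

// The decision from the fragment above, wrapped in a helper for testing.
bool decision(bool condition1, bool condition2) {
    return condition1 && (condition2 || function1());
}

int main() {
    assert(decision(true, true) == true);     // the decision evaluates to true
    assert(decision(false, true) == false);   // the decision evaluates to false
    // Both outcomes of the decision were exercised, yet function1 never ran.
    assert(!function1_called);
    return 0;
}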

3. Data flow coverage. This variation of path coverage considers only the subpaths from
variable assignments to subsequent references of the variables. It indicates whether all
def-use pairs are covered. Example:

S1: x = f()
S2: p = g(x, ·)

If S1 is the definition and S2 a use of the variable x, then (S1, S2) is a def-use pair.

– c-use, uses x in a computational expression


– p-use, uses x in a predicate.

A path (S1, S2) is definition-free if no other statement between S1 and S2 defines x.
A path (S1, S2) is feasible if there exists a test case d from the input domain D such
that P(d) executes (S1, S2).
All statements following a statement S are its successors.
A c-use or p-use is covered if there exists at least one input d which executes a definition-
free path (S1, S2) and paths to all its successors.
The advantage of this measure is that the paths reported have direct relevance to the way
the program handles data. One disadvantage is that this measure does not include decision
coverage. Another disadvantage is its complexity.
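A small illustrative fragment (invented for these notes) marking a definition, a p-use and a
c-use of the variable x:

// S1: definition of x; S2: p-use of x; S3: c-use of x.
int example(int a) {
    int x = a * 2;       // S1: x is defined
    if (x > 10)          // S2: x is used in a predicate (p-use)
        return x + 1;    // S3: x is used in a computational expression (c-use)
    return 0;
}
// (S1, S2) and (S1, S3) are def-use pairs of x; the subpaths between them are
// definition-free because no intervening statement assigns to x.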

4. Mutation testing. Given a program P, generate mutants that are syntactically correct
by using some algorithm, rules or tools, such as Mothra
(http://ise.gmu.edu/~ofut/rsrch/mut.html), which implement heuristics such as genetic
algorithms, simulated annealing, etc.
Mutation testing is based upon seeding the implementation with a fault (mutating
it), by applying a mutation operator, and determining whether testing identifies this
fault. If a test case d distinguishes between the mutant M and the original program P,
i.e. M(d) ≠ P(d), it is said to kill the mutant. The idea behind mutation testing is quite
simple: given an appropriate set of mutation operators, if a test set kills the mutants
generated by these operators then, since it is able to find these small differences, it is
likely to be good at finding real faults.

Mutation testing may be used to judge the effectiveness of a test set: the test set should
kill all the mutants. Similarly, test generation may be based on mutation testing: tests
are generated to kill the mutants. Interestingly, many test criteria may be represented
using mutation testing by simply choosing appropriate mutation operators.
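A minimal sketch, assuming a relational-operator-replacement mutation applied to an invented
function maxOf; the test case (5, 2) kills the mutant, whereas (3, 3) does not:

#include <cassert>

// Original program P.
int maxOf(int a, int b) { return (a > b) ? a : b; }

// Mutant M, produced by mutating the relational operator > into <.
int maxOfMutant(int a, int b) { return (a < b) ? a : b; }

int main() {
    // M(d) != P(d): this test case distinguishes mutant and original, so it kills M.
    assert(maxOf(5, 2) != maxOfMutant(5, 2));
    // A test set containing only equal arguments would let the mutant survive,
    // because both versions return the same value.
    assert(maxOf(3, 3) == maxOfMutant(3, 3));
    return 0;
}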

Functional testing uses operational profiles. An operational profile consists of test inputs
together with their relative frequencies of use.

Operational profile = {(d, p), d ∈ D, p ∈ [0, 1]}

To generate an operational profile, a customer profile must first be developed.
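A minimal sketch of drawing test inputs according to an operational profile; the input names
and probabilities below are invented for illustration:

#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
    // Operational profile {(d, p)}: inputs d and their relative frequencies of use p.
    std::vector<std::string> inputs = {"login", "search", "checkout"};
    std::vector<double> frequencies = {0.6, 0.3, 0.1};

    std::mt19937 gen(42);
    std::discrete_distribution<int> profile(frequencies.begin(), frequencies.end());

    // Draw test inputs with the same relative frequencies a user would produce.
    for (int i = 0; i < 5; ++i) {
        std::cout << "test input: " << inputs[profile(gen)] << '\n';
    }
    return 0;
}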


For all four white-box test methods, adequacy criteria exist.

1) A test set T is adequate with respect to (wrt) decision coverage, if all decisions in a
software system are covered when executed against all t ∈ T.

2) A test set T is adequate wrt p-use (or c-use), if all p-uses (c-uses) are covered by T .

3) T is adequate wrt the mutation criterion if it distinguishes all non-equivalent mutants.

Items 1) - 3) are measurable. For functional testing, no measurable criterion exists.


Some properties:

• If T is p-use or c-use adequate, then T is decision adequate. (A formal proof exists.) We
say that data flow coverage subsumes decision coverage (i.e. decision coverage ⊆ data
flow coverage).

• mutation adequate ⇒ data flow adequate. (The converse typically does not hold.)

• If a test set is not data flow adequate, it is not mutation adequate.

• for several types of errors structural testing is not sufficient, but functional testing is
(errors of omission).

• A test sequence always employs functional testing first, then structural testing.

How to obtain reliability measures?


Reliability estimates are computed using a time-based model or a structure-based model.
We formalise as follows. Let P be the program to test on a test case d from input domain D.
Let T_k be the time of the k-th failure, N_k the number of tests used by time T_k. Define the
testing effort E_k as

    E_k = T_k − T_{k−1}   (time-based model)
    E_k = N_k − N_{k−1}   (test-case-based model)

Let e_i be the effort in execution i of P. Then

    E_k = Σ_{i=l1}^{l2} e_i

where e_{l1} and e_{l2} are the efforts of the first and last execution of P during the k-th failure
time interval.
Another view on reliability: The reliability R of P is the probability of no failure over the
entire input domain.
    R = Pr{P(d) is correct for all d ∈ D}

Time-based result: Let

    S_k = Σ_{i=1}^{k} E_i

be the cumulative effort over k inter-failure epochs. Let x be the exposure period. The probability
that the software will not fail during the next x time units is formalised as

    R(x|t) ≡ Pr{E_k > x | S_{k−1} = t}
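For illustration (invented numbers): if failures occur at times T_1 = 10, T_2 = 25 and T_3 = 45
(with T_0 = 0), the time-based efforts are E_1 = 10, E_2 = 15 and E_3 = 20, the cumulative
effort is S_3 = 45, and R(x | t = 45) = Pr{E_4 > x | S_3 = 45} is the probability that no failure
occurs during the next x time units of testing effort.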

Convergence of R(x|t):
R(x|t) → R as x → ∞
if the test inputs are operationally significant.
Example: Studies and semi-formal proofs have shown that structural testing is not able to
reveal all faults. For functional testing, not even a saturation effect can be proven.
In one study, TeX by Knuth and AWK by Kernighan were tested using the tools TRIPTEST
(for TeX) and ATAC.
The coverage statistics (in %) are:

        Block   Decision   p-use   c-use
TeX       85       72        53      48
AWK       70       59        48      55

A possible scenario would employ a test sequence like the one shown in the following graph.
The dashed fields indicate the saturation region, where the test method employed does not
reveal any more faults.

References

• Michael R. Lyu (Ed.): Handbook of Software Reliability Engineering

• http://www.bullseye.com/coverage

[Figure: faults revealed versus testing effort (t) for functional, decision, data flow and mutation
testing; the dashed fields mark each method's saturation region, with F denoting the residual
faults.]
