
Software Testing Seminar

Mooly Sagiv
http://www.math.tau.ac.il/~sagiv/courses/testing.html
Tel Aviv University
640-6706
Sunday 16-18
Monday 10-12 Schrieber 317
Bibliography

• Michael Young, University of Oregon UW/MSR 1999


• Glenford Myers, “The Art of Software Testing”, 1979!
Outline
• Testing Questions
• Goals of Testing
• The Psychology of Testing
• Program Inspections and Reviews
• Test Case Design
• Achieving Reliability
• Other Techniques
• A Success Story
Standard Testing Questions

• Did this test execution succeed or fail?


– Oracles
• How shall we select test cases?
– Selection; generation
• How do we know when we’ve tested enough?
– Adequacy
• What do we know when we’re done?
– Assessment
Possible Goals of Testing

• Find faults
– Glenford Myers, The Art of Software Testing
• Provide confidence
– of reliability
– of (probable) correctness
– of detection (therefore absence) of particular
faults
Testing Theory (such as it is)

• Plenty of negative results


– Nothing guarantees correctness
– Statistical confidence is prohibitively expensive
– Being systematic may not improve fault detection
• as compared to simple random testing

• So what did you expect, decision procedures for undecidable problems?
What Information Can We Exploit?

• Specifications (formal or informal)


– in Oracles
– for Selection, Generation, Adequacy
• Designs
–…
• Code
– for Selection, Generation, Adequacy
• Usage (historical or models)
• Organization experience
The Psychology of Testing

• “Testing is the process of demonstrating that errors are not present”

• “Testing is the process of establishing confidence that the program does what it intends to do”

• “Testing is the process of executing programs with the intent of finding errors”
– Successful (positive) test: exposes an error
Black Box Testing

• View the program as a black box


• Exhaustive testing is infeasible even for tiny
programs
– Can never guarantee correctness
– Fundamental question is economics
– “Maximize investment”
– Example: “Partition test”
White Box Testing
• Investigate the internal structure of the
program
• Exhaustive path testing is infeasible
• Does not even guarantee correctness
– A specification is needed
– Missing paths, e.g., if (a-b < epsilon) (see the sketch below)
– Data-dependent paths

• Becomes an economic question


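A minimal sketch (not from the slides) of the missing-path problem mentioned above: the hypothetical compare function below can reach 100% statement and branch coverage, yet the path the specification requires (treating nearly-equal values as equal) simply does not exist in the code.

```python
EPSILON = 1e-9

def compare(a: float, b: float) -> int:
    """Intended spec: return 0 when |a - b| < EPSILON, otherwise -1 or +1."""
    if a < b:          # covered by compare(1.0, 2.0)
        return -1
    if a > b:          # covered by compare(2.0, 1.0)
        return 1
    return 0           # covered by compare(1.0, 1.0)

# All statements and branches are covered by the three tests above, but the
# missing "if abs(a - b) < EPSILON" path is never exercised, so this failure
# goes undetected: compare(1.0, 1.0 + 1e-12) returns -1 instead of 0.
```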
Testing Principles
• Test case must include the definition of expected
results
• A programmer should not test his/her own code
• A programming organization should not test its own
programs
• Thoroughly inspect the results of each test
• Test cases must also be written for invalid inputs
• Check that programs do not do unexpected things
• Test cases should not be thrown away
• Do not plan testing assuming that there are no errors
• The probability of errors in a piece of code is proportional to the number of errors already found in that part of the code
Program Inspections and Reviews
• Conducted in groups/sessions
• A well established process
• Modern programming languages eliminate
many programming errors:
– Type errors
– Memory violations
– Uninitialized variables
• But many errors are not currently found
– Division by zero
– Overflow
– Wrong precedence of logical operators
Test Case Design

• What subset of inputs has the highest probability of detecting the most errors?
• Combine black and white box testing
Higher Order Testing

• Show that the program does not do what the end-user expects…
• Different testing levels
– Unit test (procedure boundaries)
– Module test
– Function test (discrepancies with external spec.)
– System test (discrepancies with original
objectives)
• Facility test
• Volume test
• Usability test
• Security test
• Performance test
• ...
Black box
•Equivalence partitioning
•Boundary-value analysis
•Cause-effect graphing
•Error guessing
•State space exploration

White box
•Coverage/Adequacy
•Data flow analysis
•Cleanness
•Correctness
•Mutation
•Slicing
Partition Testing
(Equivalence Partitioning)

• Basic idea: Divide program input space into (quasi-) equivalence classes
– Underlying idea of specification-based, structural,
and fault-based testing
Specification-Based Partition Testing

• Divide the program input space according to identifiable cases in the specification
– May emphasize boundary cases
– May include combinations of features or values
• If all combinations are considered, the space is usually
too large
• Systematically “cover” the categories
– May be driven by scripting tools or input
generators
– Example: Category-Partition testing [Ostrand]
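As a small illustration (the function and its cases are invented, not from the slides), specification-based partitioning of a days_in_month(month, year) routine divides the input space into specification-visible classes and picks representatives and boundary values from each:

```python
# Hypothetical spec: days_in_month(month, year) returns 28..31 and raises
# ValueError for months outside 1..12.  Each key below is one equivalence
# class taken from the specification; the values are chosen representatives.
partitions = {
    "31-day month":        [(1, 2001), (12, 2001)],   # boundary months 1 and 12
    "30-day month":        [(4, 2001), (11, 2001)],
    "February, non-leap":  [(2, 2001), (2, 1900)],    # 1900: century non-leap year
    "February, leap":      [(2, 2004), (2, 2000)],    # 2000: century leap year
    "invalid month":       [(0, 2001), (13, 2001)],   # just outside the valid range
}

def run_partition_tests(days_in_month):
    """Run one test per chosen representative and report the outcome."""
    for class_name, cases in partitions.items():
        for month, year in cases:
            try:
                outcome = days_in_month(month, year)
            except ValueError:
                outcome = "ValueError"
            print(f"{class_name}: days_in_month({month}, {year}) -> {outcome}")
```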
“Adequate” testing

• Ideally: adequate testing ensures some property (proof by cases)
– Origins in [Goodenough & Gerhart], [Weyuker and
Ostrand]
– In reality: as impractical as other program proofs
• Practical “adequacy” criteria are really
“inadequacy” criteria
– If no case from class XX has been chosen, surely
more testing is needed ...
Structural Coverage Testing

• (In)adequacy criteria
– If significant parts of program structure are not
tested, testing is surely inadequate
• Control flow coverage criteria
– Statement (node, basic block) coverage
– Branch (edge) and condition coverage
– Data flow (syntactic dependency) coverage
– Various control-flow criteria
• Attempted compromise between the
impossible and the inadequate
Basic structural criteria (ex.)

[Control-flow graph with nodes a, b, c, d, e, f]
• Edge ac is required by all-edges coverage but not by all-nodes coverage
• A typical loop coverage criterion would require zero iterations (cdf), one iteration (cdedf), and multiple iterations (cdededed...df)
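A sketch of the loop-coverage heuristic from the example above, applied to an invented summing function; the three tests correspond to the zero-, one-, and many-iteration paths (cdf, cdedf, cded...df):

```python
def total(values):
    # single loop: the criterion asks for zero, one, and many iterations
    s = 0
    for v in values:
        s += v
    return s

assert total([]) == 0             # zero iterations   (path cdf)
assert total([5]) == 5            # one iteration     (path cdedf)
assert total([1, 2, 3, 4]) == 10  # many iterations   (path cded...df)
```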
Data flow coverage criteria (ex.)

x := 7
y := x
y := y+1    (2 reaching definitions, one is from self)
z := x+y    (2 reaching definitions for x, and 2 reaching definitions for y)

Rationale: An untested def-use association could hide an erroneous computation
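A short invented example of that rationale: with two definitions reaching one use, a test set can achieve statement and branch coverage while the erroneous def-use association stays untested.

```python
def discount(total, is_member, apply_discount):
    rate = 0.0                      # definition 1 of rate
    if is_member:
        rate = 0.15                 # definition 2 of rate (suppose the spec says 0.10)
    if apply_discount:
        return total * (1 - rate)   # use of rate: two def-use associations
    return total

# These two tests give full statement and branch coverage...
assert discount(100, True,  False) == 100   # executes definition 2, but not the use
assert discount(100, False, True)  == 100   # exercises only the (definition 1, use) pair
# ...yet the association (definition 2 -> use) is never tested, so the wrong
# rate goes unnoticed: discount(100, True, True) returns 85.0, not the intended 90.0.
```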
The Infeasibility Problem

• Syntactically indicated behaviors (paths, data flows, etc.) are often impossible
– Infeasible control flow, data flow, and data states
• Adequacy criteria are typically impossible to
satisfy
• Unsatisfactory approaches:
– Manual justification for omitting each impossible
test case (esp. for more demanding criteria)
– Adequacy “scores” based on coverage
• example: 95% statement coverage, 80% def-use
coverage
Challenges in Structural Coverage

• Interprocedural and gross-level coverage


– e.g., interprocedural data flow, call-graph
coverage
• Regression testing
• Late binding (OO programming languages)
– coverage of actual and apparent polymorphism
• Fundamental challenge: Infeasible behaviors
– underlies problems in inter-procedural and
polymorphic coverage, as well as obstacles to
adoption of more sophisticated coverage criteria
and dependence analysis
Structural Coverage in Practice

• Statement and sometimes edge or condition coverage is used in practice
– Simple lower bounds on adequate testing; may
even be harmful if inappropriately used for test
selection
• Additional control flow heuristics sometimes
used
– Loops (never, once, many), combinations of
conditions
Testing for Reliability

• Reliability is statistical, and requires a statistically valid sampling scheme
• Programs are complex human artifacts with
few useful statistical properties
• In some cases the environment (usage) of
the program has useful statistical properties
– Usage profiles can be obtained for relatively
stable, pre-existing systems (telephones), or
systems with thoroughly modeled environments
(avionics)
Arbitrary ≠ Random

• A common error in naïve attempts to obtain statistical confidence measures
– Arbitrary distributions may be modeled by
adversary functions, not by uniform distributions
• Example:
– If failures were distributed randomly through the
execution space of a database program, it would
fail at a uniform rate over time.
– In reality, it may never fail until a critical table
overflows, and then always fail thereafter.
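A toy simulation (invented numbers) of the database example: failures are not spread uniformly over time, so sampling an arbitrary early window of executions would wrongly suggest the program never fails.

```python
TABLE_CAPACITY = 10_000          # hypothetical size of the critical table

def insert_record(table):
    # never fails until the critical table overflows, then always fails
    if len(table) >= TABLE_CAPACITY:
        raise OverflowError("table full")
    table.append(None)

table, failures = [], 0
for i in range(12_000):
    try:
        insert_record(table)
    except OverflowError:
        failures += 1

print(failures)  # 2000: zero failures for the first 10,000 runs, 100% afterwards
```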
Certifying Ultra-High Reliability

• Problem: How can I show that system X has an expected failure rate of 10^-9/hour?
– example: probability that software will ever bring
down an Airbus A320
• Butler & Finelli estimate
– for 10^-9 per 10-hour mission
– requires: 10^10 hours of testing with 1 computer
– or: 10^6 hours (114 years) of testing with 10,000 computers
[ACM Sigsoft 91, Conf. on SW for Critical Systems]
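Spelling out that arithmetic (a rough check, assuming failure-free runs and perfectly parallel, independent machines):

```python
required_hours = 1e10                            # test hours needed for ~10^-9 failures/hour
computers = 10_000
hours_per_machine = required_hours / computers   # 1e6 hours per machine
years = hours_per_machine / (24 * 365.25)        # about 114 years
print(f"{hours_per_machine:.0f} hours per machine, roughly {years:.0f} years")
```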
Glimmers of Hope
for Measuring High Reliability

• Random distribution of faults or failures would enable statistical reasoning and classic redundancy techniques
– A whole more reliable than its parts
• Randomization approaches
– Blum: Self-checking programs
– Lipton: Redundant computations
– Podgurski: Kolmogorov complexity
• Grail or illusion?
– Difficult to generalize beyond simple functions
Process-Based Reliability Testing

• Rather than relying only on properties of the program, we may use historical characteristics of the development process
• Reliability growth models (Musa, Littlewood,
et al) project reliability based on experience
with the current system and previous similar
systems
Fault-based testing

• Given a fault model


– hypothesized set of deviations from correct
program
– typically, simple syntactic mutations; relies on
coupling of simple faults with complex faults
• Coverage criterion: Test set should be
adequate to reveal (all, or x%) faults
generated by the model
– similar to hardware test coverage
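A toy sketch (invented code, not the slides' material) of the fault-model idea for software: generate simple syntactic mutants of a unit and ask whether the test set distinguishes each mutant from the original.

```python
def original(x, y):
    return x if x > y else y      # intended: maximum of x and y

# Fault model: "relational operator replaced" mutants
def mutant_ge(x, y):
    return x if x >= y else y     # > replaced by >=

def mutant_lt(x, y):
    return x if x < y else y      # > replaced by <

tests = [(3, 1), (1, 3), (2, 2)]

def killed(mutant):
    # a mutant is killed if some test observes a result different from the original
    return any(mutant(*t) != original(*t) for t in tests)

print(killed(mutant_lt))   # True: the test (3, 1) exposes it
print(killed(mutant_ge))   # False: an equivalent mutant (x == y gives the same value)
```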
Fault Models

• Fault models are key to semiconductor testing
– Test vectors graded by coverage of accepted
model of faults (e.g., “stuck-at” faults)
• What are fault models for software?
– What would a fault model look like?
– How general would it be?
• Across application domains?
• Across organizations?
• Across time?
• Defect tracking is a start
The Budget Coverage Criterion

• A common answer to “when is testing done”


– When the money is used up
– When the deadline is reached
• This is sometimes a rational approach!
– Implication 1: Test selection is more important
than stopping criteria per se.
– Implication 2: Practical comparison of approaches
must consider the cost of test case selection
Test Selection: Standard Advice

• Specification coverage is good for selection as well as adequacy
– applicable to informal as well as formal specs
• + Fault-based tests
– usually ad hoc, sometimes from check-lists
• Program coverage last
– to suggest uncovered cases, not just to achieve a
coverage criterion
The Importance of Oracles

• Much testing research has concentrated on adequacy, and ignored oracles
• Much testing practice has relied on the
“eyeball oracle”
– Expensive, especially for regression testing
• makes large numbers of tests infeasible
– Not dependable
• Automated oracles are essential to cost-effective testing
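A minimal sketch of such an automated oracle (the sort routine and its property are assumed for illustration): an executable check of the output replaces the eyeball oracle, which makes running thousands of generated tests practical.

```python
import random
from collections import Counter

def my_sort(xs):                  # hypothetical unit under test
    return sorted(xs)

def oracle(inp, out):
    # automated oracle: output is ordered and is a permutation of the input
    assert all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    assert Counter(out) == Counter(inp)

for _ in range(1000):             # far more checks than an eyeball oracle allows
    inp = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
    oracle(inp, my_sort(inp))
```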
Sources of Oracles

• Specifications
– sufficiently formal (e.g., Z spec)
– but possibly incomplete (e.g., assertions in Anna,
ADL, APP, Nana)
• Design models
– treated as specifications, as in protocol
conformance testing
• Prior runs (capture/replay)
– especially important for regression testing and
GUIs; hard problem is parameterization
What can be automated?

• Oracles
– assertions; replay; from some specifications
• Selection (Generation)
– scripting; specification-driven; replay variations
– selective regression test
• Coverage
– statement, branch, dependence
• Management
Design for Test: 3 Principles
Adapted from circuit and chip design

• Observability
– Providing the right interfaces to observe the
behavior of an individual unit or subsystem
• Controllability
– Providing interfaces to force behaviors of interest
• Partitioning
– Separating control and observation of one
component from details of others
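One way these principles can appear in code (an assumed design, not from the slides): the unit below records its decisions for observation, takes an injected clock so behaviors of interest can be forced, and hides its collaborator behind a plain callable.

```python
from typing import Callable, List

class RetryPolicy:
    """A unit designed for test: observable, controllable, partitioned."""

    def __init__(self, max_attempts: int, clock: Callable[[], float]):
        self.max_attempts = max_attempts
        self.clock = clock                 # controllability: time source is injected
        self.log: List[str] = []           # observability: internal decisions are recorded

    def run(self, action: Callable[[], bool]) -> bool:
        for attempt in range(1, self.max_attempts + 1):
            self.log.append(f"attempt {attempt} at t={self.clock():.1f}")
            if action():                   # partitioning: collaborator behind an interface
                return True
        return False

# The test forces the "always fails" behavior and observes the retries directly:
ticks = iter(range(100))
policy = RetryPolicy(3, clock=lambda: float(next(ticks)))
assert policy.run(lambda: False) is False
assert len(policy.log) == 3
```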
Problems & Opportunities

• Compositionality
– for components; for regression
• Specifications
– low entry barrier, incremental payoff
• Synergy with Analysis
– conformance test w/ verified models
– “backstop” for unsafe assumptions
• … (your idea here)
A recent success story
• The PREfix program analysis tool
• Analyzes C/C++ sources
• Scans for cleanness bugs, e.g., dereferences
to NULL pointers
• Symbolically executes the program on some
paths
• May miss some errors and generate false
alarms
• Tried on Windows 2000
• Located 65,000 potential bugs, 28,000 of which are real bugs
