Enabling Automated Software Testing with Artificial Intelligence

Lionel Briand
Nanjing University, 2018

Objectives
• Applications of main AI techniques to test automation
• Overview (partial) and lessons learned, with pointers for
further information
• Industrial research projects
• Disclaimer: Inevitably biased presentation based on personal
experience. This is not a survey.

Collaborative Research @ SnT
• Research in context
• Addresses actual needs
• Well-defined problem
• Long-term collaborations
• Our lab is the industry

SVV Dept.
• Established in 2012, part of the SnT centre
• Requirements Engineering, Security Analysis, Design Verification,
Automated Testing, Runtime Monitoring
• ~ 30 lab members
• Partnerships with industry
• ERC Advanced grant

Definition of Software Testing
• International Software Testing Qualifications Board:
“Software testing is a process of executing a program or
application with the intent of finding the software bugs. It can
also be stated as the process of validating and verifying that
a software program or application or product meets the
business and technical requirements that guided its design and
development.”

Main Challenge
• The main challenge in testing software systems is
scalability
• Scalability: The extent to which a technique can be applied
on large or complex artifacts (e.g., input spaces, code,
models) and still provide useful, automated support with
acceptable effort, CPU time, and memory
• Effective automation is a prerequisite for scalability

Software Testing Overview
• Derive test cases from a SW representation (e.g., specifications)
• Execute test cases on the SW code and get test results
• Test oracle: expected results or properties
• Compare test results with the oracle: [Test Result==Oracle] or [Test Result!=Oracle]
• Automation!

Importance of Software Testing
• Software testing is the most prevalent verification and validation
technique in practice
• It represents a large percentage of software development costs,
e.g., >50% is not rare
• Testing services are a USD 9-Billion market
• The cost of software failures was estimated to be (a very minimum
of) USD 1.1 trillion in 2016
• Inadequate tools and technologies are among the most important
factors behind testing costs and inefficiencies

Metaheuristic Search
• Stochastic optimization
• E.g., evolutionary computing
• Efficiently explore the search space in order to find good (near-
optimal) feasible solutions
• Provide no guarantee of global or local optimality
• Address both discrete- and continuous-domain optimization
problems
• Applicable to many practical situations, including SW testing

Genetic Algorithms (GAs)
Genetic Algorithm: Population-based search algorithm
inspired by evolution theory
Natural selection: Individuals that best
fit the natural environment survive
Reproduction: Surviving individuals
generate offspring (next generation)
Mutation: Offspring inherit
properties of their parents with some
mutations
Iteration: Generation after generation,
the new offspring fit the
environment better than their parents
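
The loop described on this slide can be sketched as follows. This is a minimal illustration, not the algorithm used in the projects presented later: bit-string individuals, binary tournament selection, one-point crossover, elitism, and all names (`evolve`, `one_max`) are assumptions made for the example.

```python
import random

def evolve(fitness, genome_len=20, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Maximise `fitness` over bit strings; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []

    def select():
        # Natural selection: the fitter of two random individuals survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        offspring = [best[:]]  # elitism: the best individual is kept unchanged
        while len(offspring) < pop_size:
            # Reproduction: one-point crossover of two selected parents.
            p1, p2 = select(), select()
            cut = rng.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        population = offspring

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

def one_max(bits):
    """Toy fitness function: the number of 1 bits."""
    return sum(bits)

best, history = evolve(one_max)
print(one_max(best), history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.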

Search-Based Software Testing
• Express test generation
problem as a search or
optimization problem
• Search for test input data
with certain properties, e.g.,
source code coverage
• Non-linearity of software (if,
loops, …): complex,
discontinuous, non-linear
search spaces
[Figures reproduced from the paper cited below]
Figure 2. Random search may fail to fulfil low-probability test goals
Figure 3. Hill Climbing: (a) climbing to a local optimum, which may not represent the global optimum; (b) restarts may be required
Figure 4. Simulated Annealing may temporarily move to points of poorer fitness in the search space
Figure 5. Genetic Algorithms are global searches, sampling many points in the fitness landscape at once
“Search-Based Software Testing: Past, Present and Future”
Phil McMinn
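
To illustrate why a fitness function helps, the sketch below contrasts unguided random search with a simple hill climber that minimises a branch distance. The program under test (a single equality branch), the constant `TARGET`, the input range, and the step-doubling neighbourhood are all assumptions chosen for the example; they are not taken from the cited paper.

```python
import random

TARGET = 4242  # hypothetical input value that triggers the branch

def branch_taken(x):
    """Program under test, reduced to the branch we want to cover."""
    return x == TARGET

def branch_distance(x):
    """Fitness to minimise: 0 exactly when the target branch is taken."""
    return abs(x - TARGET)

def random_search(budget, rng, lo=0, hi=1_000_000):
    """Unguided search: sample inputs until the branch is covered."""
    for evaluations in range(1, budget + 1):
        x = rng.randint(lo, hi)
        if branch_taken(x):
            return x, evaluations
    return None, budget

def hill_climb(budget, rng, lo=0, hi=1_000_000):
    """Guided search: move to a neighbour whenever it lowers the distance."""
    x = rng.randint(lo, hi)
    step, evaluations = 1, 1
    while branch_distance(x) > 0 and evaluations < budget:
        neighbour = min((x - step, x + step), key=branch_distance)
        evaluations += 2
        if branch_distance(neighbour) < branch_distance(x):
            x, step = neighbour, step * 2  # accelerate while improving
        else:
            step = 1                       # no improvement: restart with small steps
    return (x if branch_taken(x) else None), evaluations

found, cost = hill_climb(budget=2000, rng=random.Random(1))
random_result, _ = random_search(budget=2000, rng=random.Random(1))
print(found, cost, random_result)
```

Random search has roughly a one-in-a-million chance per sample of hitting the branch, whereas the distance-guided climber reaches it in a few hundred evaluations on this smooth landscape. Real programs produce far less regular landscapes, which is where restarts and global searches come in.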

Machine Learning and Testing
• ML supports decision making
based on data
• Test planning
• Test cost estimation
• Test case management
• Test case prioritization
• Test case design
• Test case refinement
• Test case evaluation
• Debugging
• Fault localization
• Bug prioritization
• Fault prediction
• “Machine Learning-based Software
Testing: Towards a Classification
Framework.” SEKE 2011

NLP and Testing
• Natural language is prevalent in software development
• User documentation, procedures, natural language
requirements, etc.
• Natural Language Processing (NLP)
• Can it be used to help automate testing?
• Derive test cases, including oracles, from textual
requirements or specifications
• Traceability between requirements and system test cases
(required by many standards)

Industrial Research Projects

Testing Advanced Driving
Assistance Systems
[Ben Abdessalem et al.]

Cyber-Physical Systems
• A system of collaborating computational elements controlling
physical entities

Advanced Driver Assistance
Systems (ADAS)
Automated Emergency Braking (AEB)
Pedestrian Protection (PP)
Lane Departure Warning (LDW)
Traffic Sign Recognition (TSR)

Advanced Driver Assistance
Systems (ADAS)
Decisions are made over time based on sensor data
Sensors
Controller
Actuators Decision
Sensors
/Camera
Environment
ADAS

Automotive Environment
• Highly varied environments, e.g., road topology, weather, buildings
and pedestrians …
• Huge number of possible scenarios, e.g., determined by
trajectories of pedestrians and cars
• ADAS play an increasingly critical role in modern vehicles
• A challenge for testing

A General and Fundamental Shift
• Increasingly, it is easier to learn behavior from data using
machine learning than to specify and code it
• ADAS components may rely on deep learning …
• Millions of weights learned (neural networks)
• No explicit code, no specifications
• Verification, testing?

CPS Development Process
Model-in-the-Loop Stage
• Functional modeling: controllers, plant, decision
• Continuous and discrete Simulink models
• Model simulation and testing
Software-in-the-Loop Stage
• Architecture modelling: structure, behavior, traceability
• System engineering modeling (SysML)
• Analysis: model execution and testing, model-based testing, traceability and change impact analysis, ...
• (Partial) code generation
Hardware-in-the-Loop Stage
• Deployed executables on target platform
• Hardware (sensors ...)
• Analog simulators
• Testing (expensive)

Our Goal
• Developing an automated testing technique
for ADAS
• To help engineers efficiently and
effectively explore the complex test input
space of ADAS
• To identify critical (failure-revealing) test
scenarios
• Characterization of input conditions that
lead to most critical situations, e.g.,
safety violations

Automated Emergency Braking
System (AEB)
• Vision (camera) sensor: provides objects’ position/speed
• Decision making: issues a “brake-request” when braking is needed to avoid collisions
• Brake controller
Example Critical Situation
• “AEB properly detects a pedestrian in front of the car with a
high degree of certainty and applies braking, but an accident
still happens where the car hits the pedestrian with a
relatively high speed”

Testing ADAS
• On-road testing: time-consuming, expensive, unsafe
• Simulation-based (model) testing: a simulator based on physical/mathematical models
Testing via Physics-based
Simulation
ADAS
(SUT)
Simulator (Matlab/Simulink)
Model
(Matlab/Simulink)
▪ Physical plant (vehicle / sensors / actuators)
▪ Other cars
▪ Pedestrians
▪ Environment (weather / roads / traffic signs)
Test input
Test output
time-stamped output

AEB Domain Model
[UML class diagram, summarized]
• Test Scenario: simulationTime: Real, timeStep: Real
• Static input
  - Weather: visibility: VisibilityRange, fog: Boolean, fogColor: FogColor; subtypes Normal, Rain (rainType: RainType), Snow (snowType: SnowType)
  - Road: frictionCoeff: Real; subtypes Straight, Ramped (height: RampHeight), Curved (radius: CurvedRadius)
• Dynamic input
  - Vehicle: v0: Real (shown as v0^c in the diagram)
  - Pedestrian: four Real attributes (shown as x0^p, y0^p, v0^p, θ0^p in the diagram)
  - Mobile object: position vector of Position (x: Real, y: Real)
• Output
  - AEB Output: TTC: Real, certaintyOfDetection: Real, braking: Boolean
  - Output functions: F1: Real, F2: Real
• Enumerations
  - RainType: ModerateRain, HeavyRain, VeryHeavyRain, ExtremeRain
  - SnowType: ModerateSnow, HeavySnow, VeryHeavySnow, ExtremeSnow
  - FogColor: DimGray, Gray, DarkGray, Silver, LightGray, None
  - CurvedRadius (CR): 5, 10, 15, 20, 25, 30, 35, 40
  - RampHeight (RH): 4, 6, 8, 10, 12
  - VisibilityRange: 10, 20, 30, ..., 300 (in steps of 10)
• Constraint WeatherC {OCL}: self.fog = false implies self.visibility = “300” and self.fogColor = None

ADAS Testing Challenges
• Test input space is large, complex and multidimensional
• Explaining failures and fault localization are difficult
• Execution of physics-based simulation models is computationally
expensive

Our Approach
• We use decision tree classification models
• We use a multi-objective search algorithm (NSGAII)
• Objective functions:
1. Minimum distance between the pedestrian and the
field of view
2. The car speed at the time of collision
3. The probability that the object detected is a pedestrian
• Each search iteration calls the simulation to compute the objective
functions

Multiple Objectives: Pareto Front
Individual A Pareto
dominates individual B if
A is at least as good as B
in every objective
and better than B in at
least one objective.
Dominated by x
F1
F2
Pareto front
x
• A multi-objective optimization algorithm (e.g., NSGA II) must:
• Guide the search towards the global Pareto-Optimal front.
• Maintain solution diversity in the Pareto-Optimal front.
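
The dominance relation defined on this slide can be written down directly. The sketch below assumes that every objective is minimised and uses made-up two-objective points; `dominates` and `pareto_front` are illustrative names, not part of NSGA II itself.

```python
def dominates(a, b):
    """True if a Pareto-dominates b, assuming every objective is minimised."""
    at_least_as_good = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (F1, F2) values for five candidate solutions.
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(points)
print(front)  # [(1, 5), (2, 3), (4, 1)]
```

Here (3, 4) is dominated by (2, 3) and (5, 5) by (1, 5), so neither is on the front. This quadratic filter is only meant to make the definition concrete; NSGA II additionally ranks solutions into successive fronts and uses a crowding measure to preserve diversity.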

Search-based Testing Process
Test input generation (NSGA II)
Evaluating test inputs
- Select best tests
- Generate new tests
(candidate)
test inputs
- Simulate every (candidate) test
- Compute fitness functions
Fitness
values
Test cases revealing worst case system behaviors
Input data ranges/dependencies + Simulator + Fitness functions

Search: Genetic Evolution
Initial input
Fitness
computation
Selection
Breeding

Better Guidance
• Fitness computations rely on simulations and are very
expensive
• Search needs better guidance

Decision Trees
Partition the input space into homogeneous regions
All points: count 1200, “non-critical” 79%, “critical” 21%
  - v0^p >= 7.2 km/h: count 564, “critical” 41%
      - θ0^p < 218.6: count 412, “critical” 51%
          - RoadTopology(CR = 5, Straight, RH = [4 12](m)): count 230, “critical” 69%
          - RoadTopology(CR = [10 40](m)): count 182, “critical” 28%
      - θ0^p >= 218.6: count 152, “critical” 16%
  - v0^p < 7.2 km/h: count 636, “critical” 2%
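
How a tree finds such regions can be illustrated with a single split. The sketch below assumes a CART-style criterion (weighted Gini impurity) over one numeric feature; the slides do not say which tree-learning algorithm was used, and the six labelled samples are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = pure region)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(samples):
    """samples: list of (feature_value, is_critical) pairs.

    Returns (threshold, weighted_gini) for the threshold that yields the
    most homogeneous pair of regions."""
    best = None
    values = sorted({v for v, _ in samples})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [c for v, c in samples if v < threshold]
        right = [c for v, c in samples if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: (pedestrian speed in km/h, 1 if the scenario was critical).
samples = [(2.0, 0), (4.0, 0), (6.0, 0), (8.0, 1), (10.0, 1), (12.0, 0)]
threshold, score = best_split(samples)
print(threshold, round(score, 3))  # 7.0 0.222
```

Applying the same step recursively to each resulting region, and over all input variables, yields a tree like the one on the slide.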

Genetic Evolution Guided by
Classification
Initial input
Fitness
computation
Classification
Selection
Breeding

Search Guided by Classification
Test input generation (NSGA II)
Evaluating test inputs
Build a classification tree
Select/generate tests in the fittest regions
Apply genetic operators
Input data ranges/dependencies + Simulator + Fitness functions
(candidate)
test inputs
- Simulate every (candidate) test
- Compute fitness functions
Fitness
values
Test cases revealing worst case system behaviors +
A characterization of critical input regions

NSGAII-DT vs. NSGAII
NSGAII-DT outperforms NSGAII
[Plots: HV, GD, and SP over time (h) for NSGAII-DT and NSGAII]
Automatic Generation of
System Test Cases
from Requirements
in Natural Language
[Wang et al.]

Problem
Automatically verify the compliance of
software systems with their functional
requirements in a cost-effective way

Context
Automotive Embedded Systems

Working Assumption
Use Case
Specifications
Domain
Model

Automated Generation (NL processing)
• Inputs: Use Case Specifications (RUCM template), Domain Model, and a concise Mapping Table
• Mapping Table (Regex: Mapping)
  weight=[d+]: Sensor.setWeight
  initialized=true: System.start
• Output: Executable Test Cases
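
One way to read the mapping table is as a list of (regular expression, driver function) pairs that turn abstract test inputs into executable calls. The sketch below is an assumption-laden illustration: it interprets the slide's `weight=[d+]` as the regex `weight=(\d+)`, and the output format and the helper `to_driver_calls` are hypothetical, not the tool's actual behaviour.

```python
import re

# Hypothetical mapping table: abstract test input pattern -> test driver function.
MAPPING_TABLE = [
    (re.compile(r"weight=(\d+)"), "Sensor.setWeight"),
    (re.compile(r"initialized=true"), "System.start"),
]

def to_driver_calls(abstract_inputs):
    """Translate abstract test inputs into driver call strings."""
    calls = []
    for item in abstract_inputs:
        for pattern, function in MAPPING_TABLE:
            match = pattern.fullmatch(item)
            if match:
                # Captured groups (if any) become the call arguments.
                calls.append(f"{function}({', '.join(match.groups())})")
                break
    return calls

calls = to_driver_calls(["initialized=true", "weight=40"])
print(calls)  # ['System.start()', 'Sensor.setWeight(40)']
```

Inputs that match no row of the table are skipped silently here; a real generator would more likely report them so that the table can be completed.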

Use Case Specifications
Example
BodySense: embedded system that determines the occupancy status of
seats in a car

Use Case Specifications
Example
Precondition: The system has been initialized
Basic Flow
1. The SeatSensor SENDS the weight TO the system.
2. INCLUDE USE CASE Self Diagnosis.
3. The system VALIDATES THAT no error has been detected.
4. The system VALIDATES THAT the weight is above 20 Kg.
5. The system sets the occupancy status to adult.
6. The system SENDS the occupancy status TO AirbagControlUnit.
--written according to RUCM (Yue’13) template--

Alternative Flow
RFS 4.
1. IF the weight is above 1 Kg THEN
2. The system sets the occupancy status to child.
3. ENDIF.
4. RESUME STEP 6.

[Diagram: flow model of the use case; each node is typed]
• UseCaseStart: Precondition: The system has been initialized.
• Input: The SeatSensor SENDS the weight TO the system.
• Include: INCLUDE USE CASE Self Diagnosis.
• Condition: The system VALIDATES THAT no error has been detected.
• Condition: The system VALIDATES THAT the weight is above 20 Kg.
• Internal: The system sets the occupancy status to adult.
• Condition: IF the weight is above 1 Kg THEN
• Internal: The system sets the occupancy status to child.
• Output: The system SENDS the occupant class TO AirbagControlUnit.
• Exit
Model-based Test Case Generation
driven by coverage criteria
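
Coverage-driven generation over such a model can be sketched as path enumeration on a directed graph. The encoding below is hypothetical: node names are shortened, only the "true" branch of the error check is modelled because the slides give no alternative flow for it, and the coverage criterion (all paths of an acyclic model) is chosen for simplicity.

```python
# Hypothetical encoding of the use case flow model as a directed graph.
MODEL = {
    "UseCaseStart": ["Input"],
    "Input": ["Include"],
    "Include": ["Condition: no error"],
    "Condition: no error": ["Condition: weight > 20"],
    "Condition: weight > 20": ["Internal: adult", "Condition: weight > 1"],
    "Condition: weight > 1": ["Internal: child", "Output"],
    "Internal: adult": ["Output"],
    "Internal: child": ["Output"],
    "Output": ["Exit"],
    "Exit": [],
}

def all_paths(graph, node, goal, prefix=()):
    """Depth-first enumeration of every path from node to goal (acyclic graph)."""
    prefix = prefix + (node,)
    if node == goal:
        return [prefix]
    paths = []
    for successor in graph[node]:
        paths.extend(all_paths(graph, successor, goal, prefix))
    return paths

paths = all_paths(MODEL, "UseCaseStart", "Exit")
for path in paths:
    print(" -> ".join(path))
```

Each of the three paths is one abstract test scenario (adult, child, and neither), and together they cover every node of the model. The conditions along a path form the path condition that a constraint solver must satisfy to produce concrete test inputs.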

Domain Model:
Formalizing Conditions
OCL constraint:
“The system VALIDATES THAT no error has been detected.”
Error.allInstances()->forAll( i | i.isDetected = false)
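
For readers less familiar with OCL, the constraint above is a universally quantified check over all `Error` instances. A rough Python equivalent, with a hypothetical `Error` class standing in for the domain-model entity, is:

```python
from dataclasses import dataclass

@dataclass
class Error:
    """Hypothetical stand-in for the Error entity of the domain model."""
    is_detected: bool

def no_error_detected(errors):
    # OCL: Error.allInstances()->forAll( i | i.isDetected = false )
    return all(not e.is_detected for e in errors)

print(no_error_detected([Error(False), Error(False)]))  # True
print(no_error_detected([Error(False), Error(True)]))   # False
```

As with OCL's `forAll`, the check holds trivially when there are no instances at all.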

[Diagram: the use case flow model, with OCL constraints attached to its nodes]
Path condition:
System.allInstances()->forAll( s | s.initialized = true )
AND System.allInstances()->forAll( s | s.initialized = true )
AND Error.allInstances()->forAll( e | e.isDetected = false)
AND System.allInstances()
->forAll( s | s.occupancyStatus = Occupancy::Adult )
Constraint solving yields the test inputs:
• system : BodySense (initialized = true, occupancyStatus = Adult, weight = 40)
• te : TemperatureError (isDetected = false), linked to system via errors
• ve : VoltageError (isDetected = false), linked to system via errors
Automated Generation of OCL
Expressions
“The system VALIDATES THAT
no error has been detected.”
OCLgen

Pattern:
Entity Name, left-hand side (variable), operator, right-hand side (variable/value)

OCLgen Solution
“The system sets the occupancy status to adult.”
1. Determine the role of words in a sentence
• “The system”: actor; “the occupancy status”: affected by the verb; “adult”: final state
• Based on Semantic Role Labeling
• Lexicons that describe the sets of roles typically
2. Match words in the sentence with concepts in the domain model
• Based on string similarity
3. Generate the OCL constraint using a verb-specific transformation rule
BodySense.allInstances()
->forAll( i | i.occupancyStatus = Occupancy::Adult)

Schedulability Analysis and
Stress Testing
[Di Alesio et al.]

Problem and Context
• Schedulability analysis encompasses techniques that try to
predict whether (critical) tasks are schedulable, i.e., meet
their deadlines
• Stress testing runs carefully selected test cases that have
a high probability of leading to deadline misses
• Stress testing is complementary to schedulability analysis
• Testing is typically expensive, e.g., hardware in the loop
• Finding stress test cases is difficult

Finding Stress Test Cases is Hard
j0, j1, j2 arrive at at0, at1, at2 and must
finish before dl0, dl1, dl2
j1 can miss its deadline dl1 depending on
when at2 occurs!
[Figure: two execution timelines (time 0 to 9) of j0, j1, j2 with arrival times at0, at1, at2 and deadlines dl0, dl1, dl2; the two timelines differ in when at2 occurs]
Challenges and Solutions
• Ranges for arrival times form a very large input space
• Task interdependencies and properties constrain what
parts of the space are feasible
• Solution: We re-expressed the problem as a constraint
optimization problem and used a combination of constraint
programming (CP, IBM CPLEX) and meta-heuristic search
(GA)
• GA is scalable and CP offers guarantees

Constraint Optimization
Constraint Optimization Problem
Static Properties of Tasks
(Constants)
Dynamic Properties of Tasks
(Variables)
Performance Requirement
(Objective Function)
OS Scheduler Behaviour
(Constraints)

Combining CP and GA
Fig. 3: Overview of GA+CP: the solutions x , y and z in the initial population of GA evolve into

Case Study
Drivers
(Software-Hardware Interface)
Control Modules
Alarm Devices
(Hardware)
Multicore Architecture
Real-Time Operating System
System monitors gas leaks and fire in
oil extraction platforms

Summary
• We provided a solution for generating stress test cases by
combining meta-heuristic search and constraint programming
• Meta-heuristic search (GA) identifies high risk regions in the
input space
• Constraint programming (CP) finds provably worst-case
schedules within these (limited) regions
• Achieve (nearly) GA efficiency and CP effectiveness
• Our approach can be used both for stress testing and
schedulability analysis (assumption free)

Other Industrial Projects
• Delphi: Testing and verification of CPS Simulink models
(e.g., controllers) [Matinnejad et al.]
• SES: Hardware-in-the-Loop, acceptance testing of CPS
[Shin et al.]
• IEE: Testing timing properties in embedded systems
[Wang et al.]
• Luxembourg government: Generating representative,
synthetic test data for information systems [Soltana et
al.]

Role of AI
• Metaheuristic search:
• Many test automation problems can be re-expressed into
search and optimization problems
• Machine learning:
• Automation can be better guided and more effective when
learning from data: test execution results, fault detection …
• Natural Language Processing:
• Natural language is commonly used and is an obstacle to
automated analysis and therefore test automation

Search-Based Solutions
• Versatile
• Helps relax assumptions compared to exact approaches
• Helps decrease modeling requirements
• Scalability, e.g., easy to parallelize
• Requires massive empirical studies
• Search is rarely sufficient by itself

Multidisciplinary Approach
• Single-technology approaches rarely work in practice
• Combined search with:
• Machine learning
• Solvers, e.g., CP, SMT
• Statistical approaches, e.g., sensitivity analysis
• System and environment modeling and simulation

The Road Ahead
• We need to develop techniques that strike a balance in terms
of scalability, practicality, applicability, and offering a
maximum level of dependability guarantees
• We need more multi-disciplinary research involving AI
• In most industrial contexts, offering absolute guarantees
(correctness, safety, or security) is illusory
• The best trade-off between cost and level of guarantees is
necessarily context-dependent
• Research in this field cannot be oblivious to context (domain
…)

Selected References (SBST)
• Matinnejad et al., “MiL Testing of Highly Configurable Continuous Controllers: Scalable Search
Using Surrogate Models”, ASE 2014
• Di Alesio et al. “Combining genetic algorithms and constraint programming to support stress
testing of task deadlines”, ACM Transactions on Software Engineering and Methodology, 2015
• Soltana et al., “Synthetic Data Generation for Statistical Testing”, ASE 2017.
• Shin et al., “Test case prioritization for acceptance testing of cyber-physical systems”, ISSTA 2018
• Ali et al., “Generating Test Data from OCL Constraints with Search Techniques”, IEEE Transactions
on Software Engineering, 2013
• Hemmati et al., “Achieving Scalable Model-based Testing through Test Case Diversity”, ACM
TOSEM, 2013

Selected References (ML+SBST)
• Briand et al., “Using machine learning to refine category-partition test
specifications and test suites”, Information and Software Technology
(Elsevier), 2009
• Appelt et al., “A Machine Learning-Driven Evolutionary Approach for
Testing Web Application Firewalls”, IEEE Transactions on Reliability, 2018
• Ben Abdessalem et al., “Testing Vision-Based Control Systems Using
Learnable Evolutionary Algorithms”, ICSE 2018
• Ben Abdessalem et al., “Testing Autonomous Cars for Feature Interaction
Failures using Many-Objective Search”, ASE 2018

Selected References (NLP)
• Wang et al., “Automatic generation of system test cases from use case
specifications”, ISSTA 2015
• Wang et al., “System Testing of Timing Requirements Based on Use Cases
and Timed Automata”, ICST 2017
• Wang et al., “Automated Generation of Constraints from Use Case
Specifications to Support System Testing”, ICST 2018
• Mai et al., “A Natural Language Programming Approach for
Requirements-based Security Testing”, ISSRE 2018
