Automatic Test Suite Generation for Key-Points
Detection DNNs using Many-Objective Search
Fitash Ul Haq, Donghwan Shin, Lionel Briand, Thomas Stifter, Jun Wang
Date: 14/07/2021
2
Introduction
Automatic Test Suite Generation for Key-Points Detection DNNs using Many-Objective Search (Experience Paper)
• Automatically detecting key-points in an image or a video is a fundamental step
for many applications, such as face recognition and drowsiness detection
• With the recent advances in Deep Neural Networks (DNNs), Key-point Detection DNNs (KP-DNNs) are
widely used to detect key-points in an image
[Figure: car dash camera video is fed to the DNN, which outputs "Driver is awake"]
6
Motivation and Goal
• IEE developed a drowsiness detection system based on a Facial KP-DNN
• In the facial key-points detection problem, each Key-Point (KP) is important,
as even one incorrectly predicted KP can have a major impact on system
reliability and safety
• Hence, we should test KPs individually to properly test the DNN
• One test requirement for each KP: “the DNN should correctly predict the KP”
• For our subject DNN (IEE-DNN), we have 27 test requirements as we have 27 KPs
• Our goal is to find a test suite that causes IEE-DNN to severely mis-predict as
many key-points as possible
[Figure: example input and reference image showing the 27 KPs]
7
Challenges and Idea
• Challenges
• The input space is too large to be exhaustively explored
• The number of KPs is typically large (e.g., our evaluation uses the IEE-DNN that detects 27 KPs)
• One should not simply consider average prediction errors across all KPs
• It may be infeasible to find a test image causing severe prediction errors for some KPs
• In such cases, it is essential to dynamically and efficiently distribute the computational resources dedicated to testing to the
other KPs
• To address them, we apply many-objective search for test suite generation for the IEE-DNN
• State-of-the-art algorithms (i.e., MOSA* and FITEST**) aim to efficiently achieve each objective individually
• We set the misprediction of each KP as one objective
* Panichella, Annibale, Fitsum Meshesha Kifetew, and Paolo Tonella. "Reformulating branch coverage as a many-objective optimization problem." 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST). IEEE, 2015.
** Abdessalem, Raja Ben, et al. "Testing autonomous cars for feature interaction failures using many-objective search." 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2018.
8
Overview: Automatic Test Suite Generation using Many-Objective Search
[Figure: workflow diagram with the components Search Engine, Simulator, and DNN; labels: Input (vector), Test Image, Actual Key-points Positions, Predicted Key-points Positions, Fitness Score (Error Value), Most Critical Test Inputs]
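One fitness evaluation in the workflow above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `simulate` and `predict` are hypothetical stand-ins for IEE-SIM and IEE-DNN, and the normalization (Euclidean pixel distance divided by the 256-pixel image width) is an assumption, since the slides do not state how errors are normalized.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def per_keypoint_errors(
    actual: Sequence[Point], predicted: Sequence[Point], norm: float
) -> List[float]:
    """Return one normalized error per key-point (one value per objective)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Assumption: error = Euclidean distance / normalization factor.
    return [math.dist(a, p) / norm for a, p in zip(actual, predicted)]


def evaluate(
    input_vector: Tuple[float, ...],
    simulate: Callable[[Tuple[float, ...]], Tuple[object, Sequence[Point]]],
    predict: Callable[[object], Sequence[Point]],
    norm: float = 256.0,  # assumed: width of the 256 x 256 input image
) -> List[float]:
    """Input vector -> simulator -> DNN -> per-key-point fitness scores."""
    image, actual = simulate(input_vector)  # test image plus ground truth
    predicted = predict(image)              # positions predicted by the DNN
    return per_keypoint_errors(actual, predicted, norm)


# Hypothetical stubs so the sketch runs without a simulator or a DNN.
def fake_simulate(vec):
    return "image", [(100.0, 100.0), (150.0, 120.0)]


def fake_predict(image):
    return [(103.0, 104.0), (150.0, 120.0)]


errors = evaluate((9, 0.0, 0.0, 0.0), fake_simulate, fake_predict)
print(errors)
```

Returning a vector of errors rather than a single average keeps each key-point visible to the search engine as a separate objective.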
15
Research Questions
• RQ1: How do alternative many-objective search algorithms fare in terms of test effectiveness?
• Check whether using many-objective search is indeed a suitable solution for the problem
• RQ2: Can we further distinguish search algorithms using the degree of mispredictions caused by
the test suites they generate?
• Compare how severely key-points are mispredicted by test suites generated across different search algorithms
• RQ3: Can we explain individual key-point mispredictions in terms of image characteristics?
• Investigate whether it is possible to provide accurate and interpretable explanations of mispredictions based on
image characteristics
16
Subject DNN and Simulator
• IEE-DNNv1.0
• Architecture: Stacked hourglass*
• Training set: 18,120 synthetic images generated by Blender
using make-human and 4D faces models
• Test set: 2,738 synthetic images
• Input: a 256 x 256 pixel image
• Output: locations of 27 key-points
• NME (Normalized Mean Error): 0.018
• IEE-SIMv1.0
• Input: Model ID, roll, pitch and yaw (Range: -30 to +30; defined
by IEE)
• Output: Image and ground truth for locations of key-points
• Number of models available: 10
* Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European conference on computer vision. Springer, Cham, 2016.
[Figure: sample images from 3D models]
17
RQ1: Effectiveness of Test Suites
• Objective
• Find the best search algorithm for generating test suites with maximum Effectiveness Score (ES)
• Search algorithms
• Random Search (RS), MOSA, FITEST
• MOSA+ and FITEST+: identical to MOSA and FITEST, but use a different crossover strategy (i.e.,
using dynamic distribution index) to better guide new test data towards uncovered objectives
• Experiment Parameters
• Search budget: 2 hours
• Repetition: 20 times
$$ES = \frac{\text{Number of Incorrectly Predicted Key-points}}{\text{Total Number of Key-points}}$$
• Statistical analysis
• Significance: Mann–Whitney U test
• Effect Size: Vargha and Delaney’s effect size
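The Effectiveness Score and the effect size used on this slide can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the input to `effectiveness_score` is assumed to be, for each key-point, the largest normalized error caused by any test in the suite, and the 0.05 threshold is the IEE value quoted on the backup slide. The function names are hypothetical.

```python
from typing import Sequence

THRESHOLD = 0.05  # IEE threshold for an incorrectly predicted key-point


def effectiveness_score(
    max_errors: Sequence[float], threshold: float = THRESHOLD
) -> float:
    """ES = incorrectly predicted key-points / total key-points."""
    if not max_errors:
        raise ValueError("at least one key-point is required")
    incorrect = sum(1 for e in max_errors if e >= threshold)
    return incorrect / len(max_errors)


def a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Vargha and Delaney's A statistic: P(x > y) + 0.5 * P(x == y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    equal = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))


# Made-up numbers for illustration only (not results from the study).
print(effectiveness_score([0.10, 0.02, 0.05, 0.01]))
print(a12([0.9, 0.95], [0.5, 0.6]))
```

An effect size of 1 means every run of the first algorithm scored higher than every run of the second; 0.5 means neither tends to score higher.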
18
RQ1: Results
• MOSA and FITEST families outperform RS
• Overall, MOSA+ is the best in terms of maximizing the number of severely mispredicted KPs
Statistical analysis (A compared with B)
A        B        p-value  Effect size
MOSA     RS       0        1
MOSA+    RS       0        1
FITEST   RS       0        1
FITEST+  RS       0        1
MOSA     FITEST   0        0.837
MOSA+    FITEST   0        0.945
FITEST+  FITEST   0.7502   0.53
MOSA     FITEST+  0.0045   0.7575
MOSA+    FITEST+  0        0.86
MOSA+    MOSA     0.1091   0.6375
[Figure: average ES over 20 runs of the different search algorithms]
19
RQ1: Implications
• Our approach is effective in generating test suites that cause IEE-DNN to severely mispredict
more than 93% of all key-points on average
• MOSA and MOSA+ are significantly better than FITEST and FITEST+ in terms of ES
• There is no significant difference between MOSA and MOSA+, or between FITEST and FITEST+; this
shows that dynamically controlling the similarity between parents and children in crossover does
not significantly improve effectiveness
• Because RQ1 only considers the number of severely mispredicted key-points, differences in effectiveness
across search algorithms may not appear clearly and completely
• For example, two test suites generated by different algorithms may cause the same number of severely
mispredicted key-points while differing in how severely those key-points are mispredicted
20
RQ2: Misprediction Severity for Individual Key-points
• Objective
• Find the best search algorithm for generating test suites with maximum Misprediction Severity (maximum error) for
each key-point
• Search algorithms (Same as RQ1)
• Random Search (RS), MOSA, FITEST, MOSA+, FITEST+
• Experiment Parameters (Same as RQ1)
• Search budget: 2 hours
• Repetition: 20 times
• Statistical analysis
• Significance: Wilcoxon signed-rank test
• Effect Size: Vargha and Delaney’s effect size
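Misprediction Severity per key-point can be illustrated with a short sketch. It assumes, following the "(maximum error)" wording above, that the MS of a key-point is the largest normalized error any test image in the suite causes for it; the function name and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def misprediction_severity(suite_errors: Sequence[Sequence[float]]) -> List[float]:
    """Return, per key-point, the maximum error over all tests in the suite.

    suite_errors[i][k] is the normalized error of key-point k on test image i.
    """
    if not suite_errors:
        raise ValueError("the test suite must contain at least one test")
    n_keypoints = len(suite_errors[0])
    if any(len(row) != n_keypoints for row in suite_errors):
        raise ValueError("every test must report the same number of key-points")
    # zip(*rows) transposes the matrix so each column is one key-point.
    return [max(column) for column in zip(*suite_errors)]


# Two test images, three key-points (made-up values for illustration).
suite = [
    [0.01, 0.30, 0.02],
    [0.07, 0.10, 0.03],
]
print(misprediction_severity(suite))
```

Two suites can flag the same key-points as incorrectly predicted and still differ in these per-key-point maxima, which is the distinction RQ2 looks at.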
21
RQ2: Results
• MOSA and FITEST families outperform RS
• We found that there are specific KPs that are more severely mis-predicted than others
[Figure: MS for individual key-points for the search algorithms]
Statistical analysis (A compared with B)
A        B        p-value  Effect size
MOSA     RS       0        0.897
MOSA+    RS       0        0.902
FITEST   RS       0        0.876
FITEST+  RS       0        0.873
MOSA     FITEST   0.57     0.541
MOSA+    FITEST   0.594    0.543
FITEST+  FITEST   0.052    0.507
MOSA     FITEST+  0.177    0.545
MOSA+    FITEST+  0.009    0.556
MOSA+    MOSA     0.78     0.5
22
RQ2: Implications
• Some KPs are more severely mispredicted than others,
mainly because of:
• Under-representation of some KPs in the training data (e.g., KP7 is
only present in 79% of training data)
• Large variation in the shape and size of the mouth across different
3D models: KP24, KP25, KP26, and KP27 are located on the mouth,
which shows the largest variation among face features
• There is no statistically significant difference in MS
between MOSA and MOSA+, or between FITEST and
FITEST+
• This implies that, consistent with RQ1, dynamically adjusting the
distribution index in crossover does not increase misprediction
severity for individual key-points
[Figure: sample images showing different variations of the mouth]
24
RQ3: Explaining Mispredictions
• Objective
• Investigate whether it is possible to provide accurate and
interpretable explanations of mispredictions based on image
characteristics used by the simulator to generate test images
• Approach
• Build a regression tree for each KP using test results
• Dataset: test images generated during the execution of our approach
• Input variables: roll, pitch, yaw, and 3D model ID
• Target variable: normalized prediction error (NE) of the IEE-DNN
• Evaluate the (predictive) error of generated regression trees
using 10-fold CV
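[Figure: example regression tree. One node splits on Model-ID (= 9 / ≠ 9) and another on Pitch (< 18.41 / ≥ 18.41), with a leaf NE = 0.04; the remaining branches are elided (…)]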
25
RQ3: Results
Representative rules derived from the decision tree for KP26
(M: Model-ID, P: Pitch, R: Roll, Y: Yaw)
Image characteristics condition                      NE
M = 9 ∧ P < 18.41                                    0.04
M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ Y < 17.06           0.26
M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ 17.06 ≤ Y < 19      0.71
M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ Y ≥ 19              0.36
[Figure: (A) a test image satisfying the first condition, NE = 0.013; (B) a test image satisfying the third condition, NE = 0.89]
• Using these conditions, we analyzed the root causes of high NE values and found that
a shadow over the location of KP26 causes the high NE
• The average MAE across all trees is 0.01 (far below the IEE threshold of 0.05), with an average tree size of 25.7
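The four representative rules for KP26 can be written as a small lookup function, which is one way an engineer might check a planned test configuration against them. This is a sketch under assumptions: it reads the condition symbols in the order of the legend (M, P, R, Y), returns `None` for any branch the slide does not show, and uses a hypothetical function name.

```python
from typing import Optional


def predicted_ne_kp26(
    model_id: int, pitch: float, roll: float, yaw: float
) -> Optional[float]:
    """Predicted normalized error (NE) for KP26 from the rules on the slide.

    Returns None when no listed rule covers the input.
    """
    if model_id != 9:
        return None  # rules for other 3D models are not shown
    if pitch < 18.41:
        return 0.04
    if roll >= -22.31:
        return None  # this branch is not shown on the slide
    if yaw < 17.06:
        return 0.26
    if yaw < 19:
        return 0.71
    return 0.36


# A configuration in the region of the third rule (highest predicted NE).
print(predicted_ne_kp26(model_id=9, pitch=20.0, roll=-25.0, yaw=18.0))
```

Inputs falling under the third rule are the natural candidates for generating additional images for retraining, since that region has the highest predicted error among the rules shown.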
26
RQ3: Implications
Knowing under what conditions severe mispredictions are occurring can help engineers in two
ways:
• Helps to assess the risks associated with individual key-points for specific conditions, in the
context of a specific application
• Enables the generation of specific test images, using the simulator, that are expected to cause
particularly severe mispredictions and can be used for retraining the DNN
27
Lessons Learned
• Automated test suite generation is indeed useful in practice
• Testing results helped IEE assess and improve the IEE-DNN
• They continuously enriched the dataset by adding more training images from diverse 3D face models
• They improved the IEE-DNN’s architecture by doubling the number of hidden layers to drastically increase its accuracy
• The results also helped IEE improve the simulator
• The detailed analysis of the testing results showed the labeled KP positions were not accurate; this was later fixed
• Understanding mispredictions is critical
• Such findings led IEE to better target their development resources to improve the driver’s gaze detection system rather
than just focusing on the IEE-DNN itself
• Simulation-based testing brings key benefits
• We can effectively generate as many different test images as needed, with ground truth
28
Conclusion
• We formalize the problem definition of KP-DNN testing and present an approach to automatically
generate test data for KP-DNNs with many independent outputs
• We empirically compare state-of-the-art, many-objective search algorithms and their variants
tailored for test suite generation
• We further investigate and demonstrate a way, based on regression trees, to learn the conditions,
in terms of image characteristics, that cause severe mispredictions for individual key-points
Backup
31
Pseudo-code: Many-Objective Search Algorithm
[Figure: flowchart. Initialization; Calculating Objectives; Updating Archive and Objectives; while Remaining Budget: Offspring Generation, Calculating Objectives, Updating Archive and Objectives, Generating Next Generation; output: Archive]
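The loop in this flowchart can be sketched as follows. This is a simplified illustration, not MOSA or FITEST: it replaces their selection and crossover operators with Gaussian mutation and a plain ranking on the uncovered objectives, and it counts fitness evaluations instead of wall-clock time as the budget. All names and the toy fitness function are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Scored = Tuple[Vector, Sequence[float]]


def many_objective_search(
    fitness: Callable[[Vector], Sequence[float]],
    n_objectives: int,
    bounds: Sequence[Tuple[float, float]],
    threshold: float,
    budget: int,
    pop_size: int = 10,
    seed: int = 0,
) -> Dict[int, Tuple[Vector, float]]:
    """Return an archive mapping each covered objective to (input, score)."""
    rng = random.Random(seed)
    archive: Dict[int, Tuple[Vector, float]] = {}
    uncovered: Set[int] = set(range(n_objectives))

    def random_vector() -> Vector:
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    def mutate(vec: Vector) -> Vector:
        # Gaussian perturbation, clamped to the allowed input range.
        return tuple(
            min(hi, max(lo, v + rng.gauss(0.0, 0.1 * (hi - lo))))
            for v, (lo, hi) in zip(vec, bounds)
        )

    def evaluate(candidates: List[Vector]) -> List[Scored]:
        # Calculate objectives, then update the archive and uncovered set.
        scored = [(vec, fitness(vec)) for vec in candidates]
        for vec, scores in scored:
            for k in sorted(uncovered):
                if scores[k] >= threshold:
                    archive[k] = (vec, scores[k])
                    uncovered.discard(k)
        return scored

    population = [random_vector() for _ in range(pop_size)]  # initialization
    scored = evaluate(population)
    evaluations = len(population)
    while uncovered and evaluations < budget:  # remaining budget?
        offspring = [mutate(rng.choice(population)) for _ in range(pop_size)]
        scored = scored + evaluate(offspring)
        evaluations += len(offspring)
        if not uncovered:
            break
        # Next generation: keep candidates closest to any uncovered objective.
        scored.sort(key=lambda item: max(item[1][k] for k in uncovered), reverse=True)
        scored = scored[:pop_size]
        population = [vec for vec, _ in scored]
    return archive


def toy_fitness(vec: Vector) -> Tuple[float, float]:
    # Toy stand-in for per-key-point errors over a single angle in [-30, 30].
    x = vec[0]
    return (max(0.0, x) / 30.0, max(0.0, -x) / 30.0)


archive = many_objective_search(toy_fitness, 2, [(-30.0, 30.0)], threshold=0.8, budget=500)
for k, (vec, score) in sorted(archive.items()):
    print(f"objective {k}: input={vec}, error={score:.2f}")
```

Dropping covered objectives from the ranking is what lets the remaining budget go to the key-points that are still unmet, including the case where some objective is never reached.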
32
Applying Meta-heuristic Search Algorithm
Automatic test data generation using meta-heuristic search algorithms is widely studied in software testing
• Transform test data generation into an optimization problem and apply
meta-heuristics to solve it cost-effectively
• Initial solution xi: vector representation of facial image features (e.g., head posture)
• Fitness function f: prediction error for the IEE-DNN
• Search algorithm output: best solution x such that f(x) is the maximum, i.e., the best facial image (features) that maximizes f
33
Incorrectly Predicted Key-Points
Originally, IEE defines a test input (image) to be unsafe if the NME (Normalized Mean Error) is greater than or equal to 0.05
• NME is the average error of all key-points
We define a key-point as “incorrectly predicted” if its normalized error is greater than or equal to 0.05
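A short worked example shows why the per-key-point definition is stricter than the image-level NME check. The error values below are made up for illustration, and the helper names are hypothetical; only the 0.05 threshold and the count of 27 key-points come from the slides.

```python
from typing import List, Sequence

THRESHOLD = 0.05  # IEE threshold quoted above


def nme(errors: Sequence[float]) -> float:
    """Normalized Mean Error: the average error over all key-points."""
    if not errors:
        raise ValueError("at least one key-point error is required")
    return sum(errors) / len(errors)


def incorrectly_predicted(
    errors: Sequence[float], threshold: float = THRESHOLD
) -> List[int]:
    """Return 1-based indices of key-points whose error reaches the threshold."""
    return [i for i, e in enumerate(errors, start=1) if e >= threshold]


# 26 key-points predicted well, one (KP27) predicted very badly.
errors = [0.01] * 26 + [0.50]
print(f"NME = {nme(errors):.4f}, unsafe by NME: {nme(errors) >= THRESHOLD}")
print(f"incorrectly predicted key-points: {incorrectly_predicted(errors)}")
```

Here the average stays below 0.05, so the image would not be flagged by the NME criterion, while the per-key-point check still reports KP27.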
 
Scalawox deeplearning
Scalawox deeplearningScalawox deeplearning
Scalawox deeplearning
scalawox
 
Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous C...
Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous C...Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous C...
Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous C...
Lionel Briand
 
Deploying ML models to production (frequently and safely) - PYCON 2018
Deploying ML models to production (frequently and safely) - PYCON 2018Deploying ML models to production (frequently and safely) - PYCON 2018
Deploying ML models to production (frequently and safely) - PYCON 2018
David Tan
 
“Image Signal Processing Optimization for Object Detection,” a Presentation f...
“Image Signal Processing Optimization for Object Detection,” a Presentation f...“Image Signal Processing Optimization for Object Detection,” a Presentation f...
“Image Signal Processing Optimization for Object Detection,” a Presentation f...
Edge AI and Vision Alliance
 
Rokach-GomaxSlides (1).pptx
Rokach-GomaxSlides (1).pptxRokach-GomaxSlides (1).pptx
Rokach-GomaxSlides (1).pptx
Jadna Almeida
 
Rokach-GomaxSlides.pptx
Rokach-GomaxSlides.pptxRokach-GomaxSlides.pptx
Rokach-GomaxSlides.pptx
Jadna Almeida
 
Steps in Simulation Study
Steps in Simulation StudySteps in Simulation Study
Steps in Simulation Study
Nalin Adhikari
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...
yaevents
 
[DSC Europe 22] Starting deep learning projects without sufficient amount of ...
[DSC Europe 22] Starting deep learning projects without sufficient amount of ...[DSC Europe 22] Starting deep learning projects without sufficient amount of ...
[DSC Europe 22] Starting deep learning projects without sufficient amount of ...
DataScienceConferenc1
 
Improving neural question generation using answer separation
Improving neural question generation using answer separationImproving neural question generation using answer separation
Improving neural question generation using answer separation
NAVER Engineering
 
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia PavlovaAI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
Alexey Grigorev
 
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 MonthsHow EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
Applitools
 
Ad

More from Lionel Briand (20)

FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
Lionel Briand
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
Lionel Briand
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
Lionel Briand
 
Metamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityMetamorphic Testing for Web System Security
Metamorphic Testing for Web System Security
Lionel Briand
 
Fuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingFuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation Testing
Lionel Briand
 
Data-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsData-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical Systems
Lionel Briand
 
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsMany-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Lionel Briand
 
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
Lionel Briand
 
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Lionel Briand
 
PRINS: Scalable Model Inference for Component-based System Logs
PRINS: Scalable Model Inference for Component-based System LogsPRINS: Scalable Model Inference for Component-based System Logs
PRINS: Scalable Model Inference for Component-based System Logs
Lionel Briand
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software Testing
Lionel Briand
 
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyAutonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Lionel Briand
 
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Lionel Briand
 
Reinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationReinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case Prioritization
Lionel Briand
 
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Lionel Briand
 
On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...
Lionel Briand
 
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Lionel Briand
 
A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...
Lionel Briand
 
Requirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and ApplicationsRequirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and Applications
Lionel Briand
 
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categorie...
Lionel Briand
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
Lionel Briand
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
Lionel Briand
 
Metamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityMetamorphic Testing for Web System Security
Metamorphic Testing for Web System Security
Lionel Briand
 
Fuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingFuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation Testing
Lionel Briand
 
Data-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsData-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical Systems
Lionel Briand
 
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsMany-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Lionel Briand
 
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
Lionel Briand
 
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Lionel Briand
 
PRINS: Scalable Model Inference for Component-based System Logs
PRINS: Scalable Model Inference for Component-based System LogsPRINS: Scalable Model Inference for Component-based System Logs
PRINS: Scalable Model Inference for Component-based System Logs
Lionel Briand
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software Testing
Lionel Briand
 
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyAutonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Lionel Briand
 
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Lionel Briand
 
Reinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationReinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case Prioritization
Lionel Briand
 
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Lionel Briand
 
On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...
Lionel Briand
 
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Lionel Briand
 
A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...
Lionel Briand
 
Requirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and ApplicationsRequirements in Cyber-Physical Systems: Specifications and Applications
Requirements in Cyber-Physical Systems: Specifications and Applications
Lionel Briand
 
Ad

Recently uploaded (20)

Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Microsoft Excel Core Points Training.pptx
Microsoft Excel Core Points Training.pptxMicrosoft Excel Core Points Training.pptx
Microsoft Excel Core Points Training.pptx
Mekonnen
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
DVDFab Crack FREE Download Latest Version 2025
DVDFab Crack FREE Download Latest Version 2025DVDFab Crack FREE Download Latest Version 2025
DVDFab Crack FREE Download Latest Version 2025
younisnoman75
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Odoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education ProcessOdoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education Process
iVenture Team LLP
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Implementing promises with typescripts, step by step
Implementing promises with typescripts, step by stepImplementing promises with typescripts, step by step
Implementing promises with typescripts, step by step
Ran Wahle
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]
PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]
PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]
saimabibi60507
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Microsoft Excel Core Points Training.pptx
Microsoft Excel Core Points Training.pptxMicrosoft Excel Core Points Training.pptx
Microsoft Excel Core Points Training.pptx
Mekonnen
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
DVDFab Crack FREE Download Latest Version 2025
DVDFab Crack FREE Download Latest Version 2025DVDFab Crack FREE Download Latest Version 2025
DVDFab Crack FREE Download Latest Version 2025
younisnoman75
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Odoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education ProcessOdoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education Process
iVenture Team LLP
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Implementing promises with typescripts, step by step
Implementing promises with typescripts, step by stepImplementing promises with typescripts, step by step
Implementing promises with typescripts, step by step
Ran Wahle
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]
PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]
PRTG Network Monitor Crack Latest Version & Serial Key 2025 [100% Working]
saimabibi60507
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 

Automatic Test Suite Generation for Key-Points Detection DNNs using Many-Objective Search

  • 1. Automatic Test Suite Generation for Key-Points Detection DNNs using Many-Objective Search Fitash Ul Haq, Donghwan Shin, Lionel Briand, Thomas Stifter, Jun Wang Date: 14/07/2021
  • 4. Introduction (Automatic Test Suite Generation for Key-Points Detection DNNs using Many-Objective Search, Experience Paper)
      • Automatically detecting key-points in an image or a video is a fundamental step for many applications, such as face recognition and drowsiness detection
      • With the recent advances in Deep Neural Networks (DNNs), Key-point Detection DNNs (KP-DNNs) are widely used to detect key-points in an image
      • Figure elements, in the order they are introduced: Car dash camera; Video; DNN; "Driver is awake"
  • 5. Motivation and Goal
      • IEE developed a drowsiness detection system based on a Facial KP-DNN
      • In the facial key-points detection problem, each Key-Point (KP) is important, as even one incorrectly predicted KP can have a major impact on system reliability and safety
      • Hence, we should test KPs individually to properly test the DNN
          • One test requirement for each KP: “the DNN should correctly predict the KP”
          • For our subject DNN (IEE-DNN), we have 27 test requirements as we have 27 KPs
      • Our goal is to find a test suite that causes IEE-DNN to severely mis-predict as many key-points as possible
      • Figures: Example Input; Reference Image showing 27 KPs
  • 6. Challenges and Idea
      • Challenges
          • The input space is too large to be exhaustively explored
          • The number of KPs is typically large (e.g., our evaluation uses the IEE-DNN that detects 27 KPs)
          • One should not simply consider average prediction errors across all KPs
          • It may be infeasible to find a test image causing severe prediction errors for some KPs
              • In such cases, it is essential to dynamically and efficiently distribute the computational resources dedicated to testing to the other KPs
      • To address them, we apply many-objective search for test suite generation for the IEE-DNN
          • State-of-the-art algorithms (i.e., MOSA* and FITEST**) aim to efficiently achieve each objective individually
          • We set the misprediction of each KP as one objective
      * Panichella, Annibale, Fitsum Meshesha Kifetew, and Paolo Tonella. "Reformulating branch coverage as a many-objective optimization problem." 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST). IEEE, 2015.
      ** Abdessalem, Raja Ben, et al. "Testing autonomous cars for feature interaction failures using many-objective search." 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2018.
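To illustrate why the slide warns against averaging errors across KPs, here is a minimal Python sketch that computes one error value per key-point. The slides do not define the error metric at this point, so the Euclidean pixel distance used here is an assumption, and the function name `per_keypoint_errors` and the sample coordinates are hypothetical.

```python
import math


def per_keypoint_errors(actual, predicted):
    """Return one error per key-point.

    Assumption: the error is the Euclidean distance between the actual and
    predicted (x, y) positions. The slides do not specify the metric here.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [math.dist(a, p) for a, p in zip(actual, predicted)]


# Hypothetical positions for three key-points (the IEE-DNN has 27).
actual = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
predicted = [(10.0, 10.0), (23.0, 24.0), (30.0, 31.0)]

errors = per_keypoint_errors(actual, predicted)
mean_error = sum(errors) / len(errors)

# Each entry of `errors` would be its own search objective; the mean
# hides the fact that the second key-point is badly mispredicted.
print(errors, mean_error)
```

In this toy case the mean error is 2.0 pixels while one key-point is off by 5.0 pixels, which is the kind of situation a per-KP objective is meant to expose.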
  • 13. Overview: Automatic Test Suite Generation using Many-Objective Search
      [Diagram components, in build order: Search Engine; Simulator; Input (vector); DNN; Actual Key-points Positions; Test Image; Predicted Key-points Positions; Fitness Score (Error Value); Most Critical Test Inputs]
  • 14. Research Questions
      • RQ1: How do alternative many-objective search algorithms fare in terms of test effectiveness?
          • Check whether many-objective search is indeed a suitable solution for the problem
      • RQ2: Can we further distinguish search algorithms using the degree of mispredictions caused by the test suites they generate?
          • Compare how severely key-points are mispredicted by the test suites generated by different search algorithms
      • RQ3: Can we explain individual key-point mispredictions in terms of image characteristics?
          • Investigate whether it is possible to provide accurate and interpretable explanations of mispredictions based on image characteristics
  • 15. Subject DNN and Simulator
      • IEE-DNNv1.0
          • Architecture: Stacked hourglass*
          • Training set: 18,120 synthetic images generated by Blender using MakeHuman and 4D face models
          • Test set: 2,738 synthetic images
          • Input: a 256 × 256 pixel image
          • Output: locations of 27 key-points
          • NME: 0.018
      • IEE-SIMv1.0
          • Input: Model ID, roll, pitch, and yaw (range: -30 to +30; defined by IEE)
          • Output: image and ground truth for the locations of key-points
          • Number of models available: 10
      * Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European Conference on Computer Vision. Springer, Cham, 2016.
      [Figure: sample images from 3D models]
  • 16. RQ1: Effectiveness of Test Suites
      • Objective
          • Find the best search algorithm for generating test suites with maximum Effectiveness Score (ES)
          $ES = \frac{\text{Number of Incorrectly Predicted Keypoints}}{\text{Total Number of Keypoints}}$
      • Search algorithms
          • Random Search (RS), MOSA, FITEST
          • MOSA+ and FITEST+: identical to MOSA and FITEST, but use a different crossover strategy (i.e., using a dynamic distribution index) to better guide new test data towards uncovered objectives
      • Experiment parameters
          • Search budget: 2 hours
          • Repetition: 20 times
      • Statistical analysis
          • Significance: Mann–Whitney U test
          • Effect size: Vargha and Delaney's effect size
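The two quantities used in RQ1 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the 0.05 threshold is the one the deck gives for an incorrectly predicted key-point, the sample ES values are made up, and the effect size follows the standard definition of Vargha and Delaney's A12 (probability that a value from the first sample exceeds one from the second, ties counted as one half).

```python
def effectiveness_score(max_errors, threshold=0.05):
    """ES = (# key-points whose error reaches the threshold) / (# key-points).

    `max_errors` holds, per key-point, the largest normalized error that a
    test suite achieved for that key-point (hypothetical input format).
    """
    hit = sum(1 for e in max_errors if e >= threshold)
    return hit / len(max_errors)


def vargha_delaney_a12(a, b):
    """A12: chance that a value drawn from `a` is larger than one from `b`."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return wins / (len(a) * len(b))


# Made-up ES values from three runs of two hypothetical algorithms.
es_algo_a = [0.96, 0.93, 1.0]
es_algo_b = [0.81, 0.93, 0.78]
a12 = vargha_delaney_a12(es_algo_a, es_algo_b)
```

An A12 of 0.5 means neither algorithm tends to produce larger values; values close to 1 mean the first algorithm almost always does.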
  • 17. RQ1: Results
      • MOSA and FITEST families outperform RS
      • Overall, MOSA+ is the best in terms of maximizing the number of severely mispredicted KPs
      Statistical analysis:
      | A | B | p-value | Effect size |
      |---|---|---|---|
      | MOSA | RS | 0 | 1 |
      | MOSA+ | RS | 0 | 1 |
      | FITEST | RS | 0 | 1 |
      | FITEST+ | RS | 0 | 1 |
      | MOSA | FITEST | 0 | 0.837 |
      | MOSA+ | FITEST | 0 | 0.945 |
      | FITEST+ | FITEST | 0.7502 | 0.53 |
      | MOSA | FITEST+ | 0.0045 | 0.7575 |
      | MOSA+ | FITEST+ | 0 | 0.86 |
      | MOSA+ | MOSA | 0.1091 | 0.6375 |
      [Figure: average ES over 20 runs of the different search algorithms]
  • 18. RQ1: Implications
      • Our approach is effective in generating test suites that cause the IEE-DNN to severely mispredict more than 93% of all key-points on average
      • MOSA and MOSA+ are significantly better than FITEST and FITEST+ in terms of ES
      • There is no significant difference between MOSA and MOSA+, or between FITEST and FITEST+; this shows that dynamically controlling the similarity between parents and children in crossover does not significantly improve effectiveness
      • Because RQ1 only considers the number of severely mispredicted key-points, differences in effectiveness across search algorithms may not appear clearly and completely
          • For example, two test suites generated by different algorithms may cause the same number of severely mispredicted key-points
  • 19. RQ2: Misprediction Severity for Individual Key-points
      • Objective
          • Find the best search algorithm for generating test suites with maximum Misprediction Severity (MS, the maximum error) for each key-point
      • Search algorithms (same as RQ1)
          • Random Search (RS), MOSA, FITEST, MOSA+, FITEST+
      • Experiment parameters (same as RQ1)
          • Search budget: 2 hours
          • Repetition: 20 times
      • Statistical analysis
          • Significance: Wilcoxon signed-rank test
          • Effect size: Vargha and Delaney's effect size
  • 20. RQ2: Results
      • MOSA and FITEST families subsume RS
      • We found that there are specific KPs that are more severely mispredicted than others
      [Figure: MS for individual key-points for each search algorithm]
      Statistical analysis:
      | A | B | p-value | Effect size |
      |---|---|---|---|
      | MOSA | RS | 0 | 0.897 |
      | MOSA+ | RS | 0 | 0.902 |
      | FITEST | RS | 0 | 0.876 |
      | FITEST+ | RS | 0 | 0.873 |
      | MOSA | FITEST | 0.57 | 0.541 |
      | MOSA+ | FITEST | 0.594 | 0.543 |
      | FITEST+ | FITEST | 0.052 | 0.507 |
      | MOSA | FITEST+ | 0.177 | 0.545 |
      | MOSA+ | FITEST+ | 0.009 | 0.556 |
      | MOSA+ | MOSA | 0.78 | 0.5 |
  • 21. RQ2: Implications
      • Some KPs are more severely mispredicted than others, mainly because of:
          • Under-representation of some KPs in the training data (e.g., KP7 is only present in 79% of the training data)
          • Large variation in the shape and size of the mouth across different 3D models; KP24, KP25, KP26, and KP27 are located on the mouth, which shows the largest variation among facial features
      • There is no statistically significant difference in MS between MOSA and MOSA+, or between FITEST and FITEST+
          • This implies that, consistent with RQ1, dynamically adjusting the distribution index in crossover does not increase misprediction severity for individual key-points
      [Figure: sample images showing different variations of the mouth]
  • 23. RQ3: Explaining Mispredictions
      • Objective
          • Investigate whether it is possible to provide accurate and interpretable explanations of mispredictions based on the image characteristics used by the simulator to generate test images
      • Approach
          • Build a regression tree for each KP using test results
              • Dataset: test images generated during the execution of our approach
              • Input variables: roll, pitch, yaw, and 3D model ID
              • Target variable: normalized prediction error (NE) of the IEE-DNN
          • Evaluate the (predictive) error of the generated regression trees using 10-fold cross-validation (CV)
      [Figure: example regression tree. Root node: Model-ID (= 9 / ≠ 9); under Model-ID = 9: Pitch (< 18.41 → NE = 0.04; ≥ 18.41 → …)]
  • 24. RQ3: Results
      Representative rules derived from the decision tree for KP26 (M: Model-ID, P: Pitch, R: Roll, Y: Yaw):
      | Image characteristics condition | NE |
      |---|---|
      | M = 9 ∧ P < 18.41 | 0.04 |
      | M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ Y < 17.06 | 0.26 |
      | M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ 17.06 ≤ Y < 19 | 0.71 |
      | M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ Y ≥ 19 | 0.36 |
      [Figure: (A) a test image satisfying the first condition, NE = 0.013; (B) a test image satisfying the third condition, NE = 0.89]
      • Using the conditions, we performed a detailed analysis to find the root cause of the high NE values and found that a shadow at the location of KP26 is the cause
      • The average MAE over all the trees is 0.01 (far less than the IEE threshold of 0.05), with an average tree size of 25.7
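To show how such rules could be used to look up the expected error for a given simulator configuration, the four representative KP26 rules can be written as a small function. This is a sketch, not the authors' tooling: the function name is hypothetical, and it assumes the slide's legend order (M, P, R, Y), so the third and fourth variables in each rule are read as roll and yaw. Only the four listed rules are encoded; every other region of the tree is unknown here and returns `None`.

```python
from typing import Optional


def kp26_rule_ne(model_id: int, pitch: float, roll: float, yaw: float) -> Optional[float]:
    """Return the NE value of the matching representative rule, if any."""
    if model_id != 9:
        return None  # rules for other models are not listed on the slide
    if pitch < 18.41:
        return 0.04
    if roll < -22.31:
        if yaw < 17.06:
            return 0.26
        if yaw < 19:
            return 0.71
        return 0.36
    return None  # pitch >= 18.41 with roll >= -22.31 is not listed


# Configuration falling under the third rule (the high-error region).
high_error_case = kp26_rule_ne(model_id=9, pitch=20.0, roll=-25.0, yaw=18.0)
```

A lookup like this is one way the conditions could drive targeted test generation: sample simulator inputs inside a high-NE region and render images from them.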
  • 25. RQ3: Implications
      Knowing under what conditions severe mispredictions occur can help engineers in two ways:
      • It helps to assess the risks associated with individual key-points for specific conditions, in the context of a specific application
      • It enables the generation of specific test images, using the simulator, that are expected to cause particularly severe mispredictions and can be used for retraining the DNN
  • 26. Lessons Learned
      • Automated test suite generation is indeed useful in practice
          • Testing results helped IEE assess and improve the IEE-DNN
              • They continuously enriched the dataset by adding more training images from diverse 3D face models
              • They improved the IEE-DNN's architecture by doubling the number of hidden layers to drastically increase its accuracy
          • The results also helped IEE improve the simulator
              • The detailed analysis of the testing results showed that the labeled KP positions were not accurate; this was later fixed
      • Understanding mispredictions is critical
          • Such findings led IEE to better target their development resources to improve the driver's gaze detection system rather than just focusing on the IEE-DNN itself
      • Simulation-based testing brings key benefits
          • We can effectively generate as many different test images as needed, with ground truth
  • 27. Conclusion
      • We formalize the problem of KP-DNN testing and present an approach to automatically generate test data for KP-DNNs with many independent outputs
      • We empirically compare state-of-the-art many-objective search algorithms and their variants tailored for test suite generation
      • We further investigate and demonstrate a way, based on regression trees, to learn the conditions, in terms of image characteristics, that cause severe mispredictions for individual key-points
  • 30. Pseudo-code: Many-Objective Search Algorithm
      [Flowchart steps, in order: Initialization; Calculating Objectives; Updating Archive and Objectives; Remaining Budget (check); Offspring Generation; Calculating Objectives; Updating Archive and Objectives; Generating Next Generation; output: Archive]
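The flowchart steps can be sketched as a much-simplified loop. This is not MOSA or FITEST: there is no preference sorting, crossover, or distribution index, and the selection rule (keep the individuals with the highest error on any still-uncovered objective) is an assumption made for brevity. The budget is a generation count standing in for the 2-hour time budget, and `toy_evaluate` is a hypothetical replacement for "render the image with the simulator, run the DNN, compare with the ground truth".

```python
import random


def many_objective_search(evaluate, n_objectives, sample, mutate,
                          pop_size=10, generations=20, threshold=0.05, seed=0):
    rng = random.Random(seed)
    archive = {}                          # objective -> (best error, test input)
    uncovered = set(range(n_objectives))  # objectives not yet reaching threshold

    def update(candidates):
        """Calculate objectives, then update the archive and uncovered set."""
        scored = []
        for x in candidates:
            errs = evaluate(x)
            scored.append((x, errs))
            for k, e in enumerate(errs):
                if k not in archive or e > archive[k][0]:
                    archive[k] = (e, x)
                if e >= threshold:
                    uncovered.discard(k)
        return scored

    population = [sample(rng) for _ in range(pop_size)]   # initialization
    scored = update(population)
    for _ in range(generations):                          # remaining budget?
        if not uncovered:
            break
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        scored = scored + update(offspring)
        if uncovered:
            # Next generation: favor high error on still-uncovered objectives.
            scored.sort(key=lambda s: max(s[1][k] for k in uncovered), reverse=True)
        scored = scored[:pop_size]
        population = [x for x, _ in scored]
    return archive, uncovered


def sample(rng):
    return tuple(rng.uniform(-30, 30) for _ in range(3))  # roll, pitch, yaw


def mutate(x, rng):
    return tuple(min(30.0, max(-30.0, v + rng.gauss(0, 5))) for v in x)


def toy_evaluate(x):
    return [abs(v) / 30 * 0.1 for v in x]  # one made-up error per objective


archive, uncovered = many_objective_search(toy_evaluate, 3, sample, mutate)
```

The archive ends up holding, per objective, the input with the largest error seen so far, which corresponds to the "most critical test inputs" returned by the approach.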
  • 31. Applying Meta-heuristic Search Algorithms
      Automatic test data generation using meta-heuristic search algorithms is widely studied in software testing
      • Formulate test data generation as an optimization problem and apply meta-heuristics to solve it cost-effectively
      [Diagram: Initial solution x_i (vector representation of facial image features, e.g., head posture) and Fitness function f (prediction error for the IEE-DNN) → Search Algorithm → Best solution x such that f(x) is the maximum (best facial image features that maximize f)]
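As a minimal illustration of this formulation, the sketch below maximizes a fitness function over input vectors with plain random search, the simplest of the algorithms compared in the deck. Everything concrete here is an assumption for illustration: the vector layout (model ID, roll, pitch, yaw), model IDs numbered 1 to 10, and `toy_fitness`, which stands in for the DNN's prediction error on the rendered image.

```python
import random


def random_search(fitness, sample, budget, seed=0):
    """Evaluate `budget` random inputs and return the one with the highest fitness."""
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(budget):
        x = sample(rng)
        f = fitness(x)
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f


def sample_input(rng):
    # (model_id, roll, pitch, yaw); angle range -30 to +30 as on the simulator slide.
    return (rng.randint(1, 10),
            rng.uniform(-30, 30), rng.uniform(-30, 30), rng.uniform(-30, 30))


def toy_fitness(x):
    # Made-up error that grows with the absolute pitch angle.
    return abs(x[2]) / 30


best_x, best_f = random_search(toy_fitness, sample_input, budget=200)
```

Guided meta-heuristics replace the independent sampling step with operators such as crossover and mutation that reuse information from earlier evaluations.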
  • 32. Incorrectly Predicted Key-Points
      Originally, IEE defines a test input (image) to be unsafe if the NME (Normalized Mean Error) is greater than or equal to 0.05
      • NME is the average error over all key-points
      We define a key-point as "incorrectly predicted" if its normalized error is greater than or equal to 0.05
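The difference between the image-level and key-point-level criteria can be seen in a small sketch. The normalizer is an assumption: the deck does not say what the error is divided by, so the code takes it as a parameter and the example arbitrarily uses the 256-pixel image width. Coordinates and function names are likewise made up.

```python
import math

THRESHOLD = 0.05  # threshold quoted on the slide


def normalized_errors(actual, predicted, norm):
    """Per-key-point Euclidean distance divided by `norm` (assumed normalizer)."""
    return [math.dist(a, p) / norm for a, p in zip(actual, predicted)]


def nme(errors):
    """Normalized Mean Error: average of the per-key-point errors."""
    return sum(errors) / len(errors)


def incorrectly_predicted(errors, threshold=THRESHOLD):
    """Indices of key-points whose normalized error reaches the threshold."""
    return [i for i, e in enumerate(errors) if e >= threshold]


actual = [(100.0, 100.0), (150.0, 100.0), (125.0, 160.0)]
predicted = [(100.0, 100.0), (150.0, 103.0), (125.0, 180.0)]
errors = normalized_errors(actual, predicted, norm=256.0)
image_is_unsafe = nme(errors) >= THRESHOLD
bad_kps = incorrectly_predicted(errors)
```

In this made-up case the image passes the NME check while the third key-point is still incorrectly predicted, which is the situation the per-key-point definition is meant to catch.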