Machine Learning: Generative and Discriminative Models (butest)
The document discusses machine learning models, specifically generative and discriminative models. It gives examples of generative models, such as naive Bayes classifiers and hidden Markov models, and of discriminative models, such as logistic regression and conditional random fields. It contrasts the two families: generative models estimate class-conditional likelihoods and class priors and obtain the posterior via Bayes' rule, while discriminative models estimate the posterior probability directly. It also compares how hidden Markov models model sequential data generatively while conditional random fields model it discriminatively.
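As a toy illustration of that contrast (a minimal sketch, not taken from the slides; the likelihoods, priors, and weights below are made-up numbers), the generative route computes the posterior from class-conditional probabilities via Bayes' rule, while the discriminative route parameterizes the posterior directly, as logistic regression does:

```python
import math

def generative_posterior(px_given_y, prior, x, y):
    # Bayes' rule: P(y|x) = P(x|y) P(y) / sum_y' P(x|y') P(y')
    evidence = sum(px_given_y[c][x] * prior[c] for c in prior)
    return px_given_y[y][x] * prior[y] / evidence

def discriminative_posterior(w, b, x):
    # Logistic regression models P(y=1|x) directly; no P(x|y) is estimated
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Toy class-conditional likelihoods for a binary feature x (keyword present?)
px_given_y = {
    "spam": {0: 0.2, 1: 0.8},   # spam usually contains the keyword
    "ham":  {0: 0.9, 1: 0.1},   # ham rarely does
}
prior = {"spam": 0.5, "ham": 0.5}

p_spam = generative_posterior(px_given_y, prior, x=1, y="spam")  # 8/9
p_direct = discriminative_posterior(w=2.0, b=-1.0, x=1)
```

Both functions return a posterior probability, but only the generative one needed a model of how the data x is produced.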
Collaborative filtering is a technique used in recommender systems to predict a user's preferences based on other similar users' preferences. It involves collecting ratings data from users, calculating similarities between users or items, and making recommendations. Common approaches include user-user collaborative filtering, item-item collaborative filtering, and probabilistic matrix factorization. Recommender systems are evaluated both offline using metrics like MAE and RMSE, and through online user testing.
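The user-user variant described above can be sketched in a few lines of plain Python (the ratings matrix and item names below are invented for illustration): compute cosine similarity between users over co-rated items, then predict a missing rating as a similarity-weighted average of the neighbours' ratings.

```python
import math

ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def cosine_sim(u, v):
    # Similarity over the items both users have rated
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    # Similarity-weighted average of other users' ratings for the item
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            s = cosine_sim(ratings[user], r)
            num += s * r[item]
            den += abs(s)
    return num / den if den else None

pred = predict("alice", "m4")  # alice has not rated m4
```

Item-item filtering is the same computation with the matrix transposed: similarities between item columns instead of user rows.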
Development of a Greek Open-Domain Question Answering System (ISSEL)
Artificial Intelligence is one of the most important and fastest-growing areas of Computer Science and Informatics. Among its most fundamental concerns is Natural Language Processing: the analysis and understanding of natural human languages by computer systems, and the ability of humans and "intelligent" systems to interact using those languages. As the volume of information grows continuously and people need ever more of it, Question Answering is a particularly important field of research within Natural Language Processing. From the earliest days of computing, a core goal has been the ability to pose questions to computers and receive correct answers from them. One of the most important categories of Question Answering systems is Open-Domain Question Answering systems, which can answer general-knowledge questions by drawing on a core knowledge source such as Wikipedia. The development of Transformer models and of BERT has improved the performance of Question Answering systems. Although these models have led to a flourishing of Question Answering, as of other problems addressed by Natural Language Processing, most Question Answering systems, and Open-Domain ones in particular, operate in English, while systems in other languages are scarce. This work attempts to build an Open-Domain Question Answering system for Greek. To that end, in the absence of the necessary Greek training data, suitable datasets are machine-translated from English into Greek.
A series of models is then trained both for Question Answering and for Information Retrieval, which is a core component of any open-domain question answering system. The complete system, which draws on data from the Greek Wikipedia, is then deployed; it is accessed through a web application developed for the purpose. Finally, evaluation results are presented both for the system as a whole and for its individual components.
Support Vector Machine ppt presentation (AyanaRukasar)
The support vector machine (SVM) is a supervised machine learning algorithm used for both classification and regression problems, though primarily for classification. The goal of SVM is to find the best decision boundary, known as a hyperplane, separating the classes of data points. The extreme data points closest to the boundary, the support vectors, are what define the hyperplane. For problems that are not linearly separable, SVM maps the data into a higher-dimensional space where a linear separator may exist. It works well when there is a clear margin of separation between classes and is effective for high-dimensional data. An example use case in Python is presented.
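The margin-maximizing idea can be sketched without any library by minimizing the hinge loss with sub-gradient steps (a simplification of the deck's topic, not its actual example; the toy points, learning rate, and regularization constant are invented):

```python
# Toy linearly separable data: ((x1, x2), label), labels in {-1, +1}
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((-2.0, -1.0), -1), ((-3.0, -2.0), -1)]

def train_linear_svm(data, lam=0.01, lr=0.1, epochs=200):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # Point is inside the margin: hinge loss pushes w toward y*x
                w = [w[i] + lr * (y * x[i] - lam * w[i]) for i in range(2)]
                b += lr * y
            else:
                # Correctly classified with margin: only the regularizer acts
                w = [w[i] - lr * lam * w[i] for i in range(2)]
    return w, b

w, b = train_linear_svm(data)

def classify(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1
```

The points whose margin keeps dipping below 1 during training are exactly the support vectors; the far-away points never influence w except through the regularizer.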
SA is a global optimization technique.
It can move between different local optima rather than becoming trapped in the first one it finds.
It is a memoryless algorithm: it does not use information gathered earlier in the search.
SA is motivated by an analogy to annealing in solids.
It is an iterative improvement algorithm.
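The points above can be sketched directly (a minimal illustration with an invented 1-D objective, start point, and cooling schedule): worse moves are accepted with probability exp(-delta/T), and as the temperature cools the search degenerates into pure iterative improvement.

```python
import math
import random

def simulated_annealing(f, x0, t0=10.0, cooling=0.95, steps=500, seed=0):
    rng = random.Random(seed)
    x, t = x0, t0
    for _ in range(steps):
        candidate = x + rng.uniform(-1, 1)        # random neighbour
        delta = f(candidate) - f(x)
        # Always accept improvements; accept worse moves with prob exp(-delta/t),
        # which lets the search climb out of local optima while t is high.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        t *= cooling                               # cool the temperature
    return x

# Minimize (x - 3)^2; the global optimum is at x = 3
best = simulated_annealing(lambda x: (x - 3) ** 2, x0=-5.0)
```

Note that the loop keeps no history at all, only the current state and temperature, which is the "memoryless" property listed above.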
Advanced lectures taught to third-year students of the Faculty of Computers at Beni Suef to develop their research abilities; these topics are taught at the doctoral level. We want our computing students to excel and distinguish themselves in scientific research.
The document discusses hyperparameters and hyperparameter tuning in deep learning models. It defines hyperparameters as parameters that govern how the model parameters (weights and biases) are determined during training, in contrast to model parameters which are learned from the training data. Important hyperparameters include the learning rate, number of layers and units, and activation functions. The goal of training is for the model to perform optimally on unseen test data. Model selection, such as through cross-validation, is used to select the optimal hyperparameters. Training, validation, and test sets are also discussed, with the validation set used for model selection and the test set providing an unbiased evaluation of the fully trained model.
The document introduces data preprocessing techniques for data mining. It discusses why data preprocessing is important due to real-world data often being dirty, incomplete, noisy, inconsistent or duplicate. It then describes common data types and quality issues like missing values, noise, outliers and duplicates. The major tasks of data preprocessing are outlined as data cleaning, integration, transformation and reduction. Specific techniques for handling missing values, noise, outliers and duplicates are also summarized.
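Two of the cleaning and transformation tasks mentioned above, mean imputation of missing values and min-max normalization, are small enough to sketch (the column names and values below are invented, not from the document):

```python
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},   # missing age
    {"age": 31, "income": None},      # missing income
    {"age": 40, "income": 61000},
]

def impute_mean(rows, col):
    # Replace missing entries in `col` with the mean of the observed ones
    vals = [r[col] for r in rows if r[col] is not None]
    mean = sum(vals) / len(vals)
    for r in rows:
        if r[col] is None:
            r[col] = mean
    return mean

def min_max(rows, col):
    # Rescale `col` linearly onto [0, 1]
    lo = min(r[col] for r in rows)
    hi = max(r[col] for r in rows)
    for r in rows:
        r[col] = (r[col] - lo) / (hi - lo)

mean_age = impute_mean(rows, "age")   # (25 + 31 + 40) / 3 = 32.0
impute_mean(rows, "income")
min_max(rows, "age")
```

Noise smoothing and outlier handling follow the same shape: a pass that computes a statistic over the column, then a pass that rewrites offending cells.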
Design Test Case Technique (Equivalence partitioning And Boundary value analy... (Ryan Tran)
By the end of this course, you will:
Know an approach to designing test cases.
Understand how to apply equivalence partitioning and boundary value analysis to design test cases.
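Both techniques are mechanical enough to generate test inputs in code. The sketch below (an illustration of the named techniques, not the course's own material; the 18..65 age range is a hypothetical requirement) derives boundary values and one representative per equivalence class for a valid integer range:

```python
def boundary_values(lo, hi):
    # Boundary value analysis: each boundary plus its nearest neighbours
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

def equivalence_classes(lo, hi):
    # Equivalence partitioning: one representative per partition,
    # since any value in a partition should behave like the others
    return {
        "invalid_low": lo - 10,
        "valid": (lo + hi) // 2,
        "invalid_high": hi + 10,
    }

# Hypothetical requirement: an age field accepting 18..65 inclusive
cases = boundary_values(18, 65)
classes = equivalence_classes(18, 65)
```

Running a validator against `cases` exercises exactly the off-by-one region where range checks most often fail, while `classes` keeps the suite small.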
This document provides an overview of software testing techniques. It discusses verification and validation, testing levels including unit, integration, and system testing, and techniques such as white-box and black-box testing. Specifically, it describes verification as checking that the system conforms to its specifications, validation as evaluating whether the system meets user expectations, white-box testing as analyzing the internal code, and black-box testing as evaluating external behaviour without knowledge of internal workings.
This document discusses unsupervised learning and clustering. It defines unsupervised learning as modeling the underlying structure or distribution of input data without corresponding output variables. Clustering is described as organizing unlabeled data into groups of similar items called clusters. The document focuses on k-means clustering, describing it as a method that partitions data into k clusters by minimizing distances between points and cluster centers. It provides details on the k-means algorithm and gives examples of its steps. Strengths and weaknesses of k-means clustering are also summarized.
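The two alternating steps of k-means described above, assign each point to its nearest centre, then move each centre to the mean of its members, fit in a short sketch (the points and the first-k initialization are illustrative choices, not the document's example):

```python
import math

def kmeans(points, k, iters=20):
    # Initialise centres with the first k points (a common simple choice;
    # real implementations usually pick random or k-means++ seeds)
    centres = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assignment step: nearest centre by Euclidean distance
            i = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            # Update step: centre moves to the mean of its members
            if cl:
                centres[i] = [sum(d) / len(cl) for d in zip(*cl)]
    return centres, clusters

points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9)]
centres, clusters = kmeans(points, k=2)
```

The weaknesses the document lists follow directly from this loop: the result depends on the initial centres, and k must be fixed in advance.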
Genetic algorithm for hyperparameter tuning (Dr. Jyoti Obia)
This document discusses using genetic algorithms to tune hyperparameters in predictive models. It begins by providing an overview of genetic algorithms, describing them as a heuristic approach that mimics natural selection to generate multiple solutions. It then defines key terms related to genetic algorithms and chromosomes. The document outlines the genetic algorithm methodology and provides pseudocode. It applies this approach to tune hyperparameters C and gamma in an SVM model and finds it achieves higher accuracy than grid search in less computation time. In an appendix, it references related work and describes a spam email dataset used to classify emails as spam or not spam.
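The selection, crossover, and mutation loop can be sketched for the C/gamma tuning problem the document describes. Note the fitness function below is a made-up stand-in peaking at C=1, gamma=0.1, not a real cross-validated SVM score, and the population size, generation count, and mutation scale are arbitrary choices:

```python
import random

rng = random.Random(42)

def fitness(c, gamma):
    # Stand-in for cross-validated SVM accuracy (assumed optimum at C=1, gamma=0.1)
    return 1.0 - (c - 1.0) ** 2 - (gamma - 0.1) ** 2

def evolve(pop_size=20, generations=30):
    # Each chromosome is a (C, gamma) pair
    pop = [(rng.uniform(0, 2), rng.uniform(0, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            (c1, g1), (c2, g2) = rng.sample(parents, 2)
            c, g = (c1 + c2) / 2, (g1 + g2) / 2   # arithmetic crossover
            c += rng.gauss(0, 0.05)               # mutation
            g += rng.gauss(0, 0.05)
            children.append((c, g))
        pop = parents + children
    return max(pop, key=lambda ind: fitness(*ind))

best_c, best_gamma = evolve()
```

Unlike grid search, the population concentrates evaluations near promising regions instead of spending them uniformly, which is the source of the speedup the document reports.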
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. They are used in applications such as email filtering, network intrusion detection, and computer vision, where it is infeasible to develop an algorithm of explicit instructions for the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers, and the study of mathematical optimization supplies methods, theory, and application domains to the field. Data mining is a field of study within machine learning that focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is the study of computer systems that learn from data and experience. It is applied in an incredibly wide variety of areas, from medicine to advertising and from military to civilian uses; any area in which you need to make sense of data is a potential customer of machine learning.
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ... (Benjamin Le)
The document summarizes the practical challenges faced and lessons learned from building a personalized job recommendation system at LinkedIn. It discusses three key challenges: candidate selection using decision trees to generate queries, training personalized relevance models at scale using generalized linear mixed models, and realizing an ideal jobs marketplace through early intervention to redistribute job applications.
The presentation is about quantum computers and quantum computing, describing the quantum phenomena that could make future computers 1000 times more powerful than current ones. It also includes an artificial-intelligence perspective on the difference in computing power between a conventional computer and a quantum computer. Quantum computers are still under research and development and are not yet available to ordinary people and businesses, but major organizations are investing heavily in this future hardware; the U.S. in particular is spending billions of dollars to make it happen, partly for future security purposes.
Important Classification and Regression Metrics.pptx (Chode Amarnath)
This document provides an overview of important classification and regression metrics used in machine learning. It defines metrics such as mean squared error, root mean squared error, R-squared, accuracy, precision, recall, F1 score, and AUC for evaluating regression and classification models. For each metric, it provides an intuitive explanation of what the metric measures, includes examples to illustrate how it is calculated, and discusses advantages and disadvantages as well as when the metric would be appropriate. It also explains concepts like confusion matrices, true positives/negatives, and false positives/negatives that are important for understanding various classification evaluation metrics.
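The classification metrics mentioned all derive from the four confusion-matrix counts, which makes them easy to sketch end to end (the label vectors below are invented for illustration):

```python
def confusion_counts(y_true, y_pred):
    # True/false positives and negatives for binary labels in {0, 1}
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy  = (tp + tn) / len(y_true)              # fraction correct
precision = tp / (tp + fp)                       # of predicted positives, correct
recall    = tp / (tp + fn)                       # of actual positives, found
f1 = 2 * precision * recall / (precision + recall)
```

The trade-off the document discusses is visible here: raising the decision threshold typically trades fp for fn, moving precision up and recall down.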
Ant colony optimization is a swarm intelligence technique inspired by the behavior of ants. It is used to find optimal paths or solutions to problems. The key aspects are that ants deposit pheromones as they move, influencing the paths other ants take, with shorter paths receiving more pheromones over time. This results in the emergence of the shortest path as the most favorable route. The algorithm is often applied to problems like the traveling salesman problem to find the shortest route between nodes.
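The deposit-and-evaporate dynamics can be shown on the smallest possible instance: two paths of different lengths between nest and food (a stripped-down sketch; the path lengths, ant count, and evaporation rate are arbitrary). Shorter paths earn larger deposits, so positive feedback makes the short path dominate:

```python
import random

rng = random.Random(1)

lengths = {"short": 1.0, "long": 2.0}     # two candidate paths
pheromone = {"short": 1.0, "long": 1.0}   # start with equal pheromone
EVAPORATION = 0.1
ANTS = 10

for _ in range(100):                      # 100 colony iterations
    deposits = {"short": 0.0, "long": 0.0}
    total = pheromone["short"] + pheromone["long"]
    for _ in range(ANTS):
        # Each ant picks a path with probability proportional to pheromone
        path = "short" if rng.random() < pheromone["short"] / total else "long"
        deposits[path] += 1.0 / lengths[path]   # deposit inversely to length
    for p in pheromone:
        # Evaporation forgets old trails, then fresh deposits are added
        pheromone[p] = (1 - EVAPORATION) * pheromone[p] + deposits[p]
```

On a travelling-salesman instance the same loop runs over edges of a graph rather than two fixed paths, with each ant constructing a full tour.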
This document provides an overview of model generalization and legal notices related to using Intel technologies. It discusses how the number of neighbors (k) used in k-nearest neighbors algorithms affects the decision boundary. It also compares underfitting versus overfitting based on how well models generalize during training and prediction. Key aspects covered include the bias-variance tradeoff, using training and test splits to evaluate model performance, and performing cross-validation.
The document discusses modelling and evaluation in machine learning. It defines what models are and how they are selected and trained for predictive and descriptive tasks. Specifically, it covers:
1) Models represent raw data in meaningful patterns and are selected based on the problem and data type, like regression for continuous numeric prediction.
2) Models are trained by assigning parameters to optimize an objective function and evaluate quality. Cross-validation is used to evaluate models.
3) Predictive models predict target values like classification to categorize data or regression for continuous targets. Descriptive models find patterns without targets for tasks like clustering.
4) Model performance can suffer from underfitting if the model is too simple or from overfitting if it is too complex.
Abstract: This PDSG workshop introduces basic concepts of splitting a dataset for training a model in machine learning. Concepts covered are training, test and validation data, serial and random splitting, data imbalance and k-fold cross validation.
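The k-fold idea from the workshop can be sketched without any library (the fold construction below uses a simple round-robin split over shuffled indices, one reasonable choice among several): every example appears in exactly one test fold, and the remaining folds form the training set.

```python
import random

def k_fold_indices(n, k, shuffle=True, seed=0):
    # Yield (train_indices, test_indices) for each of the k folds
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)   # random rather than serial split
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, k=5))
```

For imbalanced data one would additionally stratify, splitting each class separately so every fold preserves the class ratio.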
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
This presentation describes the different techniques used to write test cases for software testing, with a detailed example for each technique. After reading it, you will be able to judge which technique is most useful for your software testing.
Artificial Intelligence for Automated Software Testing (Lionel Briand)
This document provides an overview of applying artificial intelligence techniques such as metaheuristic search, machine learning, and natural language processing to problems in automated software testing. It begins with introductions to software testing, relevant AI techniques including genetic algorithms, machine learning, and natural language processing. It then discusses search-based software testing (SBST) as an application of metaheuristic search to problems in test case generation and optimization. Examples are provided of representing test cases as chromosomes for genetic algorithms and defining fitness functions to guide the search for test cases that maximize code coverage.
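The core SBST ingredient, a fitness function that turns "does this input cover the branch?" into a searchable gradient, can be sketched minimally (an illustration of the general technique, not the document's own example; the target predicate and the hill climber are invented stand-ins for a full genetic algorithm):

```python
def branch_distance(x, target=42):
    # Fitness for covering the branch `if x == target`: 0 means covered,
    # and smaller values mean the input is closer to taking the branch.
    return abs(x - target)

def hill_climb(start, fitness, steps=1000):
    # A minimal metaheuristic: move to the best neighbour while it improves
    x = start
    for _ in range(steps):
        best = min((x - 1, x, x + 1), key=fitness)
        if fitness(best) >= fitness(x):
            break                 # local optimum reached
        x = best
    return x

found = hill_climb(0, branch_distance)
```

A genetic algorithm replaces the single climber with a population of chromosomes (candidate inputs) evolved under the same fitness, which scales better when many branches must be covered at once.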
Deep neural methods have recently demonstrated significant performance improvements in several IR tasks. In this lecture, we will present a brief overview of deep models for ranking and retrieval.
This is a follow-up lecture to "Neural Learning to Rank" (https://www.slideshare.net/BhaskarMitra3/neural-learning-to-rank-231759858)
This document provides an overview of unit testing. It defines a unit as a software component containing routines and variables. Unit testing involves testing individual units in isolation to find defects. The benefits of unit testing include refactoring code easily and making integration testing simpler. Various test types are covered, including functional, non-functional, and structure-based testing. Static and white box testing techniques like statement coverage and branch coverage are also discussed. The document concludes with guidelines for effective unit testing.
This document provides an overview of software testing concepts and definitions. It discusses key topics such as software quality, testing methods like static and dynamic testing, testing levels from unit to acceptance testing, and testing types including functional, non-functional, regression and security testing. The document is intended as an introduction to software testing principles and terminology.
This document summarizes a presentation on machine learning models, adversarial attacks, and defense strategies. It discusses adversarial attacks on machine learning systems, including GAN-based attacks. It then covers various defense strategies against adversarial attacks, such as filter-based adaptive defenses and outlier-based defenses. The presentation also addresses issues around bias in AI systems and the need for explainable and accountable AI.
A basic introduction to adversarial machine learning, covering topics such as what ML is, what adversarial ML is, and how adversarial inputs are generated to manipulate models.
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety (Lionel Briand)
Autonomous systems present safety challenges due to their complexity and use of machine learning. Two key approaches are needed to address these challenges: (1) design-time assurance cases to validate safety requirements and (2) run-time monitoring architectures to detect unsafe behavior. Automated testing techniques leveraging metaheuristics and machine learning can help provide evidence for assurance cases and learn conditions to guide run-time monitoring. However, more industrial experience is still needed to properly validate these approaches at scale for autonomous systems.
Automated Testing of Autonomous Driving Assistance Systems (Lionel Briand)
This document discusses automated testing of autonomous driving assistance systems. It begins by introducing autonomous systems and their testing challenges due to large and complex input spaces and lack of explicit specifications. The document then describes an approach that combines evolutionary algorithms and decision tree classification models to guide testing towards critical scenarios. Evolutionary algorithms are used to search the input space while decision trees learn to predict scenario criticality and guide the search towards critical regions. The technique iteratively refines the decision tree model and focuses search on critical regions identified in the trees. The goal is to efficiently generate failure-revealing test cases and characterize input conditions that lead to critical situations.
Scalable Software Testing and Verification of Non-Functional Properties throu... (Lionel Briand)
This document discusses scalable software testing and verification of non-functional properties through heuristic search and optimization. It describes several projects with industry partners that use metaheuristic search techniques, such as hill climbing and genetic algorithms, to generate test cases for non-functional properties of complex, configurable software systems. The techniques address scalability and practicality for engineers through dimensionality reduction, surrogate modeling, and dynamically adjusting the search strategy in different regions of the input space, and they identified worst-case scenarios more effectively than random testing alone.
Enabling Automated Software Testing with Artificial Intelligence (Lionel Briand)
1. The document discusses using artificial intelligence techniques like machine learning and natural language processing to help automate software testing. It focuses on applying these techniques to testing advanced driver assistance systems.
2. A key challenge in software testing is scalability as the input spaces and code bases grow large and complex. Effective automation is needed to address this challenge. The document describes several industrial research projects applying AI to help automate testing of advanced driver assistance systems.
3. One project aims to develop an automated testing technique for emergency braking systems in cars using a physics-based simulation. The goal is to efficiently explore complex test scenarios and identify critical situations like failures to avoid collisions.
Applications of Search-based Software Testing to Trustworthy Artificial Intel... - Lionel Briand
This document discusses search-based approaches for testing artificial intelligence systems. It covers testing at different levels, from model-level testing of individual machine learning components to system-level testing of AI-enabled systems. At the model level, search-based techniques are used to generate test inputs that target weaknesses in deep learning models. At the system level, simulations and reinforcement learning are used to test AI components integrated into complex systems. The document outlines many open challenges in AI testing and argues that search-based approaches are well-suited to address challenges due to the complex, non-linear behaviors of AI systems.
Making Model-Driven Verification Practical and Scalable: Experiences and Less... - Lionel Briand
The document discusses experiences and lessons learned from making model-driven verification practical and scalable. It describes several projects collaborating with industry partners to develop model-based solutions for verification. Key challenges addressed include achieving applicability for engineers, scalability to large systems, and developing solutions informed by real-world problems. Lessons learned emphasize the importance of collaborative applied research, defining problems in context, and validating solutions realistically.
This document discusses techniques for testing advanced driver assistance systems (ADAS) through physics-based simulation. It faces challenges due to the large, complex, and multidimensional test input space as well as the computational expense of simulation. The document proposes using a genetic algorithm guided by decision trees to more efficiently search for critical test cases. Classification trees are built to partition the input space into homogeneous regions in order to better guide the selection and generation of test inputs toward more critical areas.
Functional Safety in ML-based Cyber-Physical Systems - Lionel Briand
This document discusses verification and validation of machine learning systems used in cyber-physical systems. It presents research on developing practical and scalable techniques to systematically verify the safety of deep neural network-based systems. The goals are to efficiently test for safety violations and explain any violations found to enable risk assessment. The document outlines challenges in verifying DNN components and proposes focusing on testing entire DNN-based systems. It reviews existing work and identifies limitations, such as focusing only on single images rather than scenarios involving object dynamics. Standards like ISO 26262 and SOTIF that require testing under different environmental conditions are also discussed. Explanations of any misclassifications found during testing are important for interpreting results and performing risk analysis.
Automated Testing of Autonomous Driving Assistance Systems - Lionel Briand
This document discusses automated testing techniques for autonomous driving assistance systems (ADAS). It proposes using decision tree classification models and a multi-objective genetic search algorithm (NSGAII) to efficiently explore the complex scenario space of ADAS. The objectives are to identify critical, failure-revealing test scenarios by characterizing input conditions that lead to safety violations, such as the car hitting a pedestrian. Simulator-based testing of the automated emergency braking system is computationally expensive, so decision trees provide better guidance to the search by partitioning the input space into homogeneous regions.
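The NSGA-II algorithm mentioned above is built on non-dominated sorting, which can be sketched in a few lines. This is a naive, readable version for illustration (quadratic-ish per front), not the bookkeeping-optimised original.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """First step of NSGA-II selection: peel points off into Pareto fronts."""
    fronts, remaining = [], list(points)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q != p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts
```

In the testing setting sketched above, each point would be a scenario's objective vector (e.g. distance to pedestrian, time to collision), and earlier fronts hold the most critical candidates.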
Applications of Machine Learning and Metaheuristic Search to Security Testing - Lionel Briand
This document discusses testing web application firewalls (WAFs) for SQL injection (SQLi) vulnerabilities. It states that the testing goal is to generate test cases that result in executable malicious SQL statements that can bypass the WAF. It also notes that WAF filter rules often need customization to avoid false positives and protect against new attacks, but that customization is error-prone due to complex rules, time/resource constraints, and a lack of automated tools.
This document provides an overview of software defect prediction approaches from the 1970s to the present. It discusses early approaches using simple metrics like lines of code and complexity metrics. It then covers the development of prediction models using machine learning techniques like regression and classification. More recent topics discussed include just-in-time prediction models, practical applications in industry, using historical metrics from software repositories, addressing noise in data, and the feasibility of cross-project prediction. The document outlines challenges and opportunities for future work in the field of software defect prediction.
Achieving Scalability in Software Testing with Machine Learning and Metaheuri... - Lionel Briand
This document discusses challenges in testing advanced driver assistance systems (ADAS) and approaches to address scalability. It describes using physics-based simulation and search-based testing to generate test cases for an automated emergency braking system. The testing faces challenges due to the large, complex input space and computational expense of simulations. Decision trees are proposed to better guide the search by partitioning the input space into homogeneous regions based on criticality.
Survey on Software Defect Prediction (PhD Qualifying Examination Presentation) - lifove
This document provides an outline and overview of approaches to software defect prediction. It discusses early approaches using lines of code and complexity metrics from the 1970s-1980s and the development of prediction models using regression and classification in the 1990s-2000s. More recent focus areas discussed include just-in-time prediction models, practical applications of prediction, using history metrics from software repositories, and assessing cross-project prediction feasibility. The document aims to survey the field of software defect prediction.
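As a minimal illustration of the regression/classification models this line of work builds on, here is a tiny stdlib-only logistic-regression defect predictor over two invented, normalised metrics (lines of code and complexity). The dataset and feature choice are purely illustrative.

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Tiny SGD logistic-regression trainer for defect prediction
    from code metrics."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    # Classify as defect-prone when the predicted probability is >= 0.5.
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b))) >= 0.5

# Toy dataset: [normalised LOC, normalised complexity] -> defective?
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
```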
This presentation introduces the concept of Machine Learning and then discusses how Machine Learning is being used in the Predictive Maintenance domain.
Measuring the Validity of Clustering Validation Datasets - michaelaupetit1
1-minute and 15-minute summaries of our IEEE TPAMI paper:
H. Jeon, M. Aupetit, D. Shin, A. Cho, S. Park and J. Seo, "Measuring the Validity of Clustering Validation Datasets," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3548011
Clustering is essential to data analytics.
Practitioners (Data Scientists, Domain Experts) pick a clustering technique to explore their specific domain dataset.
Researchers design clustering techniques and rank them on benchmark datasets representative of an application domain to help practitioners choose the most suitable technique.
We question the validity of benchmark datasets used for clustering validation.
We propose an axiomatic approach and its practical implementation to evaluate and rank benchmark datasets for clustering evaluation.
We show that many benchmark datasets are of low quality, which has drastic consequences when used for ranking clustering techniques.
We discuss future usage of our approach to explore how concepts cluster in the representation spaces of GenAI foundation models.
Ranked datasets
https://github.com/hj-n/labeled-datasets
Adjusted IVMs
https://github.com/hj-n/clm
Other amazing work of Hyeon Jeon
https://www.hyeonjeon.com/publications
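A crude, illustrative version of the underlying idea — scoring how well a benchmark's class labels match actual cluster structure — can compare between-class and within-class distances. This is a simplified sketch for two classes, not the adjusted internal validation measures (IVMs) of the paper.

```python
import math

def mean_dist(points_a, points_b):
    ds = [math.dist(p, q) for p in points_a for q in points_b if p != q]
    return sum(ds) / len(ds)

def separability(points, labels):
    """Ratio of mean between-class distance to mean within-class distance.
    Values well above 1 suggest the labels form genuine clusters; values
    near 1 suggest they do not, making the dataset a poor clustering
    benchmark."""
    classes = sorted(set(labels))
    groups = {c: [p for p, l in zip(points, labels) if l == c] for c in classes}
    within = sum(mean_dist(g, g) for g in groups.values()) / len(classes)
    between = mean_dist(groups[classes[0]], groups[classes[1]])
    return between / within
```

Shuffling the labels of a well-clustered dataset should drive this score toward 1, which is the kind of degradation a validity measure needs to detect.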
This document provides an outline and overview of approaches to software defect prediction. It discusses early approaches using simple metrics like lines of code in the 1970s and complexity metrics/fitting models in the 1980s. Prediction models using regression and classification emerged in the 1990s. Just-in-time prediction models and practical applications in industry are discussed for the 2000s. The use of history metrics from software repositories and challenges of cross-project prediction are also summarized.
Autonomous Control AI Training from Data - Ivo Andreev
Simulators are an absolute necessity for mimicking physical conditions and processes at scale. They provide a safe environment to test hypotheses, investigate edge cases, reduce expenses, perform training, and accelerate innovation. The challenge lies in the knowledge required to build a simulation unless one is a domain expert with a deep understanding of physics, chemistry, HVAC, manufacturing, etc. Professionals develop dedicated models with powerful software like Simulink, AnyLogic, and Matlab, though these have steep learning curves and costs, require time for tuning and customization, and would severely limit the ability of a software solution to be applied across domains. The session is about a universal approach to building model-based simulators for solving optimization and control tasks.
Testing the Untestable: Model Testing of Complex Software-Intensive Systems - Lionel Briand
This document discusses model testing as an approach to testing complex, software-intensive systems that are difficult or impossible to fully automate. It presents model testing as shifting the focus of testing from implemented systems to executable models that capture relevant system behavior and properties. Model testing aims to find and execute high-risk test scenarios in large input spaces and help guide targeted testing of implemented systems. Challenges include defining testable models that include dynamic and uncertain behavior, performing effective test selection, and detecting failures under uncertainty.
The Power of Auto ML and How Does it Work - Ivo Andreev
Automated ML is an approach to minimize the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics, or programming. The mechanism works by letting end users simply provide data; the system automatically does the rest by determining the approach to perform the particular ML task. At first this may sound discouraging to those aspiring to the "sexiest job of the 21st century" - the data scientists. However, Auto ML should be considered a democratization of ML, rather than automatic data science.
In this session we will talk about how Auto ML works, how it is implemented by Microsoft, and how it could improve the productivity of even professional data scientists.
Precise and Complete Requirements? An Elusive Goal - Lionel Briand
The document discusses the challenges of achieving precise and complete requirements upfront in software development projects. It notes that while academics assume detailed requirements are needed, practitioners find this difficult to achieve in reality due to limited resources, uncertainty, and changing needs. The document provides perspectives from practice that emphasize starting with prototypes and visions rather than detailed specifications. It also summarizes research finding diverse requirements practices across different domains and organizations. The document concludes that while precise requirements may be desirable, they are often elusive goals, and the focus should be on achieving compliance and delivering working software.
Large Language Models for Test Case Evolution and Repair - Lionel Briand
Large language models show promise for test case repair tasks. LLMs can be applied to tasks like test case generation, classification of flaky tests, and test case evolution and repair. The paper presents TaRGet, a framework that uses LLMs for automated test case repair. TaRGet takes as input a broken test case and code changes to the system under test, and outputs a repaired test case. Evaluation shows TaRGet achieves over 80% plausible repair accuracy. The paper analyzes repair characteristics, evaluates different LLM and input/output formats, and examines the impact of fine-tuning data size on performance.
Metamorphic Testing for Web System Security - Lionel Briand
This document summarizes a presentation on metamorphic testing for web system security given by Nazanin Bayati on September 13, 2023. Metamorphic testing uses relations between the outputs of multiple test executions to test systems when specifying expected outputs is difficult. It was applied to web systems by generating follow-up inputs based on transformations of valid interactions and checking that output relations held. The approach detected over 60% of vulnerabilities in tested systems and addressed more vulnerability types than static and dynamic analysis tools. It provides an effective and automated way to test for security issues in web systems.
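The core idea of metamorphic testing — checking a relation between the outputs of an original and a follow-up input, rather than specifying each expected output — can be sketched generically. The `sin` relation below is a classic textbook example, not one of the web-security relations used in the presentation.

```python
import math

def check_relation(func, transform, relation, inputs):
    """Metamorphic testing harness: for each input, run the system on the
    original and the transformed (follow-up) input and report inputs for
    which the expected output relation does not hold."""
    return [x for x in inputs if not relation(func(x), func(transform(x)))]

# Classic relation: sin(x) == sin(pi - x), so no per-input oracle is needed.
failures = check_relation(
    math.sin,
    lambda x: math.pi - x,
    lambda a, b: abs(a - b) < 1e-9,
    [0.1 * i for i in range(20)],
)
```

For a web system, `func` would issue an HTTP interaction, `transform` would rewrite it (e.g. re-encode a parameter), and `relation` would compare the responses.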
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-... - Lionel Briand
This document proposes a method called SEDE (Simulator-based Explanations for DNN Errors) to automatically generate explanations for errors in DNN-based safety-critical systems by constraining simulator parameters. SEDE first identifies clusters of error-inducing images, then uses an evolutionary algorithm to generate simulator images within each cluster, including failing, passing, and representative images. SEDE extracts rules characterizing the unsafe parameter space and uses the generated images to retrain DNNs, improving accuracy compared to alternative methods. The paper evaluates SEDE on head pose and face landmark detection DNNs in terms of generating diverse cluster images, delimiting unsafe spaces, and enhancing DNN performance.
This document summarizes a research paper on using grey-box fuzzing (MOTIF) for mutation testing of C/C++ code in cyber-physical systems (CPS). It introduces mutation testing and grey-box fuzzing, and proposes MOTIF which generates a fuzzing driver to test functions with live mutants. An empirical evaluation compares MOTIF to symbolic execution-based mutation testing on three subject programs. MOTIF killed more mutants within 10,000 seconds and was able to test programs that symbolic execution could not handle due to limitations like floating-point values. Seed inputs alone killed few mutants, showing the importance of fuzzing. MOTIF is an effective approach for mutation testing of CPS software.
Data-driven Mutation Analysis for Cyber-Physical Systems - Lionel Briand
Data-driven mutation analysis is proposed to assess if test suites for cyber-physical systems properly exercise component interoperability. Fault models are developed for different data types and dependencies, and are used to automatically generate mutants by injecting faults. Empirical results on industrial systems demonstrate the feasibility and effectiveness of the approach in identifying test suite shortcomings and poor oracles.
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems - Lionel Briand
This document proposes MORLOT (Many-Objective Reinforcement Learning for Online Testing) to address challenges in online testing of DNN-enabled systems. MORLOT leverages many-objective search and reinforcement learning to choose test actions. It was evaluated on the Transfuser autonomous driving system in the CARLA simulator using 6 safety requirements. MORLOT was significantly more effective and efficient at finding safety violations than random search or other many-objective approaches, achieving a higher average test effectiveness for any given test budget.
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu... - Lionel Briand
1. The document presents ATM, a new approach for black-box test case minimization that transforms test code into abstract syntax trees and uses tree-based similarity measures and genetic algorithms to minimize test suites.
2. ATM was evaluated on the DEFECTS4J dataset and achieved a fault detection rate of 0.82 on average, significantly outperforming existing techniques, while requiring only practical execution times.
3. The best configuration of ATM used a genetic algorithm with a combined similarity measure, achieving a fault detection rate of 0.80 within 1.2 hours on average.
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ... - Lionel Briand
The document is a journal paper that proposes a method for black-box safety analysis and retraining of deep neural networks (DNNs) based on feature extraction and clustering of failure-inducing images. The method uses a pre-trained VGG16 model to extract features from failure images, clusters the features using DBSCAN, selects clusters that likely caused failures, and retrains the DNN to improve safety based on images in problematic clusters. An empirical evaluation on various DNNs for tasks like gaze detection showed the method effectively determined failure causes through clustering and improved models with fewer images than other approaches.
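A minimal, stdlib-only DBSCAN in the spirit of the clustering step described above; the actual work clusters VGG16 feature vectors, whereas the points here are toy 2-D vectors, so treat this as an illustrative sketch of the algorithm rather than the paper's pipeline.

```python
import math

def dbscan(points, eps=0.5, min_pts=3):
    """Minimal DBSCAN: density-based clustering of feature vectors;
    the label -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbours(i)) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = neighbours(i)
        while queue:
            j = queue.pop()
            if labels[j] == -1:     # noise reachable from a core point
                labels[j] = cluster  # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours(j)) >= min_pts:
                queue.extend(neighbours(j))
    return labels
```

A useful property here is that the number of clusters is not fixed in advance, which fits failure analysis where the number of distinct root causes is unknown.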
PRINS: Scalable Model Inference for Component-based System Logs - Lionel Briand
PRINS is a technique for scalable model inference of component-based system logs. It divides the problem into inferring individual component models and then stitching them together. The paper evaluates PRINS on several systems and compares its execution time and accuracy to MINT, a state-of-the-art model inference tool. Results show that PRINS is significantly faster than MINT, especially on larger logs, with comparable accuracy. However, stitching component models can result in larger overall system models. The paper contributes an empirical evaluation of the PRINS technique and makes its implementation publicly available.
Revisiting the Notion of Diversity in Software Testing - Lionel Briand
The document discusses the concept of diversity in software testing. It provides examples of how diversity has been applied in various testing applications, including test case prioritization and minimization, mutation analysis, and explaining errors in deep neural networks. The key aspects of diversity discussed are the representation of test cases, measures of distance or similarity between cases, and techniques for maximizing diversity. The document emphasizes that the best approach depends on factors like information access, execution costs, and the specific application context.
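A common concrete instance of diversity-based selection is greedy farthest-first prioritisation over some test distance. The Jaccard distance over sets of covered branches below is one assumed representation, chosen for illustration; the document stresses that the right representation and distance depend on context.

```python
def jaccard(a, b):
    # Distance between two tests represented as sets of covered branches.
    return 1 - len(a & b) / len(a | b)

def prioritize_by_diversity(tests, distance):
    """Greedy farthest-first ordering: repeatedly pick the test whose
    minimum distance to the already-selected tests is largest."""
    ordered, remaining = [tests[0]], list(tests[1:])
    while remaining:
        nxt = max(remaining,
                  key=lambda t: min(distance(t, s) for s in ordered))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered
```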
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ... - Lionel Briand
This document discusses the split identities of software engineering researchers between being mathematicians, social scientists, or engineers. It notes there are three main communities - formal methods and guarantees, human and social studies, and engineering automated solutions - that have different backgrounds, languages, and research methods. While diversity is good, the communities need to be better connected to work together to solve problems. The document calls for more demand-driven, collaborative research with industry to have a greater impact and produce practical solutions.
Reinforcement Learning for Test Case Prioritization - Lionel Briand
1) The document discusses using reinforcement learning for test case prioritization in continuous integration environments. It compares different ranking models (listwise, pairwise, pointwise) and reinforcement learning algorithms.
2) Pairwise and pointwise ranking models generally perform better than listwise, and pairwise training times are better than pointwise. The best configuration is pairwise ranking with the ACER algorithm.
3) When compared to traditional machine learning ranking models, the best reinforcement learning configuration provides significantly better ranking accuracy than the state-of-the-art MART model.
4) However, relying solely on test execution history may not provide sufficient features for an accurate prioritization policy, regardless of the approach; enriched datasets with more features are needed.
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ... - Lionel Briand
The document summarizes a paper that presents Mutation Analysis for Space Software (MASS), a scalable and automated pipeline for mutation testing of cyber-physical systems software in the space domain. The pipeline includes steps to create mutants, sample and prioritize mutants, discard equivalent mutants, and compute mutation scores. An empirical evaluation on space software case studies found that MASS provides accurate mutation scores with fewer sampled mutants compared to other sampling approaches. It also enables significant time savings over non-optimized mutation analysis through test case prioritization and reduction techniques. MASS helps uncover weaknesses in test suites and ensures thorough software testing for safety-critical space systems.
On Systematically Building a Controlled Natural Language for Functional Requi... - Lionel Briand
The document presents a qualitative methodology for systematically building a controlled natural language (CNL) for functional requirements. It describes extracting requirements from software requirements specifications, identifying codes within the requirements, labeling and grouping the requirements, creating a grammar by identifying the content in requirements and deriving grammar rules. An evaluation of the developed CNL called Rimay showed it could express 88% of requirements from unseen documents and reached stability after analyzing three documents.
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and... - Lionel Briand
This document proposes SAMOTA, a surrogate-assisted many-objective optimization approach for online testing of DNN-enabled systems. SAMOTA uses global and local surrogate models to replace expensive function evaluations. It clusters local data points and builds individual surrogate models for each cluster, rather than one model for all data. An evaluation on a DNN-enabled autonomous driving system shows SAMOTA achieves better test effectiveness and efficiency than alternative approaches, and clustering local data points leads to more effective local searches than using a single local model. SAMOTA is an effective method for online testing of complex DNN systems.
Guidelines for Assessing the Accuracy of Log Message Template Identification ... - Lionel Briand
The document provides guidelines for assessing the accuracy of log message template identification techniques. It discusses issues with existing accuracy metrics and proposes new metrics like Template Accuracy that are not sensitive to message frequency. It also recommends performing oracle template correction as templates extracted without source code are often incorrect. Additionally, it suggests analyzing incorrectly identified templates to understand weaknesses and provide insights to improve techniques. The guidelines aim to help properly evaluate template identification techniques for different use cases.
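A template-level precision/recall in the spirit of the metric described above can be sketched as follows. This is a simplified illustration; the paper's exact definitions may differ, and the template strings here are invented.

```python
def template_accuracy(identified, oracle):
    """Template-level accuracy: score exact matches between identified and
    oracle templates, so that a very frequent log message cannot dominate
    the score the way it does in message-level metrics."""
    identified, oracle = set(identified), set(oracle)
    matched = identified & oracle
    precision = len(matched) / len(identified)
    recall = len(matched) / len(oracle)
    return precision, recall
```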
A Theoretical Framework for Understanding the Relationship between Log Parsin... - Lionel Briand
This document proposes a theoretical framework to understand the relationship between log parsing and anomaly detection. It argues that log parsing should be viewed as an information abstraction process that converts unstructured logs into structured logs. The goal of log parsing should be to extract the minimum amount of information necessary to distinguish normal behavior from anomalies. This "minimality" and "distinguishability" can be used to define ideal log parsing results. The framework aims to provide guidance on how log parsing quality impacts anomaly detection accuracy and determine the root causes of any inaccuracies.
Requirements in Cyber-Physical Systems: Specifications and Applications - Lionel Briand
This document discusses requirements engineering challenges for cyber-physical systems (CPS) and provides examples of applications. It presents research on specifying and verifying requirements for CPS through signal-based temporal properties (SBTPs). Formal languages like STL, STL*, and SFO are assessed for expressing SBTPs. Applications discussed include generating test oracles for automotive controllers, developing a taxonomy and formal specification framework for SBTPs, generating online test oracles using a restricted first-order logic, and developing a domain-specific language called SB-TemPsy-DSL for specifying SBTPs to enable trace checking of system requirements.
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ... - Eric D. Schabell
It's time you stopped letting your telemetry data pressure your budgets and get in the way of solving issues with agility! No more I say! Take back control of your telemetry data as we guide you through the open source project Fluent Bit. Learn how to manage your telemetry data from source to destination using the pipeline phases covering collection, parsing, aggregation, transformation, and forwarding from any source to any destination. Buckle up for a fun ride as you learn by exploring how telemetry pipelines work, how to set up your first pipeline, and exploring several common use cases that Fluent Bit helps solve. All this backed by a self-paced, hands-on workshop that attendees can pursue at home after this session (https://o11y-workshops.gitlab.io/workshop-fluentbit).
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:... - Ranjan Baisak
As software complexity grows, traditional static analysis tools struggle to detect vulnerabilities with both precision and context—often triggering high false positive rates and developer fatigue. This article explores how Graph Neural Networks (GNNs), when applied to source code representations like Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs), and Data Flow Graphs (DFGs), can revolutionize vulnerability detection. We break down how GNNs model code semantics more effectively than flat token sequences, and how techniques like attention mechanisms, hybrid graph construction, and feedback loops significantly reduce false positives. With insights from real-world datasets and recent research, this guide shows how to build more reliable, proactive, and interpretable vulnerability detection systems using GNNs.
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily? - steaveroggers
Migrating from Lotus Notes to Outlook can be a complex and time-consuming task, especially when dealing with large volumes of NSF emails. This presentation provides a complete guide on how to batch export Lotus Notes NSF emails to Outlook PST format quickly and securely. It highlights the challenges of manual methods, the benefits of using an automated tool, and introduces eSoftTools NSF to PST Converter Software — a reliable solution designed to handle bulk email migrations efficiently. Learn about the software’s key features, step-by-step export process, system requirements, and how it ensures 100% data accuracy and folder structure preservation during migration. Make your email transition smoother, safer, and faster with the right approach.
Read More:- https://www.esofttools.com/nsf-to-pst-converter.html
What Do Contribution Guidelines Say About Software Testing? (MSR 2025) - Andre Hora
Software testing plays a crucial role in the contribution process of open-source projects. For example, contributions introducing new features are expected to include tests, and contributions with tests are more likely to be accepted. Although most real-world projects require contributors to write tests, the specific testing practices communicated to contributors remain unclear. In this paper, we present an empirical study to understand better how software testing is approached in contribution guidelines. We analyze the guidelines of 200 Python and JavaScript open-source software projects. We find that 78% of the projects include some form of test documentation for contributors. Test documentation is located in multiple sources, including CONTRIBUTING files (58%), external documentation (24%), and README files (8%). Furthermore, test documentation commonly explains how to run tests (83.5%), but less often provides guidance on how to write tests (37%). It frequently covers unit tests (71%), but rarely addresses integration (20.5%) and end-to-end tests (15.5%). Other key testing aspects are also less frequently discussed: test coverage (25.5%) and mocking (9.5%). We conclude by discussing implications and future research.
Solidworks Crack 2025 latest new + license codeaneelaramzan63
Copy & Paste On Google >>> https://dr-up-community.info/
The two main methods for installing standalone licenses of SOLIDWORKS are clean installation and parallel installation (the process is different ...
Disable your internet connection to prevent the software from performing online checks during installation
Douwan Crack 2025 new verson+ License codeaneelaramzan63
Copy & Paste On Google >>> https://dr-up-community.info/
Douwan Preactivated Crack Douwan Crack Free Download. Douwan is a comprehensive software solution designed for data management and analysis.
FL Studio Producer Edition Crack 2025 Full Versiontahirabibi60507
Copy & Past Link 👉👉
http://drfiles.net/
FL Studio is a Digital Audio Workstation (DAW) software used for music production. It's developed by the Belgian company Image-Line. FL Studio allows users to create and edit music using a graphical user interface with a pattern-based music sequencer.
Download YouTube By Click 2025 Free Full Activatedsaniamalik72555
Copy & Past Link 👉👉
https://dr-up-community.info/
"YouTube by Click" likely refers to the ByClick Downloader software, a video downloading and conversion tool, specifically designed to download content from YouTube and other video platforms. It allows users to download YouTube videos for offline viewing and to convert them to different formats.
Who Watches the Watchmen (SciFiDevCon 2025)Allon Mureinik
Tests, especially unit tests, are the developers’ superheroes. They allow us to mess around with our code and keep us safe.
We often trust them with the safety of our codebase, but how do we know that we should? How do we know that this trust is well-deserved?
Enter mutation testing – by intentionally injecting harmful mutations into our code and seeing if they are caught by the tests, we can evaluate the quality of the safety net they provide. By watching the watchmen, we can make sure our tests really protect us, and we aren’t just green-washing our IDEs to a false sense of security.
Talk from SciFiDevCon 2025
https://www.scifidevcon.com/courses/2025-scifidevcon/contents/680efa43ae4f5
Not So Common Memory Leaks in Java WebinarTier1 app
This SlideShare presentation is from our May webinar, “Not So Common Memory Leaks & How to Fix Them?”, where we explored lesser-known memory leak patterns in Java applications. Unlike typical leaks, subtle issues such as thread local misuse, inner class references, uncached collections, and misbehaving frameworks often go undetected and gradually degrade performance. This deck provides in-depth insights into identifying these hidden leaks using advanced heap analysis and profiling techniques, along with real-world case studies and practical solutions. Ideal for developers and performance engineers aiming to deepen their understanding of Java memory management and improve application stability.
Societal challenges of AI: biases, multilinguism and sustainabilityJordi Cabot
Towards a fairer, inclusive and sustainable AI that works for everybody.
Reviewing the state of the art on these challenges and what we're doing at LIST to test current LLMs and help you select the one that works best for you
This presentation explores code comprehension challenges in scientific programming based on a survey of 57 research scientists. It reveals that 57.9% of scientists have no formal training in writing readable code. Key findings highlight a "documentation paradox" where documentation is both the most common readability practice and the biggest challenge scientists face. The study identifies critical issues with naming conventions and code organization, noting that 100% of scientists agree readable code is essential for reproducible research. The research concludes with four key recommendations: expanding programming education for scientists, conducting targeted research on scientific code quality, developing specialized tools, and establishing clearer documentation guidelines for scientific software.
Presented at: The 33rd International Conference on Program Comprehension (ICPC '25)
Date of Conference: April 2025
Conference Location: Ottawa, Ontario, Canada
Preprint: https://arxiv.org/abs/2501.10037
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMaxim Salnikov
Imagine if apps could think, plan, and team up like humans. Welcome to the world of AI agents and agentic user interfaces (UI)! In this session, we'll explore how AI agents make decisions, collaborate with each other, and create more natural and powerful experiences for users.
Join Ajay Sarpal and Miray Vu to learn about key Marketo Engage enhancements. Discover improved in-app Salesforce CRM connector statistics for easy monitoring of sync health and throughput. Explore new Salesforce CRM Synch Dashboards providing up-to-date insights into weekly activity usage, thresholds, and limits with drill-down capabilities. Learn about proactive notifications for both Salesforce CRM sync and product usage overages. Get an update on improved Salesforce CRM synch scale and reliability coming in Q2 2025.
Key Takeaways:
Improved Salesforce CRM User Experience: Learn how self-service visibility enhances satisfaction.
Utilize Salesforce CRM Synch Dashboards: Explore real-time weekly activity data.
Monitor Performance Against Limits: See threshold limits for each product level.
Get Usage Over-Limit Alerts: Receive notifications for exceeding thresholds.
Learn About Improved Salesforce CRM Scale: Understand upcoming cloud-based incremental sync.
Adobe After Effects Crack FREE FRESH version 2025kashifyounis067
🌍📱👉COPY LINK & PASTE ON GOOGLE http://drfiles.net/ 👈🌍
Adobe After Effects is a software application used for creating motion graphics, special effects, and video compositing. It's widely used in TV and film post-production, as well as for creating visuals for online content, presentations, and more. While it can be used to create basic animations and designs, its primary strength lies in adding visual effects and motion to videos and graphics after they have been edited.
Here's a more detailed breakdown:
Motion Graphics:
.
After Effects is powerful for creating animated titles, transitions, and other visual elements to enhance the look of videos and presentations.
Visual Effects:
.
It's used extensively in film and television for creating special effects like green screen compositing, object manipulation, and other visual enhancements.
Video Compositing:
.
After Effects allows users to combine multiple video clips, images, and graphics to create a final, cohesive visual.
Animation:
.
It uses keyframes to create smooth, animated sequences, allowing for precise control over the movement and appearance of objects.
Integration with Adobe Creative Cloud:
.
After Effects is part of the Adobe Creative Cloud, a suite of software that includes other popular applications like Photoshop and Premiere Pro.
Post-Production Tool:
.
After Effects is primarily used in the post-production phase, meaning it's used to enhance the visuals after the initial editing of footage has been completed.
5. Importance
• ML components are increasingly part of safety- or mission-critical systems (ML-enabled systems - MLS)
• Many domains, including aerospace, automotive, health care, …
• Many ML algorithms: supervised vs. unsupervised, classification vs. regression, etc.
• But increasing use of deep learning and reinforcement learning
6. Example Applications in Automotive
• Object detection, identification, classification, localization and prediction of movement
• Sensor fusion and scene comprehension, e.g., lane detection
• Driver monitoring
• Driver replacement
• Functional safety, security
• Powertrains, e.g., improve motor control and battery management
Tian et al. 2018
8. Testing Levels
• Levels: Input, model, integration, system (Riccio et al., 2020)
• Research largely focused on model testing
• Integration: Issues that arise when multiple models and components are integrated
• System: Test the MLS in its target environment, in-field or simulated
• Cross-cutting concerns: scalability, realism
9. Information Access
• Black-box: Model inputs and outputs
• Data-box: Training and test set originally used (Riccio et al., 2020)
• White-box: Runtime state (neuron activations), hyperparameters, weights and biases
• In practice, data-box and white-box access are not always guaranteed, e.g., with a third-party provider
10. ML Model Testing Objectives
• Correctness of classifications and predictions (regression)
• Robustness (to noise or attacks)
• Fairness (e.g., gender, race …)
• Efficiency: Learning and prediction speed
• Failures: imperfect training (training set, overfitting …), hyper-parameters, model structure …
• But what do these failures really entail for the system?
11. Challenges: Overview
• Behavior driven by training data and learning process
• Neither specifications nor code
• Huge input space, especially for autonomous systems
• Test suite adequacy, i.e., when is it good enough?
• Automated oracles
• Model results may be hard to interpret
13. Large Input Space
• Inputs take a variety of forms: images, code, text, simulation configuration parameters, …
• Incredibly large input spaces
• Cost of test execution (including simulation) can be high
• Labelling effort, when no automation is possible, is high
14. Automated Emergency Braking System (AEB)
[Figure: AEB architecture — a vision (camera) sensor provides objects' position/speed to a decision-making component, which sends a "brake-request" to the brake controller when braking is needed to avoid collisions]
15. AEB Input-Output Domain
[Figure: class diagram of the test scenario domain — environment inputs (SceneLight intensity: Real; Weather condition: fog, rain, snow, normal; Road type RT: curved, straight, ramped; RoadSide objects such as parked cars and trees), mobile object inputs (Vehicle: x0, y0, θ, v0; Pedestrian: x0, y0, v0; Position: x, y), the camera sensor's field of view, Test Scenario parameters (simulationTime, timeStep), and outputs (AWA, Detection certainty, Collision state, output Trajectory)]
16. Inputs: Adversarial or “Natural”?
• Adversarial inputs: Focus on robustness, e.g., to noise or attacks
• Natural inputs: Focus on functional aspects, e.g., functional safety
17. Adversarial Examples
• Szegedy et al. first indicated an intriguing weakness of DNNs in the context of image classification
• “Applying an imperceptible perturbation to a test image is possible to arbitrarily change the DNN’s prediction”
[Figure: adversarial example due to noise (Goodfellow et al., 2014)]
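The cited perturbation idea can be made concrete with a minimal sketch of the fast gradient sign method from Goodfellow et al. (2014), here applied to a toy logistic model rather than a DNN; the weights, input, and epsilon below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Perturb x by eps in the direction of the sign of the loss gradient
    (fast gradient sign method, applied to a logistic model)."""
    p = sigmoid(w @ x + b)            # predicted probability of class 1
    grad_x = (p - y) * w              # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)  # small, bounded per-feature step

rng = np.random.default_rng(0)
w = rng.normal(size=16)               # made-up model weights
b = 0.0
x = rng.normal(size=16)               # made-up "image" features
y = 1.0                               # true label
x_adv = fgsm(x, y, w, b, eps=0.1)

# Each feature moves by at most eps, yet the loss strictly increases.
assert np.max(np.abs(x_adv - x)) <= 0.1 + 1e-12
```

The same one-line gradient-sign step is what, on a real DNN and image, produces perturbations that are imperceptible to humans but flip the prediction.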
18. Adversarial Inputs
• Input changes that are not expected to lead to any (significant) change in model prediction or decision
• Are often not realistic
• Techniques: Image processing, image transformations (GAN), fuzzing …
• Many papers, most not in software engineering venues
19. “Natural” Inputs
• Functional aspects
• Inputs should normally be realistic
• May suffer from the oracle problem, i.e., what should be the expected classification or prediction for new inputs?
20. Generating Realistic Inputs
• Characterizing and measuring realism (naturalness) of inputs
• Domain-specific, semantic-preserving transformations
• Metamorphic transformations and relations
• High-fidelity simulators
21. Single-Image Test Inputs
• Several works have been proposed in the context of ADAS, where the test inputs are generated by applying label-preserving changes to existing, already-labeled data (Tian et al., Zhang et al., 2018)
[Figure: original image vs. test image generated by adding fog]
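A crude sketch of such a label-preserving transformation: blending the image toward a uniform white haze, on the assumption that the label (e.g., a steering angle) is unchanged. This linear blend is only illustrative; the cited works use more sophisticated image processing and GANs:

```python
import numpy as np

def add_fog(image, density=0.4):
    """Blend an RGB image (floats in [0, 1]) toward white.
    The test label is assumed unchanged by the transformation."""
    fog = np.ones_like(image)                  # uniform white haze
    return (1.0 - density) * image + density * fog

rng = np.random.default_rng(1)
img = rng.random((4, 4, 3))                    # toy stand-in for a road image
foggy = add_fog(img, density=0.4)

# Pixels move toward white but stay in the valid range.
assert foggy.max() <= 1.0
```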
22. Testing via Physics-based Simulation
[Figure: the ADAS (SUT) exchanges test inputs and time-stamped test outputs with a simulator model (Matlab/Simulink) covering the physical plant (vehicle / sensors / actuators), other cars, pedestrians, and the environment (weather / roads / traffic signs)]
23. Test Scenarios
• Most existing research focuses on
• Testing DNN components, not systems containing them
• Label-preserving changes, e.g., to images
• Limited in terms of searching for functional (safety) violations
• Research accounting for the impact of object dynamics (e.g., car speed) in different scenarios (e.g., specific configurations of roads) is limited.
• ISO/PAS Road vehicles SOTIF requirements: In-the-loop testing of “relevant” scenarios in different environmental conditions
24. Test Adequacy Criteria
• Work focused on DNNs, many papers (~30 criteria)
• Neuron activation values, comparison of training and test data
• Questionable empirical evaluations
• Evaluations focused on finding adversarial inputs
• Require access to the DNN internals and sometimes the training set. Not realistic in many practical settings.
25. Examples
• Neuron coverage: counts activated neurons over total neurons (Tian et al., 2018) -- coarse and easy to achieve
• Variants of neuron coverage: based on activation distributions during training, inspired by combinatorial testing, e.g., k-multisection neuron coverage and t-way combination sparse coverage (Ma et al., 2018)
• Surprise adequacy: relying on the training data, calculate diversity of test inputs using continuous neuron activation values (Kim et al., 2019)
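Basic neuron coverage reduces to a small computation over recorded activations; a minimal sketch, where the activation matrix and threshold are made-up stand-ins for values recorded during test execution:

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` on at
    least one test input. Rows are test inputs, columns are neurons."""
    covered = (activations > threshold).any(axis=0)
    return covered.mean()

# Toy activations: 2 test inputs, 3 neurons
acts = np.array([[0.9, 0.0, 0.2],
                 [0.0, 0.0, 0.7]])
# Neurons 0 and 2 fire above 0.5 on some input; neuron 1 never does.
cov = neuron_coverage(acts, threshold=0.5)
```

The variants cited above (k-multisection, surprise adequacy) refine this by binning activations against the distributions observed during training rather than using a single threshold.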
26. DeepImportance Coverage
The Importance-Driven test adequacy criterion of DeepImportance is satisfied when all combinations of important neuron clusters are exercised.
Gerasimou et al., 2020
27. Limitations
• Code coverage assumes:
• (1) the homogeneity of inputs covering the same part of a program
• (2) the diversity of inputs as indicated by coverage metrics
• According to Li et al. (2019):
• these assumptions break down for DNNs and adversarial inputs
• Using coverage and found adversarial inputs as a measure of robustness is questionable
• There is a weak correlation between coverage and misclassification for natural inputs
• Scalability for the most complex coverage metrics?
28. What is the purpose of coverage adequacy criteria?
29. Use Cases
• Adequacy of test suites: Is the coverage metric a good indicator of “quality” (e.g., robustness, safety) for a DNN?
• Guiding test generation to optimize coverage
• Simulator: Search the configuration parameter space
• Input transformations: Explore the space of possible transformations, e.g., weather and light transformations on road images
• Support test selection, e.g., from image banks, a subset of images to be labelled, such as to optimize coverage
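The test-selection use case (pick a subset of an image bank to label so as to optimize coverage) is commonly approximated with a greedy strategy; a sketch over a hypothetical mapping from test inputs to the neurons they activate:

```python
def greedy_select(test_neurons, budget):
    """Pick up to `budget` tests, each time adding the test that covers
    the most not-yet-covered neurons (classic greedy set cover)."""
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(test_neurons, key=lambda t: len(test_neurons[t] - covered))
        if not test_neurons[best] - covered:
            break                     # no remaining test adds coverage
        chosen.append(best)
        covered |= test_neurons[best]
    return chosen, covered

# Hypothetical image bank: test id -> neurons it activates
bank = {"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {5}}
chosen, covered = greedy_select(bank, budget=2)
```

Greedy set cover is a standard approximation here because exact coverage-optimal selection is NP-hard; the labelling budget caps the subset size.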
30. Comparison Criteria
• Criteria:
• Performance (accuracy, correlation …)
• Prerequisites and assumptions, e.g., activation functions
• Supported DNN architectures
• Computational complexity
• Instance-level or set-level analysis
• Automation level
• Existing empirical studies are not systematic and consistent
31. Use Cases vs. Performance
• Model-level performance: mispredictions and misclassifications must be evaluated in the context of each use case
• Test adequacy: correlation between coverage and mispredictions detected
• Test selection: mispredictions detected for a given test set size
• Test generation: cost-effectiveness of misprediction detection, e.g., pace of increase in detections as the test suite increases (e.g., APFD)
32. Failures in MLS
• Model level: misclassifications, squared error (regression)
• Uncertainty inherent to ML training
• What is a failure then in an MLS?
• Expected robustness of MLS to ML errors
• Domain-specific definition of failure at system level
• MLS failures result from both mispredictions and the effectiveness of countermeasures, e.g., safety monitors
33. Example: Key-points Detection
• DNNs used for key-points detection in images
• Many applications, e.g., face recognition
• Testing: Find a test suite that causes the DNN to poorly predict as many key-points as possible within a time budget
• Impact of poor predictions on the MLS? Alternative key-points can be used for the same purpose.
[Figure: face image with ground-truth and predicted key-points]
34. Oracles (1)
• It may be difficult to manually determine the correct outputs of a model for a (large) set of inputs
• Effort-intensive; third-party data labelling companies
• Comparing multiple DNN implementations (practical? Effective?)
• Semantic-preserving mutations, metamorphic transformations. May require domain expertise.
35. Oracles (2)
• Domain-specific requirements, e.g., system safety violations
• Simulators can help automate the oracle, if they have sufficient fidelity. Common in many industrial domains.
• Mispredictions may be unavoidable, and accepted
36. Example: Key-points Detection
[Figure: test generation pipeline — an input generator feeds input vectors to a simulator, which produces test images and actual key-point positions; the DNN predicts key-point positions; a fitness calculator compares actual and predicted positions into a fitness score (error value), and the most critical test inputs are retained]
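The fitness score in such a pipeline is essentially a distance between actual and predicted key-point positions; a minimal sketch, with the normalization constant chosen arbitrarily for illustration (real formulations differ in the normalizer they use):

```python
import numpy as np

def normalized_error(actual, predicted, diag):
    """Euclidean distance per key-point, normalized by an image-size
    constant `diag` (an assumption here; definitions vary)."""
    d = np.linalg.norm(actual - predicted, axis=1)
    return d / diag

# Toy positions for two key-points, in pixels
actual = np.array([[10.0, 10.0], [50.0, 40.0]])
pred = np.array([[13.0, 14.0], [50.0, 40.0]])
ne = normalized_error(actual, pred, diag=100.0)
# First key-point is off by 5 pixels; second is predicted exactly.
```

The search then treats each key-point's error as an objective to maximize, so the generator steers simulator inputs toward images the DNN handles worst.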
37. Offline and Online Testing
• For many MLS, considering single inputs is not adequate. Sequences must be considered as context.
• Offline testing is less expensive but does not account for physical dynamics and cumulative effects of prediction uncertainty over time.
• How do offline and online testing results differ and complement each other?
[Figure: offline vs. online testing setups]
38. Simulation
• A necessity for testing in most domains, e.g., avionics
• Reduces cost and risk
• Level of fidelity
• Completeness and realism (scenario space)
• Level of control through configuration parameters
• Run-time efficiency
• Technology varies widely across domains
40. Offline vs. Online Testing?
• How do offline and online testing results differ and complement each other?
• For the same simulator-generated datasets, we compared the offline and online testing results
Fitash Ul Haq, Shiva Nejati, Donghwan Shin
41. Offline vs. Online Testing?
• With online testing, in a closed-loop context, small prediction errors accumulate, eventually causing a critical lane departure
• The experimental results imply that offline testing cannot properly reveal safety violations in ADAS-DNNs: It is too optimistic
• But offline testing is the main focus of published research
[Figure: online vs. offline testing results]
42. ML and Functional Safety
• Requires assessing risks in a realistic fashion
• Account for conditions and consequences of failures
• Is the uncertainty associated with an ML model acceptable?
• With ML, automated support is required, given the difficulties in interpreting model test results
43. Explaining Misclassifications
• Based on visual heatmaps: use colors to capture the extent to which different features contribute to the misclassification of the input.
• State-of-the-art
• black-box techniques: perturbations of the input image
• white-box techniques: backward propagation of the prediction score
• They require, in our context, unreasonable amounts of manual analysis work to help explain safety violations based on image heatmaps
[Figure: a black sheep misclassified as a cow]
44. Empirical Studies
• Relatively few studies in industrial contexts, despite the widespread use of ML
• Lesser focus on integration and system testing levels; many more studies at the model level
• Limited domains: Focus on digit recognition and image classification at the model level, ADAS and a few other autonomous systems at the system level
• No widespread agreement on how to perform such studies – every paper is different, e.g., Li et al. 2019
46. What are the factors affecting MLS testing solutions?
47. Objectives
• Robustness to noise (e.g., sensors) and attacks?
• Requirements (e.g., safety) violations
• Testing level
48. Constraints
• Importance of environment dynamics
• Availability of a simulator with sufficient fidelity
• Availability of a data bank, e.g., road images
• Access to model internal details, training set
• Domain expertise, e.g., required for metamorphic transformations
49. Test Generation: Strategies
• Input mutation with semantic-preserving transformations, e.g., changing weather or lighting conditions in images, adversarial attacks
• Metamorphic transformations and relations, e.g., rotation of coordinates for drone control
• Meta-heuristic search, for example through the configuration space of a simulator or an image bank
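The drone-control example can be sketched as a metamorphic test: rotating the input coordinates should rotate the predicted heading by the same angle, so no ground-truth label is needed. The predictor below is a trivial stand-in for the actual model under test:

```python
import math

def predicted_heading(x, y):
    """Stand-in for the model under test: heading toward a waypoint.
    (A real DNN controller would replace this function.)"""
    return math.atan2(y, x)

def rotate(x, y, theta):
    """Rotate a 2-D point by angle theta around the origin."""
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Metamorphic relation: rotating the input by theta should rotate the
# predicted heading by theta -- checkable without any labeled data.
theta = math.pi / 6
x, y = 3.0, 1.0
h0 = predicted_heading(x, y)
h1 = predicted_heading(*rotate(x, y, theta))
assert abs((h1 - h0) - theta) < 1e-9
```

Violations of such a relation flag faults in the model even when the correct heading for any single input is unknown, which is the whole point of metamorphic oracles.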
50. Many papers make general claims without clearly positioning themselves in the problem space
51. Example 1
• Robustness of automated driving decisions, e.g., steering angle, to real-world changes in images, e.g., weather, light
• No oracle problem (classification or prediction should not change)
• Solution: Generation with a Generative Adversarial Network
• Challenges: Limited to label-preserving changes to existing images, offline testing
52. Example 1
[Figure: snowy and rainy scenes synthesized by a Generative Adversarial Network (GAN); Zhang et al. 2018]
53. Example 2
• Autonomous driving system with important dynamics in the environment
• Compliance with safety requirements
• Online testing is therefore required
• Availability of a high-fidelity simulator
• No access to model internal information
• Challenges: Computational complexity due to the simulator, large input space
55. Automated Emergency Braking System (AEB)
[Figure: AEB architecture (repeated) — a vision (camera) sensor provides objects' position/speed to a decision-making component, which sends a "brake-request" to the brake controller when braking is needed to avoid collisions]
56. Our Approach
• We use a multi-objective search algorithm (NSGA-II).
• Objective functions:
1. Minimum distance between the pedestrian and the field of view
2. The car speed at the time of collision
3. The probability that the object detected is a pedestrian
• We use decision tree classification models to speed up the search and explain violations.
• Each search iteration calls the simulation to compute the objective functions.
57. Multiple Objectives: Pareto Front
Individual A Pareto-dominates individual B if A is at least as good as B in every objective and better than B in at least one objective.
[Figure: objective space (F1, F2) showing the Pareto front and the region dominated by a point x]
• A multi-objective optimization algorithm (e.g., NSGA-II) must:
• Guide the search towards the global Pareto-optimal front.
• Maintain solution diversity in the Pareto-optimal front.
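The dominance definition above translates directly into code; a minimal sketch extracting the non-dominated front from a set of objective vectors (minimization assumed for both objectives, which is an assumption of this example, not a claim about the tool):

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): at least as good in every
    objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the non-dominated points -- the front NSGA-II converges toward."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Toy objective vectors (F1, F2)
pts = [(1.0, 4.0), (2.0, 3.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(pts)
# (3.0, 3.0) is dominated by (2.0, 3.0); the other three are incomparable.
```

NSGA-II layers a full population into successive such fronts (non-dominated sorting) and uses crowding distance to keep the surviving front diverse.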
60. Search Guided by Classification
Test input generation (NSGA-II), given input data ranges/dependencies + simulator + fitness functions:
1. Evaluate (candidate) test inputs: simulate every candidate test and compute the fitness functions.
2. Build a classification tree.
3. Select/generate tests in the fittest regions.
4. Apply genetic operators.
Output: test cases revealing worst-case system behaviors + a characterization of critical input regions.
61. Generated Decision Trees
[Figure: plots (a)–(c) of RegionSize, GoodnessOfFit, and GoodnessOfFit-crt over tree generations 1–7]
The generated critical regions consistently become smaller, more homogeneous and more precise over successive tree generations of NSGAII-DT.
62. Engineers’ Feedback
• The characterizations (decision trees) of the different critical regions can help with:
(1) Debugging the system model
(2) Identifying possible hardware changes to increase ADAS safety
(3) Providing proper warnings to drivers
63. Meta-heuristic Search
• An important solution element in MLS testing as well, perhaps even more important given the lack of specifications and code for ML components
• Search guided by:
• coverage, e.g., DNN neuron coverage
• (safety) requirements
• Research: Comparisons of white-box and black-box approaches?
• Though in practice this is determined by practical considerations
64. Example: Key-points Detection
• Automatically detecting key-points in an image or a video, e.g., face recognition, drowsiness detection
• Key-point Detection DNNs (KP-DNNs) are widely used to detect key-points in an image
• It is essential to check how accurate KP-DNNs are when applied to various test data
[Figure: face image with ground-truth and predicted key-points]
65. Problem Definition
• In the drowsiness or gaze detection problem, each Key-Point (KP) may be highly important for safety
• Each KP leads to a requirement and test objective
• For our subject DNN, we have 27 requirements
• Goal: cause the DNN to mis-predict as many key-points as possible
• Solution: many-objective search algorithms combined with a simulator
Fitash Ul Haq, Donghwan Shin
66. Overview
[Figure: pipeline — an input generator feeds input vectors to a simulator, which produces test images and actual key-point positions; the DNN predicts key-point positions; a fitness calculator turns actual vs. predicted positions into a fitness score (error value), and the most critical test inputs are retained]
67. Results
• Our approach is effective in generating test suites that cause the DNN to severely mispredict more than 93% of all key-points on average
• Not all mispredictions can be considered failures …
• Some key-points are more severely mispredicted than others; detailed analysis revealed two reasons:
• Under-representation of some key-points (hidden) in the training data
• Large variation in the shape and size of the mouth across different 3D models (more training needed)
68. Interpretation
• Regression trees
• Detailed analysis to find the root causes of high NE values, e.g., shadow on the location of KP26 is the cause of a high NE value
• The average MAE from all the trees is 0.01 (far less than the pre-defined threshold: 0.05) with an average tree size of 25.7. Excellent accuracy, reasonable size.

Representative rules derived from the decision tree for KP26
(M: Model-ID, P: Pitch, R: Roll, Y: Yaw, NE: Normalized Error)

Image Characteristics Condition                      NE
M = 9 ∧ P < 18.41                                    0.04
M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ Y < 17.06           0.26
M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ 17.06 ≤ Y < 19      0.71
M = 9 ∧ P ≥ 18.41 ∧ R < −22.31 ∧ Y ≥ 19              0.36

[Figure: (A) a test image satisfying the first condition, NE = 0.013; (B) a test image satisfying the third condition, NE = 0.89]
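The rules in the table are just nested threshold conditions; encoding them directly shows how such trees characterize critical input regions (thresholds copied from the table; behavior outside the listed rules is not specified and is returned as None):

```python
def kp26_ne_estimate(m, p, r, y):
    """Estimated normalized error for KP26 from the representative rules
    (M: model-ID, P: pitch, R: roll, Y: yaw).
    Returns None when no listed rule applies."""
    if m != 9:
        return None
    if p < 18.41:
        return 0.04
    if r < -22.31:
        if y < 17.06:
            return 0.26
        if y < 19:
            return 0.71               # the most critical region
        return 0.36
    return None

# A head pose inside the third rule's region (17.06 <= Y < 19)
critical = kp26_ne_estimate(9, 20.0, -30.0, 18.0)
```

Reading the tree this way is what makes the result actionable: the high-NE region is a concrete, testable range of pitch, roll, and yaw rather than an opaque set of images.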
70. Safety Engineering for ML Systems
• Understand conditions of critical failures in various settings
• Simulator: In terms of configuration parameters
• Real images: In terms of the presence of concepts
• Required for risk assessment
• Research: Techniques to achieve such understanding
71. Typical DNN Evaluation
• Example with images
[Figure: Step A — DNN training on training set images produces a DNN model; Step B — DNN testing on test set images yields DNN accuracy and the error-inducing test set images]
72. Identification of Unsafe Situations
• Current practice is based on manual root cause analysis: identification of the characteristics of the system inputs that induce the DNN to generate erroneous results
• Manual inspection is error-prone (many images)
• Automated identification of such characteristics is the objective of research on DNN safety analysis approaches
73. DNN Heatmaps
• Generate heatmaps that capture the extent to which the pixels of an image impacted a specific result
• Limitations:
• Heatmaps must be manually inspected to determine the reason for a misclassification
• Underrepresented (but dangerous) failure causes might go unnoticed
• DNN debugging (i.e., improvement) is not automated
[Figure: a heatmap can show that long hair is what caused a female doctor to be classified as a nurse (Selvaraju'16)]
74. Heatmap-based Unsupervised Debugging of DNNs (HUDD)
Rely on hierarchical agglomerative clustering to identify the distinct root causes of DNN errors in the heatmaps of internal DNN layers, and use this information to automatically retrain the DNN.
Hazem Fahmy, Fabrizio Pastore, Mojtaba Bagherzadeh
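The clustering step can be sketched with a naive single-linkage agglomerative procedure; in HUDD the points would be heatmaps of an internal DNN layer for error-inducing images, whereas here they are toy 2-D vectors (this sketch is for intuition only, not HUDD's actual distance or linkage choices):

```python
import numpy as np

def agglomerative(points, k):
    """Naive single-linkage agglomerative clustering down to k clusters:
    repeatedly merge the two clusters with the closest pair of members."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters

# Two well-separated groups of toy "heatmaps"
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
clusters = agglomerative(pts, k=2)
```

Each resulting cluster is then treated as one candidate root cause: an engineer inspects a few members per cluster instead of every misclassified image.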
75. Example Applications
• Classification
• Gaze detection
• Open/closed eyes detection
• Head pose detection
• Regression
• Landmarks detection
[Figure: gaze-direction classes on a circle (angles 0° to 337.5° in 22.5° steps): Top Center, Top Left, Top Right, Middle Left, Middle Right, Bottom Left, Bottom Center, Bottom Right]
76. HUDD
[Figure: HUDD workflow]
Step 1. Heatmap-based clustering of the error-inducing test set images (+ training set images) into root cause clusters (C1, C2, C3) — automated.
Step 2. Inspection of a subset of cluster elements — manual.
Step 3. Generate new images (simulator execution or collection of field data), giving an improvement set of new, unlabeled images — automated.
Step 4. Identify unsafe images: improvement set images belonging to the root cause clusters (the unsafe set) — automated.
Step 5. Label the images of the unsafe set — manual.
Step 6. Bootstrap resampling (balanced labeled unsafe set) and DNN retraining with the training set, producing an improved DNN model — automated.
77. Heatmap Generation
• For both the input and the internal layers
• Classification
• Regression (worst landmark propagation)
81. MLS Robustness
• Inherent uncertainty in ML models
• Research: Testing robustness and mitigation mechanisms in MLS for misclassifications or mispredictions
• Goal: We want to learn, as accurately as possible, the subspace in the space defined by I’ and O’ that leads to system safety violations
• Applications: This is expected to help guide and focus the testing of B and implement safety monitors for it.
82. Surrogate Models
• Online testing, coupled with a simulator, is highly important in many domains, such as autonomous driving systems.
• E.g., more likely to find safety violations
• But online testing is computationally expensive
• Surrogate model: A model that mimics the simulator, to a certain extent, while being much less computationally expensive
• Research: Combine search with surrogate modeling to decrease the computational cost of testing
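One common way to realize this, sketched below, is to train an ensemble of cheap models on past simulation results and use ensemble disagreement as the uncertainty that decides which candidates deserve a real simulation run. The "simulator", sample sizes, and polynomial surrogate are all illustrative assumptions, not the approach of any specific paper:

```python
import numpy as np

def expensive_simulation(x):
    """Stand-in for a costly simulator run (a real one takes minutes)."""
    return np.sin(3 * x) + 0.1 * x

rng = np.random.default_rng(2)
X = rng.uniform(0, 3, size=20)        # configurations already simulated
y = expensive_simulation(X)

# Surrogate: an ensemble of polynomial fits on bootstrap resamples;
# the spread across the ensemble is a crude uncertainty estimate.
models = []
for _ in range(10):
    idx = rng.integers(0, len(X), len(X))
    models.append(np.polyfit(X[idx], y[idx], deg=4))

candidates = np.linspace(0, 3, 50)    # cheap to score via the surrogate
preds = np.array([np.polyval(m, candidates) for m in models])
uncertainty = preds.std(axis=0)

# Spend real simulation budget only on the most uncertain candidates.
most_uncertain = candidates[np.argsort(uncertainty)[-5:]]
```

The search loop then alternates: score many candidates cheaply on the surrogate, simulate only the most critical or most uncertain ones, and feed those results back to refine the surrogate.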
83. Test Generation with Surrogate Models
Automotive example, with road and driving simulation.
[Figure: test generation (search) loop — after initialisation, a many-objective search algorithm works against the surrogate model; the most uncertain test cases are executed on the real simulation, whose results populate a database used to refine the surrogate; the most critical test cases form a minimal test suite]
84. Oracles
• Automation is key
• Learn metamorphic relations from user interactions, e.g., with active learning?
• Methodologies for stochastic oracles
• Comparisons of different types of oracles, e.g., metamorphic relations, simulator output, in different situations
85. Empirical Studies
• Methodological issues, e.g., mutations
• Non-determinism in training
• Generalizability (benchmarks etc.)
• Computational costs and scalability of solutions should be assessed, not just fault detection effectiveness
• Evaluation of models in an MLS context
87. Testing Community
• It contributes by adapting techniques from classical software testing
• SBST
• Adequacy criteria
• Metamorphic testing
• Mutation analysis
• Empirical methodology for software testing
88. Re-Focus Research (1)
• But, as usual, research is taking the path of least resistance but we
need to shift the focus to increase impact
• More focus on integration and system testing
• Not only model accuracy, but model-induced risks within a system
• Safety engineering in MLS
• More focus on black-box approaches
• Comparisons between white-box and black-box approaches
89. Re-Focus Research (2)
• More industrial case studies, especially outside the
automotive domain
• Beyond the perception layer, the control aspects need to be
considered as well
• Online testing for autonomous systems, with hardware-in-the-loop
• Scalability issues, e.g., due to simulations, large networks
• Beyond stateless DNNs: Reinforcement learning …
90. Testing ML-enabled Systems:
A Personal Perspective
Lionel Briand
http://www.lbriand.info
ICST 2021
92. Selected References
• Briand et al. "Testing the untestable: model testing of complex software-intensive systems." In Proceedings of the 38th
international conference on software engineering companion, pp. 789-792. 2016.
• Ul Haq et al. "Comparing offline and online testing of deep neural networks: An autonomous car case study." In 2020 IEEE 13th
International Conference on Software Testing, Validation and Verification (ICST), pp. 85-95. IEEE, 2020.
• Ul Haq et al. "Can Offline Testing of Deep Neural Networks Replace Their Online Testing?." arXiv preprint arXiv:2101.11118 (2021).
• Ul Haq et al. "Automatic Test Suite Generation for Key-points Detection DNNs Using Many-Objective Search." ACM International
Symposium on Software Testing and Analysis (ISSTA 2021), preprint arXiv:2012.06511 (2020).
• Fahmy et al. "Supporting DNN Safety Analysis and Retraining through Heatmap-based Unsupervised Learning." IEEE Transactions
on Reliability, Special section on Quality Assurance of Machine Learning Systems, preprint arXiv:2002.00863 (2020).
• Ben Abdessalem et al. "Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms", ICSE 2018
• Ben Abdessalem et al. "Testing Autonomous Cars for Feature Interaction Failures using Many-Objective Search", ASE 2018
93. Selected References
• Goodfellow et al. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
• Zhang et al. "DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems." In 33rd
IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018.
• Tian et al. "DeepTest: Automated testing of deep-neural-network-driven autonomous cars." In Proceedings of the 40th
international conference on software engineering, 2018.
• Li et al. “Structural Coverage Criteria for Neural Networks Could Be Misleading”, IEEE/ACM 41st International Conference on
Software Engineering: New Ideas and Emerging Results (NIER)
• Kim et al. "Guiding deep learning system testing using surprise adequacy." In IEEE/ACM 41st International Conference on Software
Engineering (ICSE), 2019.
• Ma et al. "DeepMutation: Mutation testing of deep learning systems." In 2018 IEEE 29th International Symposium on Software
Reliability Engineering (ISSRE), 2018.
• Zhang et al. "Machine learning testing: Survey, landscapes and horizons." IEEE Transactions on Software Engineering (2020).
• Riccio et al. "Testing machine learning based systems: a systematic mapping." Empirical Software Engineering 25, no. 6 (2020)
• Gerasimou et al., “Importance-Driven Deep Learning System Testing”, IEEE/ACM 42nd International Conference on Software
Engineering, 2020
95. Testing in ISO 26262
• Several recommendations for testing at the unit and system levels
• e.g., Different structural coverage metrics, black-box testing
• However, such testing practices are not adequate for MLS
• The input space of ADAS is much larger than that of traditional automotive systems.
• No specifications or code for DNN components.
• MLS may fail without the presence of a systematic fault, e.g., inherent
limitations, incomplete training.
• Imperfect environment simulators.
• Traditional testing notions (e.g., coverage) are not clear for DNN components.
96. SOTIF
• ISO/PAS 21448:2019 standard: Safety of the intended functionality (SOTIF).
• Autonomy: Huge increase in functionalities relying on advanced sensing,
algorithms (ML), and actuation.
• SOTIF accounts for limitations and risks related to nominal performance of
sensors and software:
• The inability of the function to correctly comprehend the situation and operate
safely; this also includes functions that use machine learning algorithms;
• Insufficient robustness of the function with respect to sensor input variations
or diverse environmental conditions.