Improving the accuracy and reliability of data analysis code - Johan Carlin
1) The document discusses improving the accuracy and reliability of data analysis code through testing and version control. It emphasizes that reliable code is well-documented, generalizable beyond specific datasets, and includes tests to verify functionality.
2) Common approaches to testing include null simulations to calculate error rates and parameter recovery tests to confirm models can learn known weights. Version control through Git or SVN provides a record of code states over time.
3) The document argues that testing makes sense for scientific computing given demands on accuracy and risks of errors influencing research. Tests can target hypotheses, analysis methods, and experiment scripts.
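The two kinds of test mentioned above, null simulations and parameter recovery, can be sketched in a few lines of Python. The toy least-squares model, seeds, and tolerances below are illustrative assumptions, not code from the document:

```python
import random

def fit_slope(xs, ys):
    # Ordinary least-squares estimate of the slope of y on x.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def test_parameter_recovery():
    # Simulate data with a known slope and check the model recovers it.
    rng = random.Random(0)
    true_slope = 2.5
    xs = [rng.uniform(-1, 1) for _ in range(1000)]
    ys = [true_slope * x + rng.gauss(0, 0.1) for x in xs]
    assert abs(fit_slope(xs, ys) - true_slope) < 0.05

def test_null_simulation():
    # With no true effect, slope estimates should hover around zero,
    # so a pipeline that "finds" an effect here has a bug.
    rng = random.Random(1)
    estimates = []
    for _ in range(200):
        xs = [rng.uniform(-1, 1) for _ in range(100)]
        ys = [rng.gauss(0, 1) for _ in xs]
        estimates.append(fit_slope(xs, ys))
    assert abs(sum(estimates) / len(estimates)) < 0.1
```

The same pattern applies to any estimator: feed it data where the answer is known (or known to be absent) and assert the output within a tolerance.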
Predicting Test Results without Execution (FSE 2024) - Andre Hora
As software systems grow, test suites may become complex, making it challenging to run the tests frequently and locally. Recently, Large Language Models (LLMs) have been adopted in multiple software engineering tasks. They have demonstrated great results in code generation; however, it is not yet clear whether these models understand code execution. In particular, it is unclear whether LLMs can be used to predict test results and, potentially, overcome the issues of running real-world tests. To shed some light on this problem, in this paper, we explore the capability of LLMs to predict test results without execution. We evaluate the performance of the state-of-the-art GPT-4 in predicting the execution of 200 test cases from the Python Standard Library, of which 100 pass and 100 fail. Overall, we find that GPT-4 has a precision of 88.8%, a recall of 71%, and an accuracy of 81% in test result prediction. However, the results vary with test complexity: GPT-4 presented better precision and recall when predicting simpler tests (93.2% and 82%) than complex ones (83.3% and 60%). We also find differences among the analyzed test suites, with precision ranging from 77.8% to 94.7% and recall between 60% and 90%. Our findings suggest that GPT-4 still needs significant progress in predicting test results.
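Taking a failing test as the positive class, the reported precision, recall, and accuracy are mutually consistent with one particular confusion matrix (TP=71, FP=9, FN=29, TN=91 over the 100 failing and 100 passing tests). This reconstruction is an illustration of how the three metrics relate, not data from the paper:

```python
# Confusion counts reconstructed to match the reported metrics
# (failing test = positive class); the exact counts are an assumption.
TP, FP, FN, TN = 71, 9, 29, 91

precision = TP / (TP + FP)                   # 71/80  = 0.8875 (88.8%)
recall = TP / (TP + FN)                      # 71/100 = 0.71   (71%)
accuracy = (TP + TN) / (TP + FP + FN + TN)   # 162/200 = 0.81  (81%)

print(precision, recall, accuracy)
```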
Large Language Models for Test Case Evolution and Repair - Lionel Briand
Large language models show promise for test case repair tasks. LLMs can be applied to tasks like test case generation, classification of flaky tests, and test case evolution and repair. The paper presents TaRGet, a framework that uses LLMs for automated test case repair. TaRGet takes as input a broken test case and code changes to the system under test, and outputs a repaired test case. Evaluation shows TaRGet achieves over 80% plausible repair accuracy. The paper analyzes repair characteristics, evaluates different LLM and input/output formats, and examines the impact of fine-tuning data size on performance.
Competition-Level Code Generation with AlphaCode - San Kim
AlphaCode is a system for competitive code generation that achieves an average ranking in the top 54.3% of competitions with more than 5,000 participants. It uses a large transformer model pre-trained on GitHub code and fine-tuned on a competitive programming dataset. During fine-tuning, it employs techniques like tempering and GOLD to focus on precision over recall. At test time, it generates a large number of samples, filters them using the example tests, and clusters similar programs to select submissions. Extensive evaluations on CodeContests and APPS benchmarks show AlphaCode's performance scales log-linearly with more samples and compute.
The document discusses code quality control for Joomla projects using automated tools for testing, analysis, and integration. It covers unit testing with PHPUnit, static analysis with PHP Code Sniffer and PHP Mess Detector, code coverage with PHPUnit, profiling with Xdebug, documentation with PHPDocumentor, and continuous integration with Phing and CruiseControl. Automating these processes improves code quality by detecting issues early.
Software testing: an introduction - 2017 - Xavier Devroey
Software testing involves dynamically verifying that a program behaves as expected on a finite set of test cases. This is done because exhaustively testing every possible case is not feasible. Unit testing involves testing individual program units such as classes through automated tests that make assertions about the output. JUnit is a unit testing framework for Java that uses annotations to identify test methods and make assertions about the results.
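The same pattern translates directly to other languages. A minimal sketch using Python's unittest module, where test_* methods play the role of JUnit's @Test-annotated methods and assertions check the output (the word_count function is a made-up example):

```python
import unittest

def word_count(text):
    """Count whitespace-separated words (a made-up unit under test)."""
    return len(text.split())

class WordCountTest(unittest.TestCase):
    # Each test_* method is discovered automatically, much as JUnit
    # discovers @Test-annotated methods; assertions verify the output.
    def test_simple_sentence(self):
        self.assertEqual(word_count("to be or not to be"), 6)

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

if __name__ == "__main__":
    unittest.main(exit=False, verbosity=2)
```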
Property-based testing an open-source compiler, pflua (FOSDEM 2015) - Igalia
By Katerina Barone-Adesi.
Discover property-based testing, and see how it works on a real project, the pflua compiler.
How do you find a lot of non-obvious bugs in an afternoon? Write a property that should always be true (like "this code should have the same result before and after it's optimized"), generate random valid expressions, and study the counter-examples!
Property-based testing is a powerful technique for finding bugs quickly. It can partly replace unit tests, leading to a more flexible test suite that generates more cases and finds more bugs in less time.
It's really quick and easy to get started with property-based testing. You can use existing tools like QuickCheck, or write your own: Andy Wingo and I wrote pflua-quickcheck and found a half-dozen bugs with it in one afternoon, using pure Lua and no external libraries.
In this talk, I will introduce property-based testing, demonstrate a tool for using it in Lua, show how to write your own property-based testing tool from scratch, and explain how simple properties found bugs in pflua.
(c) 2015 FOSDEM VZW
CC BY 2.0 BE
https://archive.fosdem.org/2015/
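The "write your own" route the talk describes is genuinely small. A pure-Python sketch in the spirit of pflua-quickcheck: generate random valid expressions and check the property that a toy constant-folding optimizer never changes an expression's value (the expression language and optimizer are invented for illustration):

```python
import random

# A tiny expression language: an int leaf, or a tuple (op, left, right).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def gen_expr(rng, depth=3):
    # Generate a random valid expression.
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-5, 5)
    op = rng.choice(sorted(OPS))
    return (op, gen_expr(rng, depth - 1), gen_expr(rng, depth - 1))

def evaluate(e):
    if isinstance(e, int):
        return e
    op, left, right = e
    return OPS[op](evaluate(left), evaluate(right))

def optimize(e):
    # Constant-folding rules: x*0 -> 0, x+0 -> x, x*1 -> x.
    if isinstance(e, int):
        return e
    op, left, right = e
    left, right = optimize(left), optimize(right)
    if op == "*" and (left == 0 or right == 0):
        return 0
    if op == "+" and left == 0:
        return right
    if op == "+" and right == 0:
        return left
    if op == "*" and left == 1:
        return right
    if op == "*" and right == 1:
        return left
    return (op, left, right)

def check_property(trials=1000):
    # Property: optimizing must never change an expression's value.
    rng = random.Random(42)
    for _ in range(trials):
        e = gen_expr(rng)
        assert evaluate(optimize(e)) == evaluate(e), e

check_property()
```

When a rule is wrong, the failing assertion prints the random counter-example expression, which is exactly how such properties surface non-obvious bugs fast.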
Patni has been supporting LSI in various areas including architecture design, tool development, firmware testing and enhancements, and setting up offshore development environments. Key projects included SATA command testing, MPI-2 interface testing, ATA passthrough testing, StoreLib IR testing, and Integrated RAID 1E and SimDiscovery tool testing. Patni delivered the projects on time with 100% test coverage, found and resolved multiple defects, and followed quality processes.
This document discusses test escape analysis (TEA), which analyzes defects that escaped testing to help improve testing efficiency. TEA examines past defects to identify patterns and trends, such as which test types or techniques could have caught which defects earlier. The benefits of early defect detection through improved testing include reduced costs, a protected reputation, and a lighter engineer workload. TEA data from defect histories can show where to apply testing resources and procedure changes for maximum return.
Leveraging Existing Tests in Automated Test Generation for Web Applications - SALT Lab @ UBC
This document describes a technique called Testilizer that combines manual and automated testing of web applications. Testilizer leverages existing manual test cases to generate new test cases by exploring the extended state flow graph in a way that stays close to the original manual test paths. It also regenerates assertions for the new test cases by reusing, exactly matching, or generating similar assertions to those in the original test cases. An evaluation of Testilizer showed it was able to generate test cases that improved code coverage while making significant use of input data, sequences, and assertions from the original manual test cases.
p4pktgen: Automated Test Case Generation for P4 Programs - AJAY KHARAT
Traditional network devices offer a fixed set of capabilities. Recent years have seen the rise of programmable network devices, which offer far greater flexibility and capability than traditional devices. That flexibility introduces new bugs in:
- Hardware
- Toolchains
- Programs
These classes of bugs were previously avoided by traditional network devices precisely because of their fixed set of capabilities. Test cases check whether a program behaves as intended on the device.
ACSAC2016: Code Obfuscation Against Symbolic Execution Attacks - Sebastian Banescu
Slides from the 2016 Annual Computer Security Applications Conference (ACSAC), about the paper entitled "Code Obfuscation Against Symbolic Execution Attacks"
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ... - PyData
To productionize data science work (and have it taken seriously by software engineers, CTOs, clients, or the open source community), you need to write tests! Except… how can you test code that performs nondeterministic tasks like natural language parsing and modeling? This talk presents an approach to testing probabilistic functions in code, illustrated with concrete examples written for Pytest.
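Two common tactics for the problem the abstract raises, sketched as Pytest-style plain-assert tests (the bootstrap estimator, seeds, and tolerances are illustrative assumptions, not the talk's examples): pin the random seed for exact reproducibility, or assert a statistical property within a tolerance rather than an exact value:

```python
import random
import statistics

def noisy_mean_estimate(data, n_samples, rng):
    # Bootstrap-style estimate: the mean of a random resample (illustrative).
    return statistics.fmean(rng.choices(data, k=n_samples))

def test_deterministic_given_seed():
    # Tactic 1: fix the seed so the "random" result is exactly reproducible.
    data = [1.0, 2.0, 3.0, 4.0]
    a = noisy_mean_estimate(data, 100, random.Random(0))
    b = noisy_mean_estimate(data, 100, random.Random(0))
    assert a == b

def test_statistical_property():
    # Tactic 2: assert a distributional property within a tolerance,
    # since the exact value varies run to run.
    data = list(range(1, 101))  # true mean is 50.5
    rng = random.Random(1)
    estimates = [noisy_mean_estimate(data, 1000, rng) for _ in range(50)]
    assert abs(statistics.fmean(estimates) - 50.5) < 1.0
```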
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r... - Curiosity Software Ireland
This webinar was co-hosted by Xray and Curiosity Software on 18th May 2021. Watch the on demand recording here: https://opentestingplatform.curiositysoftware.ie/xray-in-sprint-testing-webinar
In-sprint testing must tackle three pressing problems:
1. You must know exactly what needs testing before each release. There is no time to test everything.
2. You need up-to-date and aligned test assets, including test cases, data, scripts and CI/CD artefacts.
3. Test teams must know what needs testing, when, and have on demand access to environments, tests and data.
These problems are near-impossible to crack at organisations who struggle with application complexity, rapid system change, and overly-manual testing processes. Challenges include:
1. Test creation time. Manually creating test cases, data and scripts is slow and unsystematic, resulting in low coverage tests.
2. Slow test maintenance. Changes break tests, with little time in sprints to check test cases, scripts, and data.
3. Knowing when testing is “done”. There is little measurability or peace of mind when systems “go live”.
This webinar will set out how maintaining a “digital twin” of the system under test prioritises testing time AND maintains rigorous tests in-sprint. You will see how:
1. Intuitive flowcharts generate optimised test cases, scripts, and data.
2. Feeding changes into the models maintains up-to-date tests.
3. Pushing the tests to agile test management tooling then makes sure that teams know which tests to run, when, with full traceability and a measurable definition of ‘done’.
James Walker, Curiosity’s Director of Technology, and Sérgio Freire, Head of Product Evangelism for Xray, will set out this cutting-edge approach to in-sprint testing. Günther-Matthias Bär, Test Automation Engineer at Sogeti, will then draw on implementation experience to discuss the value of the proposed approach.
Deliver Faster with BDD/TDD - Designing Automated Tests That Don't Suck - Kevin Brockhoff
Kevin Brockhoff presented on accelerating software delivery through test-driven development (TDD) and behavior-driven development (BDD). TDD and BDD help identify incorrect behavior early, reduce rework, increase code confidence through automated tests, and improve code flexibility and extensibility. Brockhoff discussed key software delivery metrics like deployment frequency and change failure rate. He also covered testing best practices like the testing pyramid, unit testing frameworks, test source layout, and refactoring for testability.
Leveling Up With Unit Testing - LonghornPHP 2022 - Mark Niebergall
Writing unit testing on a project can seem like a daunting task, and earning team and leadership buy-in can be challenging. Level up your skillset as we cover PHPUnit and Prophecy setup with composer, writing meaningful tests, restructuring existing classes with dependency injection to allow for unit testing, using mock objects, and releasing code confidently with test coverage. We'll also discuss overcoming common biases, unit testing challenges, and shortcomings of unit testing.
[FullStack NYC 2019] Effective Unit Tests for JavaScript - Hazem Saleh
The document discusses code coverage and mutation testing tools for JavaScript. It introduces karma-coverage as a code coverage plugin that can be used with Karma test runner. Traditional code coverage only measures executed code and does not guarantee tests will fail on logic changes. Mutation testing seeds code with faults to evaluate test strength by whether faults are killed by tests. The document demonstrates Stryker, a mutation testing tool for JavaScript that works with popular frameworks and provides test reports. It provides sample URLs and recommends using Stryker with Angular CLI 6.1+.
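The mutation-testing idea is easy to demonstrate without any framework. A minimal Python sketch (not Stryker, which targets JavaScript): seed a fault by flipping a comparison operator in the AST, then check whether the test suite "kills" the mutant by failing:

```python
import ast

SRC = """
def absolute(x):
    return x if x >= 0 else -x
"""

class FlipGtE(ast.NodeTransformer):
    # Seed a fault: mutate every `>=` comparison into `<`.
    def visit_Compare(self, node):
        node.ops = [ast.Lt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node

def run_tests(fn):
    # A tiny test suite; returns True when every assertion passes.
    try:
        assert fn(3) == 3
        assert fn(-4) == 4
        assert fn(0) == 0
        return True
    except AssertionError:
        return False

# The original code passes the suite.
original = {}
exec(compile(ast.parse(SRC), "<orig>", "exec"), original)
assert run_tests(original["absolute"])

# A good suite makes at least one test fail on the mutant ("kills" it).
mutant_tree = ast.fix_missing_locations(FlipGtE().visit(ast.parse(SRC)))
mutant = {}
exec(compile(mutant_tree, "<mutant>", "exec"), mutant)
killed = not run_tests(mutant["absolute"])
print("mutant killed:", killed)  # -> mutant killed: True
```

A surviving mutant would reveal exactly the weakness the summary describes: code that is covered by tests yet whose logic changes go undetected.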
This document summarizes a thesis on automating test routine creation through natural language processing. The author proposes using word embeddings and recommender systems to automatically generate test cases from requirements documents and link them together. The methodology involves representing text as word vectors, calculating similarity between requirements and test blocks, and applying association rule mining on test block sequences. An experiment on a space operations dataset showed the approach improved productivity in test creation and requirements tracing over manual methods. Future work could explore using deep learning models and collecting additional evaluation metrics from users.
This document discusses using mock objects to make unit tests more effective and efficient. It presents a technique called Automock that can automatically generate mock code for tests based on static and dynamic analysis of the test class and its collaborators. This reduces the effort required to develop and maintain mock-based tests. An evaluation of Automock found that it reduced tester effort on mock code development by 96% and reduced mock code development time by 96% compared to manual mock code development. The automatically generated mock code was also found to have equivalent semantics to manually written mock code based on mutation testing and qualitative analysis.
This document summarizes a collaborative project called STAMP that aims to automatically generate and amplify test assets like unit test cases and configuration files to improve software quality. The project focuses on developing open source tools for test amplification in DevOps workflows. The goal is to increase test coverage, find more bugs, and improve test quality through automatic generation and variation of test assets. The tools will integrate with development toolchains and microservices. The project involves both academic and industry partners applying the techniques to various software systems.
The SonarQube Platform is made of four components: Server, Database, Plugins, and Scanner.
One or more SonarQube Scanners run on your Build / Continuous Integration servers to analyze projects.
Codeception: introduction to php testing (v2 - Aberdeen php) - Engineor
This document introduces Codeception, an open source PHP testing framework. It discusses how Codeception provides tools for unit, integration, functional, acceptance and other types of testing. Codeception uses PHPUnit and other tools under the hood. The document demonstrates how to install, configure and run basic acceptance tests with Codeception. It also discusses how Codeception supports testing modern JavaScript-heavy frontends using Selenium or PhantomJS.
Precise and Complete Requirements? An Elusive Goal - Lionel Briand
The document discusses the challenges of achieving precise and complete requirements upfront in software development projects. It notes that while academics assume detailed requirements are needed, practitioners find this difficult to achieve in reality due to limited resources, uncertainty, and changing needs. The document provides perspectives from practice that emphasize starting with prototypes and visions rather than detailed specifications. It also summarizes research finding diverse requirements practices across different domains and organizations. The document concludes that while precise requirements may be desirable, they are often elusive goals, and the focus should be on achieving compliance and delivering working software.
Similar to FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair (20)
Metamorphic Testing for Web System Security - Lionel Briand
This document summarizes a presentation on metamorphic testing for web system security given by Nazanin Bayati on September 13, 2023. Metamorphic testing uses relations between the outputs of multiple test executions to test systems when specifying expected outputs is difficult. It was applied to web systems by generating follow-up inputs based on transformations of valid interactions and checking that output relations held. The approach detected over 60% of vulnerabilities in tested systems and addressed more vulnerability types than static and dynamic analysis tools. It provides an effective and automated way to test for security issues in web systems.
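The core idea, checking a relation between the outputs of an original and a transformed input rather than specifying expected outputs, fits in a few lines. The toy authorization check and percent-encoding relation below are invented for illustration and are not the paper's actual relations:

```python
from urllib.parse import unquote

def is_allowed(path, role):
    # Hypothetical authorization check for a toy web app: /admin is restricted.
    canonical = unquote(path)  # decode %-escapes before checking
    return not canonical.startswith("/admin") or role == "admin"

def encoding_relation_holds(path, role):
    # Metamorphic relation: percent-encoding characters in the URL must not
    # change the authorization outcome (a classic bypass vector).
    follow_up = path.replace("a", "%61")  # transformed follow-up input
    return is_allowed(path, role) == is_allowed(follow_up, role)

# No expected output is specified anywhere; only the relation is checked.
assert encoding_relation_holds("/admin/users", "guest")
assert encoding_relation_holds("/public/faq", "guest")
```

An implementation that forgot to decode before checking would let "/%61dmin/users" through for a guest while denying "/admin/users", violating the relation and exposing the vulnerability.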
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-... - Lionel Briand
This document proposes a method called SEDE (Simulator-based Explanations for DNN Errors) to automatically generate explanations for errors in DNN-based safety-critical systems by constraining simulator parameters. SEDE first identifies clusters of error-inducing images, then uses an evolutionary algorithm to generate simulator images within each cluster, including failing, passing, and representative images. SEDE extracts rules characterizing the unsafe parameter space and uses the generated images to retrain DNNs, improving accuracy compared to alternative methods. The paper evaluates SEDE on head pose and face landmark detection DNNs in terms of generating diverse cluster images, delimiting unsafe spaces, and enhancing DNN performance.
This document summarizes a research paper on using grey-box fuzzing (MOTIF) for mutation testing of C/C++ code in cyber-physical systems (CPS). It introduces mutation testing and grey-box fuzzing, and proposes MOTIF which generates a fuzzing driver to test functions with live mutants. An empirical evaluation compares MOTIF to symbolic execution-based mutation testing on three subject programs. MOTIF killed more mutants within 10,000 seconds and was able to test programs that symbolic execution could not handle due to limitations like floating-point values. Seed inputs alone killed few mutants, showing the importance of fuzzing. MOTIF is an effective approach for mutation testing of CPS software.
Data-driven Mutation Analysis for Cyber-Physical Systems - Lionel Briand
Data-driven mutation analysis is proposed to assess if test suites for cyber-physical systems properly exercise component interoperability. Fault models are developed for different data types and dependencies, and are used to automatically generate mutants by injecting faults. Empirical results on industrial systems demonstrate the feasibility and effectiveness of the approach in identifying test suite shortcomings and poor oracles.
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems - Lionel Briand
This document proposes MORLOT (Many-Objective Reinforcement Learning for Online Testing) to address challenges in online testing of DNN-enabled systems. MORLOT leverages many-objective search and reinforcement learning to choose test actions. It was evaluated on the Transfuser autonomous driving system in the CARLA simulator using 6 safety requirements. MORLOT was significantly more effective and efficient at finding safety violations than random search or other many-objective approaches, achieving a higher average test effectiveness for any given test budget.
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu... - Lionel Briand
1. The document presents ATM, a new approach for black-box test case minimization that transforms test code into abstract syntax trees and uses tree-based similarity measures and genetic algorithms to minimize test suites.
2. ATM was evaluated on the DEFECTS4J dataset and achieved an average fault detection rate of 0.82, significantly outperforming existing techniques while keeping execution times practical.
3. The best configuration of ATM used a genetic algorithm with a combined similarity measure, achieving a fault detection rate of 0.80 within 1.2 hours on average.
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ... - Lionel Briand
The document is a journal paper that proposes a method for black-box safety analysis and retraining of deep neural networks (DNNs) based on feature extraction and clustering of failure-inducing images. The method uses a pre-trained VGG16 model to extract features from failure images, clusters the features using DBSCAN, selects clusters that likely caused failures, and retrains the DNN to improve safety based on images in problematic clusters. An empirical evaluation on various DNNs for tasks like gaze detection showed the method effectively determined failure causes through clustering and improved models with fewer images than other approaches.
PRINS: Scalable Model Inference for Component-based System Logs - Lionel Briand
PRINS is a technique for scalable model inference of component-based system logs. It divides the problem into inferring individual component models and then stitching them together. The paper evaluates PRINS on several systems and compares its execution time and accuracy to MINT, a state-of-the-art model inference tool. Results show that PRINS is significantly faster than MINT, especially on larger logs, with comparable accuracy. However, stitching component models can result in larger overall system models. The paper contributes an empirical evaluation of the PRINS technique and makes its implementation publicly available.
Revisiting the Notion of Diversity in Software TestingLionel Briand
The document discusses the concept of diversity in software testing. It provides examples of how diversity has been applied in various testing applications, including test case prioritization and minimization, mutation analysis, and explaining errors in deep neural networks. The key aspects of diversity discussed are the representation of test cases, measures of distance or similarity between cases, and techniques for maximizing diversity. The document emphasizes that the best approach depends on factors like information access, execution costs, and the specific application context.
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Lionel Briand
This document discusses search-based approaches for testing artificial intelligence systems. It covers testing at different levels, from model-level testing of individual machine learning components to system-level testing of AI-enabled systems. At the model level, search-based techniques are used to generate test inputs that target weaknesses in deep learning models. At the system level, simulations and reinforcement learning are used to test AI components integrated into complex systems. The document outlines many open challenges in AI testing and argues that search-based approaches are well-suited to address challenges due to the complex, non-linear behaviors of AI systems.
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyLionel Briand
Autonomous systems present safety challenges due to their complexity and use of machine learning. Two key approaches are needed to address these challenges: (1) design-time assurance cases to validate safety requirements and (2) run-time monitoring architectures to detect unsafe behavior. Automated testing techniques leveraging metaheuristics and machine learning can help provide evidence for assurance cases and learn conditions to guide run-time monitoring. However, more industrial experience is still needed to properly validate these approaches at scale for autonomous systems.
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Lionel Briand
This document discusses the split identities of software engineering researchers between being mathematicians, social scientists, or engineers. It notes there are three main communities - formal methods and guarantees, human and social studies, and engineering automated solutions - that have different backgrounds, languages, and research methods. While diversity is good, the communities need to be better connected to work together to solve problems. The document calls for more demand-driven, collaborative research with industry to have a greater impact and produce practical solutions.
Reinforcement Learning for Test Case PrioritizationLionel Briand
1) The document discusses using reinforcement learning for test case prioritization in continuous integration environments. It compares different ranking models (listwise, pairwise, pointwise) and reinforcement learning algorithms.
2) Pairwise and pointwise ranking models generally perform better than listwise, and pairwise training times are better than pointwise. The best configuration is pairwise ranking with the ACER algorithm.
3) When compared to traditional machine learning ranking models, the best reinforcement learning configuration provides significantly better ranking accuracy than the state-of-the-art MART model.
4) However, relying solely on test execution history may not provide sufficient features for an accurate prioritization policy, regardless of the approach; enriched datasets with more features may be needed.
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Lionel Briand
The document summarizes a paper that presents Mutation Analysis for Space Software (MASS), a scalable and automated pipeline for mutation testing of cyber-physical systems software in the space domain. The pipeline includes steps to create mutants, sample and prioritize mutants, discard equivalent mutants, and compute mutation scores. An empirical evaluation on space software case studies found that MASS provides accurate mutation scores with fewer sampled mutants compared to other sampling approaches. It also enables significant time savings over non-optimized mutation analysis through test case prioritization and reduction techniques. MASS helps uncover weaknesses in test suites and ensures thorough software testing for safety-critical space systems.
On Systematically Building a Controlled Natural Language for Functional Requi...Lionel Briand
The document presents a qualitative methodology for systematically building a controlled natural language (CNL) for functional requirements. It describes extracting requirements from software requirements specifications, identifying codes within the requirements, labeling and grouping the requirements, creating a grammar by identifying the content in requirements and deriving grammar rules. An evaluation of the developed CNL called Rimay showed it could express 88% of requirements from unseen documents and reached stability after analyzing three documents.
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Lionel Briand
This document proposes SAMOTA, a surrogate-assisted many-objective optimization approach for online testing of DNN-enabled systems. SAMOTA uses global and local surrogate models to replace expensive function evaluations. It clusters local data points and builds individual surrogate models for each cluster, rather than one model for all data. An evaluation on a DNN-enabled autonomous driving system shows SAMOTA achieves better test effectiveness and efficiency than alternative approaches, and clustering local data points leads to more effective local searches than using a single local model. SAMOTA is an effective method for online testing of complex DNN systems.
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Lionel Briand
The document provides guidelines for assessing the accuracy of log message template identification techniques. It discusses issues with existing accuracy metrics and proposes new metrics like Template Accuracy that are not sensitive to message frequency. It also recommends performing oracle template correction as templates extracted without source code are often incorrect. Additionally, it suggests analyzing incorrectly identified templates to understand weaknesses and provide insights to improve techniques. The guidelines aim to help properly evaluate template identification techniques for different use cases.
A Theoretical Framework for Understanding the Relationship between Log Parsin...Lionel Briand
This document proposes a theoretical framework to understand the relationship between log parsing and anomaly detection. It argues that log parsing should be viewed as an information abstraction process that converts unstructured logs into structured logs. The goal of log parsing should be to extract the minimum amount of information necessary to distinguish normal behavior from anomalies. This "minimality" and "distinguishability" can be used to define ideal log parsing results. The framework aims to provide guidance on how log parsing quality impacts anomaly detection accuracy and determine the root causes of any inaccuracies.
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AIdanshalev
If we were building a GenAI stack today, we'd start with one question: Can your retrieval system handle multi-hop logic?
Trick question, because most can't: they treat retrieval as nearest-neighbor search.
Today, we discussed scaling #GraphRAG at AWS DevOps Day, and the takeaway is clear: VectorRAG is naive, lacks domain awareness, and can’t handle full dataset retrieval.
GraphRAG builds a knowledge graph from source documents, allowing for a deeper understanding of the data + higher accuracy.
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)Andre Hora
Software testing plays a crucial role in the contribution process of open-source projects. For example, contributions introducing new features are expected to include tests, and contributions with tests are more likely to be accepted. Although most real-world projects require contributors to write tests, the specific testing practices communicated to contributors remain unclear. In this paper, we present an empirical study to understand better how software testing is approached in contribution guidelines. We analyze the guidelines of 200 Python and JavaScript open-source software projects. We find that 78% of the projects include some form of test documentation for contributors. Test documentation is located in multiple sources, including CONTRIBUTING files (58%), external documentation (24%), and README files (8%). Furthermore, test documentation commonly explains how to run tests (83.5%), but less often provides guidance on how to write tests (37%). It frequently covers unit tests (71%), but rarely addresses integration (20.5%) and end-to-end tests (15.5%). Other key testing aspects are also less frequently discussed: test coverage (25.5%) and mocking (9.5%). We conclude by discussing implications and future research.
Join Ajay Sarpal and Miray Vu to learn about key Marketo Engage enhancements. Discover improved in-app Salesforce CRM connector statistics for easy monitoring of sync health and throughput. Explore new Salesforce CRM Synch Dashboards providing up-to-date insights into weekly activity usage, thresholds, and limits with drill-down capabilities. Learn about proactive notifications for both Salesforce CRM sync and product usage overages. Get an update on improved Salesforce CRM synch scale and reliability coming in Q2 2025.
Key Takeaways:
Improved Salesforce CRM User Experience: Learn how self-service visibility enhances satisfaction.
Utilize Salesforce CRM Synch Dashboards: Explore real-time weekly activity data.
Monitor Performance Against Limits: See threshold limits for each product level.
Get Usage Over-Limit Alerts: Receive notifications for exceeding thresholds.
Learn About Improved Salesforce CRM Scale: Understand upcoming cloud-based incremental sync.
Landscape of Requirements Engineering for/by AI through Literature ReviewHironori Washizaki
Hironori Washizaki, "Landscape of Requirements Engineering for/by AI through Literature Review," RAISE 2025: Workshop on Requirements engineering for AI-powered SoftwarE, 2025.
Why Orangescrum Is a Game Changer for Construction Companies in 2025Orangescrum
Orangescrum revolutionizes construction project management in 2025 with real-time collaboration, resource planning, task tracking, and workflow automation, boosting efficiency, transparency, and on-time project delivery.
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Andre Hora
Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily “abnormal” or rare.
This presentation explores code comprehension challenges in scientific programming based on a survey of 57 research scientists. It reveals that 57.9% of scientists have no formal training in writing readable code. Key findings highlight a "documentation paradox" where documentation is both the most common readability practice and the biggest challenge scientists face. The study identifies critical issues with naming conventions and code organization, noting that 100% of scientists agree readable code is essential for reproducible research. The research concludes with four key recommendations: expanding programming education for scientists, conducting targeted research on scientific code quality, developing specialized tools, and establishing clearer documentation guidelines for scientific software.
Presented at: The 33rd International Conference on Program Comprehension (ICPC '25)
Date of Conference: April 2025
Conference Location: Ottawa, Ontario, Canada
Preprint: https://arxiv.org/abs/2501.10037
Not So Common Memory Leaks in Java WebinarTier1 app
This SlideShare presentation is from our May webinar, “Not So Common Memory Leaks & How to Fix Them?”, where we explored lesser-known memory leak patterns in Java applications. Unlike typical leaks, subtle issues such as thread local misuse, inner class references, uncached collections, and misbehaving frameworks often go undetected and gradually degrade performance. This deck provides in-depth insights into identifying these hidden leaks using advanced heap analysis and profiling techniques, along with real-world case studies and practical solutions. Ideal for developers and performance engineers aiming to deepen their understanding of Java memory management and improve application stability.
Societal challenges of AI: biases, multilingualism and sustainabilityJordi Cabot
Towards a fairer, inclusive and sustainable AI that works for everybody.
Reviewing the state of the art on these challenges and what we're doing at LIST to test current LLMs and help you select the one that works best for you
Discover why Wi-Fi 7 is set to transform wireless networking and how Router Architects is leading the way with next-gen router designs built for speed, reliability, and innovation.
3. nanda-lab.ca
Problem: Flaky Tests
Flaky tests intermittently pass and fail even for the same version of the source code (i.e., they produce non-deterministic test results).
Why Detect and Repair Flaky Tests?
❖ Test failures caused by flaky tests can be hard to reproduce, since re-running is required (computationally expensive)
❖ Flaky tests might hide real bugs in the source code
❖ Tests become unreliable
❖ Software releases might be delayed
❖ Flaky tests are hard to detect and fix manually, so developers often ignore them
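To make the problem concrete, here is a minimal, hypothetical Java example (not taken from the paper's dataset): a test that formats a report by iterating over a HashMap is flaky, because HashMap iteration order is unspecified and can change across JDK versions or when the map is resized.

```java
import java.util.HashMap;
import java.util.Map;

public class FlakyOrderExample {
    // Builds a report string by iterating over the map's entries.
    static String buildReport(Map<String, Integer> scores) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("alice", 3);
        scores.put("bob", 5);
        // An assertion like `report.equals("alice=3;bob=5;")` would be flaky:
        // HashMap specifies no iteration order, so the order may differ
        // across JDK versions or after the map is rehashed.
        System.out.println(buildReport(scores));
    }
}
```

The failure is intermittent precisely because the order happens to match on most runs, which is why such tests are hard to detect and reproduce.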
4. FlakyFix: Black Box Automated Repair of Flaky Tests
Proposed Solution: Flaky Test → Language Model → Fix for the Flaky Test
*Black box: uses the test case code only, with no access to the code under test. This research focuses on flaky tests where part of the flakiness lies in the test code (about 10% of the overall flaky-tests dataset) [1].
1. Sakina Fatima, Taher A. Ghaleb, and Lionel Briand. Flakify: A black-box, language model-based predictor for flaky tests. IEEE Transactions on Software Engineering, 2022.
5. Proposed Approach
→ Definition of flaky-test fixes and labeling of flaky tests accordingly
• A set of heuristics
• First labeled dataset categorizing flaky tests by the type of fix needed
• Open-source script* to automatically label flaky tests based on their fixes
→ Prediction of the flaky-test fix category using pre-trained code language models
• Suggest to developers a type of fix they can implement to repair flaky tests
• Help conventional large language models (e.g., GPT) automatically generate the fix
→ Generation of a fully repaired flaky test using the predicted fix category and LLMs
• Attempt a fully or semi-automated repair of flaky tests
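As an illustration of the heuristic labeling idea, here is a toy Java sketch (hypothetical; the actual open-source script and its rule set are more elaborate, and the "Add Await" category name below is invented for illustration): it assigns a fix category by matching simple textual patterns between a flaky test and its developer fix.

```java
public class FixLabelHeuristic {
    // Toy labeling heuristic: inspect the flaky test and its developer fix
    // and assign a fix category based on simple textual patterns.
    // The real labeling script uses a much richer set of heuristics.
    static String labelFix(String flakyTest, String fixedTest) {
        if (flakyTest.contains("HashMap") && fixedTest.contains("LinkedHashMap")) {
            return "Change Data Structure"; // category shown in the talk
        }
        if (!flakyTest.contains("await") && fixedTest.contains("await")) {
            return "Add Await"; // hypothetical category name
        }
        return "Unclassified";
    }

    public static void main(String[] args) {
        String before = "Map<String, Integer> m = new HashMap<>();";
        String after = "Map<String, Integer> m = new LinkedHashMap<>();";
        System.out.println(labelFix(before, after)); // prints "Change Data Structure"
    }
}
```

Labeling from developer fixes in this way is what makes it possible to build the first dataset categorizing flaky tests by the type of fix needed.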
6. Prediction of Flaky Test Fix Category
The fine-tuned model takes a flaky test as input and predicts its fix category, which points to the cause of flakiness. Example prediction: Change Data Structure, because a HashMap should be replaced with a LinkedHashMap to maintain the order in which elements are stored, regardless of how many times the code is executed.
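A minimal sketch of what a "Change Data Structure" repair looks like (hypothetical code, not from the dataset): switching to LinkedHashMap makes iteration follow insertion order, so an order-dependent assertion becomes deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RepairedOrderExample {
    static String buildReport(Map<String, Integer> scores) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fix category "Change Data Structure": LinkedHashMap guarantees
        // iteration in insertion order on every run, unlike HashMap.
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("alice", 3);
        scores.put("bob", 5);
        System.out.println(buildReport(scores)); // always prints "alice=3;bob=5;"
    }
}
```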
8. Dataset
Used the International Dataset of Flaky Tests (IDoFT):
→ Largest available dataset of flaky tests where the cause of flakiness is in the test code
→ 562 flaky tests in Java, together with their developer-repaired fixes
→ The flaky tests belong to 123 different projects, which helps the generalizability of the prediction models
10. Fix Category Prediction
Fine-tune pre-trained code models, i.e., CodeBERT and UniXcoder, for the task of flaky-test fix classification with two different techniques:
✓ Feed-Forward Neural Network (FNN)
✓ Few-Shot Learning (recommended for smaller datasets)
11. Fix Category Prediction: Technique #1
Fine-tune pre-trained code models (CodeBERT and UniXcoder) for flaky-test fix classification using a Feed-Forward Neural Network (FNN).
12. Fix Category Prediction: Technique #2
Since the dataset is small (562 tests), we used Few-Shot Learning (FSL), which is popular for smaller datasets.
Step 1: Fine-tune the code models (UniXcoder and CodeBERT) using a Siamese network for flaky-test fix category classification.
17. Example of Repaired Flaky Test from GPT
(Figure: a flaky test and the fix GPT generated without the fix-category label; the cause of flakiness is shown, and the suggested repair is incorrect.)
18. Example of Repaired Flaky Test from GPT (2)
(Figure: fix generation with the fix-category label, alongside the original fix for the flaky test: one version repaired by GPT, one repaired by the developer.)
21. GPT-Generated Tests: Execution Results
• We ran a sample of 35 generated tests: 24 passed, 11 failed
• We conducted a series of analyses:
→ Overall, among passing tests the average CodeBLEU score is 94%; fixes with higher CodeBLEU scores are more likely to pass.
→ 16 GPT-fixed tests have a 100% CodeBLEU score, indicating an exact match with the developer-repaired versions.
→ Bootstrapping: with a 95% confidence interval, 51% to 83% of GPT-fixed tests are estimated to pass (helpful for testers).
→ Logistic regression: trained on the 35 executable tests to estimate the passing rate among the non-executable tests; 80% accuracy.
→ Edit distance is calculated to assess the manual fixing effort for the 11 failed tests; on average, 16% of tokens need to be replaced.
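To illustrate how such an effort estimate can be computed, here is a standard token-level Levenshtein distance (the exact tokenizer and metric variant used in the study are assumptions here): the minimum number of token insertions, deletions, and substitutions needed to turn the GPT fix into the developer fix.

```java
import java.util.Arrays;

public class TokenEditDistance {
    // Classic Levenshtein distance over token sequences, computed with
    // dynamic programming: d[i][j] is the edit distance between the first
    // i tokens of a and the first j tokens of b.
    static int levenshtein(String[] a, String[] b) {
        int[][] d = new int[a.length + 1][b.length + 1];
        for (int i = 0; i <= a.length; i++) d[i][0] = i;
        for (int j = 0; j <= b.length; j++) d[0][j] = j;
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                int sub = a[i - 1].equals(b[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + sub);
            }
        }
        return d[a.length][b.length];
    }

    public static void main(String[] args) {
        // Whitespace tokenization is a simplifying assumption.
        String[] gptFix = "Map m = new HashMap ( )".split(" ");
        String[] devFix = "Map m = new LinkedHashMap ( )".split(" ");
        int dist = levenshtein(gptFix, devFix);
        // One token substitution: HashMap -> LinkedHashMap
        System.out.println(dist); // prints 1
    }
}
```

Dividing the distance by the token length of the developer fix gives a replacement ratio comparable to the 16% average reported above.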
22. GPT-Generated Tests: Execution Results (2)
Based on the trained logistic regression model, passing estimates for the non-executable tests are reported for both the 181-test and the 131-test datasets.
23. Practical Implications
How do we envision our approach being used in practice?
• Deploy in Continuous Integration (CI) environments to repair a flaky test without the developer's explicit command.
• Guide developers toward the possible causes of flakiness that need to be addressed, through test-smell and fix-label information.
• Reduce the manual effort to fix tests even when the GPT repair is not fully correct (a semi-automated repair approach).