arXiv Query: search_query=all:"software testing"&id_list=&start=0&max_results=10

Feed

Quantum Circuit Repair by Gate Prioritisation
Repairing faulty quantum circuits is challenging and requires automated solutions. We present QRep, an automated repair approach that iteratively identifies and repairs faults in a circuit. QRep uniformly applies patches across the circuit and assigns each gate a suspiciousness score, reflecting its likelihood of being faulty. It then narrows the search space by prioritising the most suspicious gates in subsequent iterations, increasing the repair efficiency. We evaluated QRep on 40 (real and synthetic) faulty circuits. QRep completely repaired 70% of them, and for the remaining circuits, the actual faulty gate was ranked within the top 44% most suspicious gates, demonstrating the effectiveness of QRep in fault localisation. Compared with two baseline approaches, QRep scales to larger and more complex circuits, up to 13 qubits.
2 days ago
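The abstract does not say how QRep computes its suspiciousness scores, so this Python sketch takes them as input. It assumes a mapping from gate identifiers to scores, and the function names and gate labels are hypothetical. It illustrates two steps the abstract mentions. The first is narrowing the next iteration to the most suspicious gates. The second is measuring how far down the ranking the actual faulty gate sits, which is the kind of figure behind "ranked within the top 44%".

```python
import math


def rank_gates(scores: dict[str, float]) -> list[str]:
    """Return gate ids from most to least suspicious; ties are broken by id."""
    return sorted(scores, key=lambda gate: (-scores[gate], gate))


def top_fraction(scores: dict[str, float], fraction: float) -> list[str]:
    """Keep the most suspicious gates (at least one) covering `fraction` of the circuit."""
    ranked = rank_gates(scores)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:k]


def rank_percentile(scores: dict[str, float], faulty_gate: str) -> float:
    """1-based rank of the faulty gate divided by the number of gates."""
    ranked = rank_gates(scores)
    return (ranked.index(faulty_gate) + 1) / len(ranked)


if __name__ == "__main__":
    # Hypothetical suspiciousness scores for a five-gate circuit.
    scores = {"h_q0": 0.1, "cx_q0_q1": 0.9, "x_q1": 0.4, "t_q2": 0.7, "cz_q1_q2": 0.2}
    print(top_fraction(scores, 0.4))        # ['cx_q0_q1', 't_q2']
    print(rank_percentile(scores, "t_q2"))  # 0.4 (ranked 2nd of 5)
```

Ties are broken by gate identifier only to keep the ranking deterministic. A real tool might instead break ties by circuit position.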

Natural Adversaries: Fuzzing Autonomous Vehicles with Realistic Roadside Object Placements
The emergence of Autonomous Vehicles (AVs) has spurred research into testing the resilience of their perception systems, i.e., ensuring that they are not susceptible to critical misjudgements. It is important that these systems are tested not only with respect to other vehicles on the road, but also with respect to objects placed on the roadside. Trash bins, billboards, and greenery are examples of such objects, typically positioned according to guidelines developed for the human visual system, which may not align perfectly with the needs of AVs. Existing tests, however, usually focus on adversarial objects with conspicuous shapes or patches, which are ultimately unrealistic due to their unnatural appearance and reliance on white-box knowledge. In this work, we introduce a black-box attack on AV perception systems that creates realistic adversarial scenarios (i.e., satisfying road design guidelines) by manipulating the positions of common roadside objects and without resorting to "unna
2 years ago

Can Language Models Pass Software Testing Certification Exams? a case study
Large Language Models (LLMs) play a pivotal role in both academic research and broader societal applications. LLMs are increasingly used in software testing activities such as test case generation, selection, and repair. However, several important questions remain: (1) do LLMs possess enough information about software testing principles to perform software testing tasks effectively? (2) do LLMs possess sufficient conceptual understanding of software testing to answer software testing questions under metamorphic transformations? and (3) do certain properties of software testing questions influence the performance of LLMs? To answer these questions, this study evaluates 60 multimodal language models from both commercial vendors and the open-source community. The evaluation is performed using 30 sample exams of different types (core foundation, core advanced, specialist, and expert) from the International Software Testing Qualifications Board (ISTQB), which are used to assess the competen
4 days ago

On the Emergence of Testing Strategies: A Socio-technical Grounded Theory
Software testing is crucial for ensuring software quality, yet developers' engagement with it varies widely. Identifying the technical, organizational and social factors that lead to differences in engagement is required to remove barriers and utilize enablers for testing. While much research emphasizes the usefulness of software testing approaches and technical solutions, less is known about why developers do (not) test. This study investigates the first-hand experience of developers with software testing. The study illuminates how developers' opinions about testing and their testing behavior change. Through analysis of personal evolutions of practice, we explore when and why testing is used. Employing socio-technical grounded theory (STGT), we construct a theory by systematically analyzing data from 19 in-depth, semi-structured interviews with software developers. Allowing interviewees to reflect on how and why they approach software testing, we explore perspectives that are rooted
1 year ago

From Natural Language to Executable Properties for Property-based Testing of Mobile Apps
Property-based testing (PBT) is a popular software testing methodology and is effective in validating the functionality of mobile applications (apps for short). However, its adoption in practice remains limited, largely due to the manual effort and technical expertise required to specify executable properties. In this experience paper, we propose a novel structured property synthesis approach that automatically translates property descriptions in natural language into executable properties, and implement it in a tool named iPBT. Our approach decomposes the problem into UI semantic grounding and executable property synthesis. It first builds an enriched widget context via multimodal LLMs to align visual elements with their functional semantics, and then uses an LLM with in-context learning to generate framework-specific executable properties. We evaluate iPBT with a closed-source LLM (GPT-4o) and an open-source LLM (DeepSeek-V3) on 124 diverse property descriptions derived from an exist
6 days ago
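This sketch makes "executable property" concrete by hand-writing one for a hypothetical to-do app. It is not iPBT output and does not use any mobile testing framework. `TodoAppModel` is an in-memory stand-in for UI state. `check_property` plays the role of a PBT engine: it generates random action sequences and checks the property after each one. A framework-specific property would drive real widgets instead.

```python
import random


class TodoAppModel:
    """Hypothetical stand-in for an app's UI state; a real property would drive widgets."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add_item(self, text: str) -> None:
        if text.strip():  # the app ignores blank entries
            self.items.append(text)

    def delete_item(self, index: int) -> None:
        if 0 <= index < len(self.items):
            del self.items[index]


def property_add_shows_item(app: TodoAppModel, text: str) -> bool:
    """NL property: 'adding a non-blank item increases the count by one and shows it last'."""
    before = len(app.items)
    app.add_item(text)
    if not text.strip():
        return len(app.items) == before  # blank input: nothing should change
    return len(app.items) == before + 1 and app.items[-1] == text


def check_property(trials: int = 200, seed: int = 0) -> bool:
    """Reach random app states via random actions, then test the property once per trial."""
    rng = random.Random(seed)
    alphabet = ["a", "b", " ", "x"]
    for _ in range(trials):
        app = TodoAppModel()
        for _ in range(rng.randint(0, 5)):  # random prefix of UI actions
            if rng.random() < 0.7:
                app.add_item("".join(rng.choices(alphabet, k=rng.randint(0, 3))))
            else:
                app.delete_item(rng.randint(0, 4))
        text = "".join(rng.choices(alphabet, k=rng.randint(0, 4)))
        if not property_add_shows_item(app, text):
            return False
    return True


if __name__ == "__main__":
    print(check_property())  # True for this model
```

Suppose `add_item` were changed so that it also appended blank entries. The same property should then fail as soon as a blank text is generated.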

Software Entropy: A Statistical Mechanics Framework for Software Testing
The notion of software entropy is often invoked to describe the tendency of software systems to become increasingly disordered as they evolve, yet existing approaches to quantify it are largely heuristic. In this work we introduce a formal definition of software entropy grounded in statistical mechanics, interpreting test suites as executable specifications, that is, as macroscopic constraints on the space of possible program implementations. Within this framework, mutation analysis provides a practical approximation of the locally accessible microstate space, allowing entropy-related quantities to be estimated empirically. We propose metrics that quantify how test suites restrict program space, including an information-weighted measure of the distribution of constraint power across tests. Applying these ideas to a real-world project, we show how test suites reduce software entropy and how information weights reveal structural differences in the contribution of individual tests that tr
7 days ago
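The abstract does not give the paper's formal definitions. This Python sketch therefore assumes a deliberately simple Boltzmann-style reading. The microstates are the original program plus its mutants. A variant stays "accessible" if no test kills it. Entropy is S = ln W, with W the number of accessible variants. The kill data and helper names are hypothetical, and the paper's information-weighted measure is not reproduced here.

```python
import math


def entropy(accessible: int) -> float:
    """Boltzmann-style entropy S = ln W (units with k_B = 1)."""
    return math.log(accessible)


def suite_entropy(kill_matrix: dict[str, set[str]], mutants: list[str]) -> float:
    """Entropy left by a suite: W counts the original program plus every surviving mutant."""
    killed = set().union(*kill_matrix.values())
    survivors = [m for m in mutants if m not in killed]
    return entropy(1 + len(survivors))


def per_test_reduction(kill_matrix: dict[str, set[str]], mutants: list[str]) -> dict[str, float]:
    """Entropy reduction each test would achieve if it were the only test in the suite."""
    baseline = suite_entropy({}, mutants)
    return {test: baseline - suite_entropy({test: kills}, mutants)
            for test, kills in kill_matrix.items()}


if __name__ == "__main__":
    mutants = [f"m{i}" for i in range(7)]                           # hypothetical mutants
    kills = {"test_a": {"m0", "m1", "m2"}, "test_b": {"m2", "m3"}}  # hypothetical kill data
    reduction = suite_entropy({}, mutants) - suite_entropy(kills, mutants)
    print(round(reduction, 4))  # 0.6931, i.e. ln 8 - ln 4 = ln 2
    print(per_test_reduction(kills, mutants))
```

`test_a` and `test_b` both kill `m2`. As a result, their individual reductions, ln(8/5) and ln(8/6), sum to more than the joint reduction of ln 2. Any per-test attribution has to account for such overlap.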

In Perfect Harmony: Orchestrating Causality in Actor-Based Systems
Runtime verification has gained popularity as a lightweight approach for increasing assurance in systems under scrutiny. Performing runtime checks enables dynamic monitoring and alerts for unexpected behavior, thereby improving reliability and correctness. Actor-based systems present significant challenges for runtime verification. Properties frequently span multiple actors with complex causal dependencies, while nondeterministic message interleavings can obscure execution semantics. Moreover, most existing monitoring tools are designed for single-process behavior. This paper presents ACTORCHESTRA, a runtime verification framework for Erlang that automatically tracks causality across multi-actor interactions. The framework instruments Erlang systems that comply with OTP guidelines via targeted code injection. This method establishes the orchestration infrastructure required to track causal relationships between actors without requiring manual modifications to the target system. To ease
10 days ago
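The abstract does not describe how ACTORCHESTRA represents causal relationships. A standard technique for tracking happens-before across message-passing processes is the vector clock. This sketch shows the technique in Python rather than Erlang, using hypothetical actor names. It should not be read as the framework's own mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    name: str
    clock: dict[str, int] = field(default_factory=dict)

    def tick(self) -> dict[str, int]:
        """Local event: increment this actor's own entry and return a snapshot."""
        self.clock[self.name] = self.clock.get(self.name, 0) + 1
        return dict(self.clock)

    def send(self) -> dict[str, int]:
        """Sending is an event; the returned snapshot travels with the message."""
        return self.tick()

    def receive(self, msg_clock: dict[str, int]) -> dict[str, int]:
        """Merge the sender's clock entry-wise (max), then count the receive event."""
        for actor, count in msg_clock.items():
            self.clock[actor] = max(self.clock.get(actor, 0), count)
        return self.tick()


def happened_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """a -> b iff every entry of a is <= the one in b and at least one is strictly smaller."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def concurrent(a: dict[str, int], b: dict[str, int]) -> bool:
    """Neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)


if __name__ == "__main__":
    client, server, db = Actor("client"), Actor("server"), Actor("db")
    m1 = client.send()
    server.receive(m1)
    m2 = server.send()
    r2 = db.receive(m2)
    e = client.tick()
    print(happened_before(m1, r2))  # True: causal chain client -> server -> db
    print(concurrent(e, r2))        # True: no message links these two events
```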

A Black-box Testing Framework for Oracle Quantum Programs
Oracle quantum programs are a fundamental class of quantum programs that serve as a critical bridge between quantum computing and classical computing. Many important quantum algorithms are built upon oracle quantum programs, making it essential to ensure their correctness during development. Although software testing is a well-established approach for improving program reliability, no systematic method has been developed to test oracle quantum programs. This paper proposes a black-box testing framework designed for general oracle quantum programs. We formally define these programs, establish the foundational theory for their testing, and propose a detailed testing framework. We develop a prototype tool and conduct extensive experimental evaluations to evaluate the effectiveness of the framework. Our results demonstrate that the proposed framework significantly aids developers in testing oracle quantum programs, providing insights to enhance the reliability of quantum software.
1 year ago
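This Python sketch illustrates what a black-box test of an oracle can check. It classically simulates a bit oracle U_f: |x>|y> -> |x>|y XOR f(x)> on computational basis states. It then compares an implementation against this specification on every basis input. This is a simplification, not the paper's framework, and the names are hypothetical. Basis-state checks cannot detect relative-phase errors, which a complete quantum test would also need to cover.

```python
from itertools import product
from typing import Callable

Bits = tuple[int, ...]
Oracle = Callable[[Bits, int], tuple[Bits, int]]


def reference_oracle(f: Callable[[Bits], int]) -> Oracle:
    """Specification of U_f on basis states: |x>|y> -> |x>|y XOR f(x)>."""
    def u(x: Bits, y: int) -> tuple[Bits, int]:
        return x, y ^ f(x)
    return u


def black_box_test(impl: Oracle, spec: Oracle, n: int) -> list[tuple[Bits, int]]:
    """Run impl and spec on all 2**(n+1) basis inputs; return the inputs where they differ."""
    failures = []
    for x in product((0, 1), repeat=n):
        for y in (0, 1):
            if impl(x, y) != spec(x, y):
                failures.append((x, y))
    return failures


def buggy_and_oracle(x: Bits, y: int) -> tuple[Bits, int]:
    """Hypothetical faulty implementation: overwrites the target instead of XOR-ing."""
    return x, x[0] & x[1]


if __name__ == "__main__":
    spec = reference_oracle(lambda x: x[0] & x[1])  # f = AND of two input bits
    print(black_box_test(spec, spec, 2))            # []
    print(black_box_test(buggy_and_oracle, spec, 2))
```

The faulty oracle agrees with the specification whenever the target starts in |0>. It is therefore only exposed by inputs with y = 1, which is why the test enumerates both target values.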

ISTQB Certifications Under the Lens: Their Contributions to the Software-Testing Profession; and AI-assisted Synthesis of Practitioners' Endorsements and Criticisms
Objective: This study investigates the perceived value and critique of ISTQB certifications, the most widely recognized testing qualifications worldwide. While the certifications aim to standardize the software testing body of knowledge, debates persist about their practical relevance and impact. Our objective was to systematically capture practitioner perspectives and assess the precision of endorsements and fairness of criticisms through expert review. Method: We conducted an AI-assisted Multivocal Literature Review (MLR), combining academic and grey literature to synthesize practitioner endorsements (RQ1) and criticisms (RQ2). ChatGPT's deep research capability was employed under continuous human oversight, with QA strategies ensuring transparency and reliability. As another analysis, we asked a panel of four independent experts to evaluate the precision of endorsements and fairness of criticisms. Results: Practitioner endorsements emphasized career benefits, improved communication,
12 days ago

Unveiling Practical Shortcomings of Patch Overfitting Detection Techniques
Automated Program Repair (APR) can reduce the time developers spend debugging, allowing them to focus on other aspects of software development. Automatically generated bug patches are typically validated through software testing. However, this method can lead to patch overfitting, i.e., generating patches that pass the given tests but are still incorrect. Patch correctness assessment (also known as overfitting detection) techniques have been proposed to identify patches that overfit. However, prior work often assessed the effectiveness of these techniques in isolation and on datasets that do not reflect the distribution of correct-to-overfitting patches that would be generated by APR tools in typical use; thus, we still do not know their effectiveness in practice. This work presents the first comprehensive benchmarking study of several patch overfitting detection (POD) methods in a practical scenario. To this end, we curate datasets that reflect realistic assumptions (i.e., patches pro
16 days ago
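This toy Python example makes the notion of patch overfitting concrete. Its functions and test suite are hypothetical and not drawn from the paper's datasets. It shows a patch that passes every test in a small suite yet is still incorrect. A POD technique would ideally flag `overfitting_patch` and accept `correct_patch`. Here an independent check against Python's built-in `abs` stands in for the ground truth that POD techniques lack in practice.

```python
from typing import Callable


def buggy_abs(x: int) -> int:
    return x  # bug: negative inputs are returned unchanged


def overfitting_patch(x: int) -> int:
    if x == -3:  # special-cases the only failing test input
        return 3
    return x


def correct_patch(x: int) -> int:
    return -x if x < 0 else x


TESTS = [(0, 0), (5, 5), (-3, 3)]  # hypothetical developer-written test suite


def passes_tests(fn: Callable[[int], int]) -> bool:
    """Test-based validation, as used by APR tools."""
    return all(fn(arg) == expected for arg, expected in TESTS)


def agrees_with_spec(fn: Callable[[int], int], inputs=range(-10, 11)) -> bool:
    """Stand-in for a correctness oracle: compare against the intended behaviour."""
    return all(fn(i) == abs(i) for i in inputs)


if __name__ == "__main__":
    for patch in (overfitting_patch, correct_patch):
        print(patch.__name__, passes_tests(patch), agrees_with_spec(patch))
```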