Demonstrating ARG-V's Generation of Realistic Java Benchmarks for SV-COMP

Published 4 Feb 2026 in cs.SE | (2602.04786v1)

Abstract: The SV-COMP competition provides a state-of-the-art platform for evaluating software verification tools on a standardized set of verification tasks. Consequently, verifier development outcomes are influenced by the composition of program benchmarks included in SV-COMP. When expanding this benchmark corpus, it is crucial to consider whether newly added programs cause verifiers to exhibit behavior distinct from that observed on existing benchmarks. Doing so helps mitigate external threats to the validity of the competition's results. In this paper, we present the application of the ARG-V tool for automatically generating Java verification benchmarks in the SV-COMP format. We demonstrate that, on a newly generated set of 68 realistic benchmarks, all four leading Java verifiers decrease in accuracy and recall compared to their performance on the existing benchmark suite. These findings highlight the potential of ARG-V to enhance the comprehensiveness and realism of verification tool evaluation, while also providing a roadmap for verifier developers aiming to improve their tools' applicability to real-world software.