DafnyBench: Verification Benchmark Suite

Updated 18 March 2026

DafnyBench is a benchmark suite that offers a standardized set of programs and metrics for evaluating Dafny’s verification performance and accuracy.
It provides researchers with a robust platform to test various verification methods, facilitating comparisons through detailed performance and scalability metrics.
By automating test generation and execution reporting, DafnyBench helps pinpoint strengths and limitations in verification strategies for continuous tool improvements.

CPAchecker is an open-source, extensible framework for software verification and testing that implements the Configurable Program Analysis (CPA) formalism. It provides a flexible infrastructure for designing, combining, and empirically evaluating program analyses, including predicate abstraction, explicit-value analysis, bounded model checking, interpolation-based algorithms, symbolic execution, multi-threaded verification, test generation, execution reporting, and parallelization strategies (0902.0019, Baier et al., 2024). Its architecture rests on the idea that diverse verification techniques can be expressed as CPAs, which enables modular construction, hybrid approaches, and dynamic refinement loops. CPAchecker targets C programs (via a control-flow automaton front-end), supports a range of property specifications (e.g., reachability, assertion safety, deadlock freedom), and serves as a reference platform for benchmarking and competitive evaluation (e.g., SV-COMP).

1. Architectural Foundations and Workflow

At its core, CPAchecker is organized into three principal layers:

Front-end (CFA Construction): Program source is parsed (using Eclipse CDT), lowered to CIL-style syntax trees, and converted to one or more control-flow automata (CFA), where each node represents a program location and each edge is a program operation (assignment, assume, call, return) (0902.0019, Baier et al., 2024).
CPA Interfaces and Abstract Domains: Analyses are implemented as CPAs, i.e., tuples $\Sigma = (D, \Pi, \preceq, \mathit{post}, \mathit{merge}, \mathit{stop})$, where $D$ is an abstract domain (defining a set of abstract states $E$), $\Pi$ is a set of precisions tuning abstraction granularity, $\preceq$ is the partial order on abstract states used for coverage, and $\mathit{post}$, $\mathit{merge}$, and $\mathit{stop}$ define the semantics of successor computation, state merging, and coverage checks (0902.0019). Abstract domains include explicit-value, predicate abstraction (BDD/SMT-backed), intervals, octagons, memory graphs, and threading states.
CPA Reachability Engine: The central CPAAlgorithm performs worklist-based abstract reachability, parametrized over a CompositeCPA that combines multiple component analyses. Precision is dynamically refined via configurable strategies, enabling CEGAR (counterexample-guided abstraction refinement) and heuristic precision adjustment (Baier et al., 2024).

The standard reachability algorithm operates over a worklist and reached set, repeatedly computing successors, merging, and checking coverage via the stop operator. Analyses are declared and configured via properties files, and all interactions, including composite CPAs and refinement schedules, are plug-in-style without modification of the algorithm core (0902.0019, Baier et al., 2024).
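
The worklist loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CPAchecker's implementation: precisions are omitted, the toy program (x = 0; while (x < 3) x = x + 1;) is hard-coded in a hypothetical `post` function, and all names (`cpa_algorithm`, `merge_sep`, `merge_join`, `stop_sep`, `covers`) are invented for this example. Abstract states are pairs of a location and an explicit value for x.

```python
from collections import deque

TOP = "TOP"  # stands for "value unknown"

def covers(old, new):
    """True if `old` subsumes `new`: same location, equal or unknown value."""
    return old[0] == new[0] and (old[1] == TOP or old[1] == new[1])

def post(state):
    """Abstract successors for: x = 0; while (x < 3) x = x + 1;"""
    loc, x = state
    if loc == 0:                       # x = 0
        return [(1, 0)]
    if loc == 1:                       # loop head: assume x < 3 / assume x >= 3
        if x == TOP:
            return [(2, TOP), (3, TOP)]
        return [(2, x)] if x < 3 else [(3, x)]
    if loc == 2:                       # x = x + 1
        return [(1, TOP if x == TOP else x + 1)]
    return []                          # location 3: program exit

def merge_sep(new, old):
    """Never merge: keep states separate."""
    return old

def merge_join(new, old):
    """Join states at the same location; disagreeing values become TOP."""
    if new[0] == old[0] and new[1] != old[1]:
        return (old[0], TOP)
    return old

def stop_sep(state, reached):
    """Stop if a single reached state already covers `state`."""
    return any(covers(old, state) for old in reached)

def cpa_algorithm(initial, post, merge, stop):
    reached = [initial]
    waitlist = deque([initial])
    while waitlist:
        state = waitlist.popleft()
        for succ in post(state):
            # Merge the successor into every reached state it combines with.
            for i, old in enumerate(reached):
                merged = merge(succ, old)
                if merged != old:
                    reached[i] = merged
                    if old in waitlist:
                        waitlist.remove(old)
                    waitlist.append(merged)
            # Keep the successor only if it is not already covered.
            if not stop(succ, reached):
                reached.append(succ)
                waitlist.append(succ)
    return reached

reached_sep = cpa_algorithm((0, TOP), post, merge_sep, stop_sep)
reached_join = cpa_algorithm((0, TOP), post, merge_join, stop_sep)
```

Swapping only the merge operator changes the character of the analysis: `merge_sep` enumerates every concrete value of x (model-checking style), while `merge_join` collapses each location to a single state (data-flow style), which is the configurability the formalism is built around.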

2. Supported Abstract Domains and Algorithms

CPAchecker supports a breadth of abstract domains and verification engines, broadly classified as follows (Baier et al., 2024):

Explicit-Value Analysis: Tracks variable assignments concretely; merges states conservatively by introducing unknowns ($\top$) on disagreement.
Predicate Abstraction: Represents abstract program states as Boolean formulas over predicates, leveraging BDD or SMT solvers; supports merge-sep and merge-join modes; precisions are refined using interpolants from infeasible error paths (0902.0019).
Interval and Octagon Domains: Encode numeric relations as intervals or octagonal constraints, with adjustable precision and widening (Baier et al., 2024).
Symbolic Memory Graphs: Heap analysis via symbolic graphs and summaries.
Threading Abstraction: Bounded-thread verification via ThreadingCPA, which models thread interleaving and synchronization explicitly; combinable with other domains such as value, interval, and BDD (Beyer et al., 2016).
Composite Domains: Arbitrary combinations (via ProductCPA) allow, e.g., parallel explicit-value and predicate tracking for synergistic refinement.
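
As an illustration of the explicit-value entry in the list above, the following sketch shows a pointwise merge that introduces an unknown on disagreement, together with the matching coverage check. It assumes abstract states are plain dictionaries from variable names to integers or a `TOP` marker; the function names `join` and `less_or_equal` are hypothetical and not taken from CPAchecker.

```python
TOP = "TOP"  # marker for an unknown value

def join(a, b):
    """Pointwise join: keep a value only where both states agree on it."""
    result = {}
    for var in a.keys() | b.keys():
        va, vb = a.get(var, TOP), b.get(var, TOP)
        result[var] = va if va == vb else TOP
    return result

def less_or_equal(a, b):
    """Coverage: `b` covers `a` if each variable is unknown in `b` or equal in both."""
    return all(vb == TOP or a.get(var, TOP) == vb for var, vb in b.items())

left = {"x": 1, "y": 2}
right = {"x": 1, "y": 3}
merged = join(left, right)   # x stays 1, y becomes TOP
```

A variable missing from a dictionary is treated as unknown, so the joined state always covers both inputs.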

Algorithms and verification modes encompass:

Abstract Interpretation/Data-Flow Analysis: Lightweight, with interval or relation domains, refined for precision as needed.
Bounded Model Checking (BMC): Unrolling-based bug finding, with incremental depth and SMT-encoded reachability checks.
Predicate Abstraction (CEGAR, Impact): On-the-fly refinement of Boolean abstraction using interpolation.
Interpolation-Based Model Checking: Implements McMillan’s 2003 IMC algorithm, adapted for program verification using large-block encoding and backward interpolation (Beyer et al., 2022).
k-Induction: Supports both base-case and step-case induction with auxiliary invariants provided by a parallel data-flow analysis; handles multi-loop programs effectively via continuously-refined invariants (Beyer et al., 2015).
PDR/IC3, Symbolic Execution: Additional model checking and test generation schemes.
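
To make the interplay of bounded model checking and k-induction concrete, here is a small sketch over an explicitly enumerated finite-state system. It is an assumption-laden toy: real implementations encode both checks as SMT queries, whereas this version enumerates paths, and the names `bmc`, `step_case`, and `k_induction` are invented for the example. The optional `invariant` argument plays the role of an auxiliary invariant and is assumed to have been established by a separate analysis.

```python
def bmc(init_states, step, is_safe, k):
    """Base case: search for a property violation within k transitions."""
    frontier = [(s,) for s in init_states]
    for _ in range(k + 1):
        for path in frontier:
            if not is_safe(path[-1]):
                return path                       # counterexample path
        frontier = [p + (t,) for p in frontier for t in step(p[-1])]
    return None

def step_case(universe, step, is_safe, k, invariant=lambda s: True):
    """Step case: every path of k safe states (from anywhere) has only safe successors."""
    paths = [(s,) for s in universe if is_safe(s) and invariant(s)]
    for _ in range(k - 1):
        paths = [p + (t,) for p in paths for t in step(p[-1])
                 if is_safe(t) and invariant(t)]
    return all(is_safe(t) for p in paths for t in step(p[-1]))

def k_induction(init_states, universe, step, is_safe, max_k,
                invariant=lambda s: True):
    for k in range(1, max_k + 1):
        cex = bmc(init_states, step, is_safe, k - 1)
        if cex is not None:
            return ("unsafe", cex)
        if step_case(universe, step, is_safe, k, invariant):
            return ("safe", k)
    return ("unknown", max_k)

# Toy system: a counter that advances by 2 modulo 10; the property is x != 5.
step = lambda x: [(x + 2) % 10]
is_safe = lambda x: x != 5
is_even = lambda x: x % 2 == 0     # auxiliary invariant, assumed proven elsewhere

plain = k_induction([0], range(10), step, is_safe, max_k=10)
with_invariant = k_induction([0], range(10), step, is_safe, max_k=10,
                             invariant=is_even)
buggy = k_induction([1], range(10), step, is_safe, max_k=10)
```

In this toy, the plain run needs a larger k before the step case holds because unreachable odd states keep producing spurious induction failures, while the run with the auxiliary invariant succeeds at k = 1. That mirrors, in miniature, why auxiliary invariants help k-induction.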

An overview table of domains and their essential properties:

Abstract Domain	Representation	Key Operations (Transfer, Merge, Stop)
Explicit-Value	Map: Var → ℤ ∪ {⊤}	Evaluate/transmit/merge pointwise
Predicate Abstr.	Boolean formulas (BDD)	Strongest post/merge-sep/join/implication
Interval	Var → [l, u] intervals	Interval arithmetic, widening/merge
Octagon	Numeric oct. relations	Octagon join, transfer, subsumption
Threading	Threads ↦ locations	Interleave, fork/join, partitioned sets
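
The interval row of the table can be illustrated with a short sketch of interval arithmetic, join, and widening. The `Interval` class and the `analyze_loop` helper are hypothetical names for this example; the widening shown is the textbook variant that jumps an unstable bound to infinity, which may differ from the operator CPAchecker actually uses.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def add(self, other):
        """Interval arithmetic for addition."""
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def join(self, other):
        """Smallest interval containing both operands."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, newer):
        """Push any bound that is still growing to infinity."""
        lo = self.lo if newer.lo >= self.lo else -math.inf
        hi = self.hi if newer.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def leq(self, other):
        """Subsumption: self is contained in other."""
        return other.lo <= self.lo and self.hi <= other.hi

def analyze_loop(init, body, use_widening=True, max_iterations=1000):
    """Iterate the loop-head state until it is stable."""
    current = init
    for iteration in range(max_iterations):
        nxt = init.join(body(current))
        if nxt.leq(current):
            return current, iteration
        current = current.widen(nxt) if use_widening else current.join(nxt)
    raise RuntimeError("no fixed point within the iteration budget")

# Loop head of: x = 0; while (true) x = x + 1;
ONE = Interval(1, 1)
loop_invariant, iterations = analyze_loop(Interval(0, 0), lambda x: x.add(ONE))
```

Without widening, the same loop would grow the upper bound by one per iteration and never stabilize, which is why the helper enforces an iteration budget.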

3. Dynamic Precision Adjustment and Refinement

Dynamic precision is central throughout CPAchecker:

CEGAR Loop: Analyses iteratively adjust their abstraction precision in response to counterexamples. For instance, when an explicit-value or predicate analysis encounters an infeasible error trace, it refines its precision (tracked variables, predicates) based on interpolants extracted from sliced path prefixes (Beyer et al., 2015), reducing spurious refinements and accelerating convergence.
Heuristic Precision Adjustment: Interval and explicit domains tune variable sets, depth, or widening on demand. Variable selection heuristics may traverse from error locations backward.
Hybrid Combinations: CompositeCPA allows for parallel explicit and predicate domains; e.g., explicit-value analysis “pre-solves” easily-trackable data flow, while predicate abstraction targets complex logic.

Notably, domain-type-guided refinement can extract multiple alternative interpolant sequences from infeasible paths, choosing among them by scoring variable domain types (Bool, Enum, Int, LoopCnt), guiding precision toward more efficient or invariant-friendly abstractions (Beyer et al., 2015). This reduces the number of required CEGAR refinements and directly impacts performance and scalability on large verification tasks.
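
The refinement loop can be sketched for an explicit-value analysis whose precision is the set of tracked variables. This is a deliberately simplified toy, with assumptions stated up front: the program is a single error path of constant assignments and equality assumes, and refinement simply adds the variable of the assume that refutes the path, which stands in for the interpolation-based refinement described above. The names `run_path` and `cegar` are hypothetical.

```python
def run_path(path, precision):
    """Follow the path while tracking only variables in `precision`.

    Returns the index of the first assume that refutes the path,
    or None if the error location is (abstractly) reachable.
    """
    env = {}
    for index, (kind, var, value) in enumerate(path):
        if kind == "assign":
            if var in precision:
                env[var] = value
        elif kind == "assume":                 # assume var == value
            if var in env and env[var] != value:
                return index
    return None

def cegar(path, all_vars):
    precision = set()                          # start with the coarsest abstraction
    refinements = 0
    while True:
        if run_path(path, precision) is not None:
            return "safe", precision, refinements
        # Abstract counterexample found: check feasibility with full precision.
        blocker = run_path(path, set(all_vars))
        if blocker is None:
            return "unsafe", precision, refinements
        # Spurious: track the variable that refutes the path
        # (a crude stand-in for interpolation).
        precision.add(path[blocker][1])
        refinements += 1

error_path = [
    ("assign", "a", 5),
    ("assign", "x", 0),
    ("assign", "y", 1),
    ("assume", "y", 1),
    ("assume", "x", 1),    # contradicts x = 0, so the path is infeasible
]
result = cegar(error_path, {"a", "x", "y"})
```

Only the variable needed to refute the path ends up in the precision; `a` and `y` are never tracked, which is the point of refining lazily rather than tracking everything from the start.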

4. Advanced Extensions: Parallelism, Invariant Injection, Execution Reporting

CPAchecker has been extended along several advanced axes, including:

Parallel Program Analysis on Path Ranges: Supporting compositionally-parallel exploration, CPAchecker can split a program’s path space into disjoint ranges (via splitter automata and input generation) and assign each range to a separate analysis or worker thread. Two specialized range-reduction CPAs enforce lower/upper path bounds, ensuring each analysis only covers its designated region (Haltermann et al., 2024). Parallel orchestration includes dynamic work stealing to rebalance load, and witness-joining techniques to combine correctness results from multiple analyses into a unified proof artifact. Empirical results indicate improvements in both performance and coverage, especially with work stealing enabled.
Invariant Injection into Interpolation-Based Model Checking: Auxiliary invariants, produced via lightweight, continuously refined interval/data-flow analyses, can be injected into the fixed-point check and interpolant strengthening steps of McMillan’s IMC algorithm. This reduces the number of required unrollings, number of interpolant queries, and overall run time, yielding higher proof rates compared to unaugmented IMC and variant techniques (Beyer et al., 2024). CPAchecker’s implementation allows toggling invariant injection modes independently and supports continuous refinement, with practical gains in benchmark evaluations.
Execution Reports: In the presence of inconclusive verification runs, CPAchecker can emit execution reports summarizing which traces (sequences of statements) have been fully analyzed, which constitute “safe cones” (all continuations proven safe), and which represent the “frontier” (extensions not explored) (Castaño et al., 2016). Reports are computed via an Assumption Automaton that summarizes exploration, and trace sets (S, F) are output for safe cones and frontiers, respectively. This enables users to identify verified regions and portions yet to be analyzed, correcting misleading coverage metrics and guiding further verification or debugging.
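
The idea of splitting a path space into disjoint ranges can be illustrated with a toy encoding. The sketch below assumes that a program path is a fixed-length tuple of branch decisions (0 or 1), that paths are ordered lexicographically, and that ranges are half-open so they do not overlap; these are simplifications for illustration, and the names `split_ranges`, `in_range`, and `explore` are hypothetical. A real range-reduction analysis would prune out-of-range paths during exploration instead of filtering a full enumeration.

```python
from itertools import product

def split_ranges(bounds):
    """Turn boundary paths into consecutive half-open ranges [lower, upper)."""
    edges = [None] + sorted(bounds) + [None]    # None means "unbounded"
    return list(zip(edges[:-1], edges[1:]))

def in_range(path, lower, upper):
    """Lexicographic range membership; tuples compare element by element."""
    return (lower is None or lower <= path) and (upper is None or path < upper)

def explore(depth, lower, upper):
    """Paths a single worker would be responsible for (filtered enumeration)."""
    return [p for p in product((0, 1), repeat=depth) if in_range(p, lower, upper)]

depth = 3
ranges = split_ranges([(0, 1, 1), (1, 0, 1)])
work = [explore(depth, lower, upper) for lower, upper in ranges]
all_paths = list(product((0, 1), repeat=depth))
```

Each entry of `work` could be handed to a separate analysis; together the entries partition the set of paths, so no path is analyzed twice and none is missed.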

5. Multi-Threaded Program Verification

Verification of multi-threaded C programs is handled via an orthogonal extension, ThreadingCPA (Beyer et al., 2016). This CPA:

Models thread interleavings explicitly with a flat-lattice domain mapping bounded thread-IDs to CFA locations.
Implements transfer relations for thread-local steps, creation (forking), and joining, with merge/stop only unifying identical thread states.
Supports partial-order reduction, partitioned reachability sets, and heuristic waitlist ordering—pruning redundant interleavings and improving scalability relative to full state-space exploration.
Integrates seamlessly with other domains; e.g., ThreadingCPA × ValueCPA, ThreadingCPA × IntervalCPA, or ThreadingCPA × BDDCPA.

Empirical results indicate that successive optimizations (partitioning, waitlist order, POR) substantially improve the solvability and efficiency of concurrent benchmarks, achieving performance comparable to dedicated concurrent verifiers.
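
Explicit interleaving exploration over a map from thread IDs to locations can be sketched as follows. The example is a toy under stated assumptions: two threads are already running (thread creation and joining are not modeled), each executes a non-atomic increment of a shared variable x in two steps, and no partial-order reduction is applied. The names `successors`, `reachable`, and `with_item` are invented for this sketch.

```python
from collections import deque

def with_item(tup, index, value):
    """Return a copy of `tup` with one position replaced."""
    return tup[:index] + (value,) + tup[index + 1:]

def successors(state):
    """One thread-local step per enabled thread (all interleavings)."""
    locations, x, tmps = state
    for tid, loc in enumerate(locations):
        if loc == 0:        # tmp = x
            yield (with_item(locations, tid, 1), x, with_item(tmps, tid, x))
        elif loc == 1:      # x = tmp + 1
            yield (with_item(locations, tid, 2), tmps[tid] + 1, tmps)
        # loc == 2: thread has terminated, no successor

def reachable(initial):
    seen = {initial}
    waitlist = deque([initial])
    while waitlist:
        state = waitlist.popleft()
        for succ in successors(state):
            if succ not in seen:    # stop only on identical states
                seen.add(succ)
                waitlist.append(succ)
    return seen

# State: (location of each thread, shared x, thread-local tmp of each thread)
initial = ((0, 0), 0, (0, 0))
states = reachable(initial)
final_values = {x for locations, x, _ in states
                if all(loc == 2 for loc in locations)}
```

The set `final_values` contains both the expected result and the lost-update result, which shows why every interleaving must be considered unless a reduction technique proves some of them redundant.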

6. Usage, Configuration, and Tool Ecosystem

CPAchecker supports a rich usage and configuration model (Baier et al., 2024):

Command-Line and IDE (Eclipse) Frontends: Analyses can be executed via shell scripts or directly within Eclipse CDT, with all configuration managed via human-readable properties files and command-line overrides.
Selection and Tuning of Analyses: Predefined configurations enable rapid switching between verification strategies (e.g., explicit-value, predicate, symbolic execution, BMC, k-induction, data flow), with fine-grained adjustment of CPA parameters, solver backends, precision, and resource limits.
Advanced Use Cases: Test-case generation (via CoVeriTest), witness-based result validation (support for GraphML and YAML proof/violation witnesses), and result reporting (statistical summaries, ARG/CFA visualization) are supported natively.
Open-Source, Extensibility, and Benchmarking: All analyses implement small Java interfaces (ConfigurableProgramAnalysis, TransferRelation, etc.), allowing straightforward addition of new domains and algorithms. CPAchecker provides extensive documentation, prebuilt binaries, sample benchmarks, and has been a reference implementation in verification competitions.

7. Empirical Evaluations, Limitations, and Future Directions

Across multiple studies and benchmark suites (e.g., SV-COMP), CPAchecker demonstrates competitive or state-of-the-art results in verification task coverage, bug-finding, and performance (0902.0019, Beyer et al., 2016, Beyer et al., 2015, Haltermann et al., 2024, Beyer et al., 2015, Beyer et al., 2022, Beyer et al., 2024). Key observations include:

Synergy between analyses (e.g., explicit-value and predicate abstraction) yields both lower predicate counts and faster convergence.
Advanced refinement and invariant schemes (domain-type-guided slicing, continuous invariant refinement, invariant injection into IMC) provide significant reductions in iteration counts and resource utilization.
Parallel and portfolio analyses expand the proof and bug-finding power, especially with coordinated work stealing and witness merging.
Execution reporting exposes the true coverage of incomplete runs, correcting misleading test metrics.

Limitations noted include pointer-alias imprecision, potential scalability issues for very large code bases, and complexity in handling pointer-rich or shape-manipulating programs. Ongoing and future work aims at more advanced heap analyses, improved CPA combinations, richer visualization/UI/IDE integration, smarter parallel scheduling, and adaptive refinement and splitting strategies (Haltermann et al., 2024, Baier et al., 2024).

CPAchecker remains a reference platform for research in software verification, enabling systematic integration, benchmarking, and comparison of cutting-edge techniques across the spectrum of abstract domains and algorithmic paradigms.