SLD-Spec: Automatic Formal Specification for Complex Loops
- The paper presents SLD-Spec, a framework that overcomes LLM limitations by generating complete formal specifications for functions with complex loop constructs.
- It employs program slicing to reduce input complexity and logical deletion to filter out irrelevant candidates, thereby boosting correctness and efficiency.
- Experimental evaluations show SLD-Spec achieves a 95.1% assertion pass rate and a 23.73% runtime reduction, outperforming existing auto-specification methods.
SLD-Spec is a framework for automatic formal specification generation targeted at program functions containing complex loop constructs. Its core contribution is to overcome limitations of existing LLM-based methods, which frequently yield incomplete or irrelevant specifications for such code. The two causes it targets are interference among loops during generation and the inability of traditional verification tools to discriminate valid edge cases. SLD-Spec extends the guess-and-verify paradigm with two phases, program slicing and logical deletion, which improve the correctness, relevance, and completeness of generated specifications for both simple and complex programs (Chen et al., 12 Sep 2025).
1. Motivation and Challenges
Automatic specification generation is integral to the software verification pipeline: it connects code to formal requirements and enables automated verification workflows. Prior LLM-based approaches struggle with programs that contain multiple, complex loop structures. Common deficiencies include interference among loops during generation, excessive filtering of correct specifications by verification tools (especially when function preconditions are incomplete or absent), and failure to capture nuanced behaviors such as non-trivial variants and assigns clauses. SLD-Spec is designed to address these problems, aiming to generate specifications that are maximally relevant to the code while retaining high pass rates under verification.
2. Architecture: Dual-Phase Specification Generation
SLD-Spec incorporates two novel phases into the standard LLM specification generation pipeline:
A. Program Slicing
- Purpose: Reduce input complexity and eliminate specification interference by decomposing each function into independent code slices that contain logically related or structurally independent loop constructs.
- Mechanism: Uses static analysis to build abstract syntax trees (ASTs) and function call graphs. Slicing criteria are synthesized for each function, based on tuples $(p, V)$, where $p$ is a program point and $V$ is a subset of relevant variables.
- Algorithm:
- Checks for existence of a calling function, creating one if absent.
- Extracts candidate variables, inserts slicing criteria, and partitions the code.
- Applies greedy simplification to prune overlapping slices.
- Effect: Focuses LLM attention narrowly on each slice's semantics, resulting in higher specification completeness.
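To make the idea of a slicing criterion concrete, the following is a minimal sketch of backward slicing, not the paper's implementation. It assumes a hypothetical toy representation in which each statement lists the variables it defines and uses plus the line of its enclosing loop header, and it assumes the criterion's program point lies after the loops of interest. The `prune_subsumed` helper is one possible reading of "greedy simplification" (drop any slice fully contained in a larger one); the names `Stmt`, `backward_slice`, and `prune_subsumed` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stmt:
    line: int                     # program point of the statement
    defs: frozenset               # variables written by the statement
    uses: frozenset               # variables read by the statement
    parent: Optional[int] = None  # line of the enclosing loop header, if any


def S(line, defs, uses, parent=None):
    """Small constructor helper for the toy program below."""
    return Stmt(line, frozenset(defs), frozenset(uses), parent)


def backward_slice(stmts, point, variables):
    """Conservative slice for the criterion (point, variables).

    Iterates to a fixpoint so that loop-carried dependences are kept.
    Assumption: `point` lies after the loops being sliced.
    """
    by_line = {s.line: s for s in stmts}
    relevant = set(variables)
    kept = set()
    changed = True
    while changed:
        changed = False
        for s in stmts:
            if s.line > point or s.line in kept:
                continue
            if s.defs & relevant:
                # Keep the statement and every enclosing loop header.
                cur = s
                while cur is not None and cur.line not in kept:
                    kept.add(cur.line)
                    relevant |= cur.uses
                    cur = by_line.get(cur.parent)
                changed = True
    return sorted(kept)


def prune_subsumed(slices):
    """Greedily keep larger slices first; drop slices contained in a kept one."""
    kept = []
    for sl in sorted(map(frozenset, slices), key=len, reverse=True):
        if not any(sl <= k for k in kept):
            kept.append(sl)
    return [sorted(k) for k in kept]


# Hypothetical function with two independent loops (a sum and a maximum).
PROGRAM = [
    S(1, {"i"}, set()),                      # i = 0
    S(2, {"s"}, set()),                      # s = 0
    S(3, set(), {"i", "n"}),                 # while i < n:
    S(4, {"s"}, {"s", "a", "i"}, parent=3),  #     s = s + a[i]
    S(5, {"i"}, {"i"}, parent=3),            #     i = i + 1
    S(6, {"j"}, set()),                      # j = 0
    S(7, {"m"}, {"a"}),                      # m = a[0]
    S(8, set(), {"j", "n"}),                 # while j < n:
    S(9, {"m"}, {"a", "j", "m"}, parent=8),  #     if a[j] > m: m = a[j]
    S(10, {"j"}, {"j"}, parent=8),           #     j = j + 1
]

sum_slice = backward_slice(PROGRAM, 10, {"s"})
max_slice = backward_slice(PROGRAM, 10, {"m"})
idx_slice = backward_slice(PROGRAM, 10, {"i"})
final_slices = prune_subsumed([sum_slice, max_slice, idx_slice])
```

In this toy case the two loops end up in separate slices, so a model prompted with one slice never sees the other loop's variables; the slice for `i` alone is dropped because the sum slice already contains it.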
B. Logical Deletion
- Purpose: Overcome verification-tool limitations by using LLM-based reasoning to filter out incorrect specification candidates that static or dynamic proof engines cannot properly validate.
- Mechanism: For each generated candidate, an LLM executes a four-step chain-of-thought process:
- Exclusion: Filters out candidates that do not mention any key program variable.
- Understanding: Generates a natural language description of the candidate specification.
- Reasoning: Assesses candidate consistency with code semantics.
- Output: Delivers a Boolean verdict on candidate validity.
- Technical Basis: Error types (such as incorrect boundary conditions, invariant misalignments, and more) are explicitly annotated in prompts using examples of seven common erroneous forms.
- Benefit: Retains valid specifications that formal verification might mistakenly reject and removes those that verification cannot adequately discriminate.
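The four steps above can be sketched as a small filtering pipeline. This is an illustrative outline under stated assumptions, not the paper's prompts or code: the Understanding and Reasoning steps are passed in as callables that would wrap LLM calls in a real system, and here they are replaced by hypothetical stubs (`describe_stub`, `judge_stub`) so the example runs offline. Candidates are written as ACSL-style strings for a hypothetical loop `for (i = 0; i < n; i++) sum += a[i];`.

```python
import re


def mentions_program_variable(candidate, variables):
    """Exclusion step: does the candidate reference any program variable?"""
    tokens = set(re.findall(r"[A-Za-z_]\w*", candidate))
    return bool(tokens & set(variables))


def logical_deletion(candidates, variables, describe, is_consistent):
    """Return the candidates that survive the four-step check.

    `describe` and `is_consistent` stand in for LLM calls (assumption).
    """
    kept = []
    for cand in candidates:
        # 1. Exclusion: drop candidates unrelated to the program's variables.
        if not mentions_program_variable(cand, variables):
            continue
        # 2. Understanding: natural-language description of the candidate.
        description = describe(cand)
        # 3. Reasoning: check the candidate against the code semantics.
        verdict = bool(is_consistent(cand, description))
        # 4. Output: Boolean verdict decides whether the candidate is kept.
        if verdict:
            kept.append(cand)
    return kept


# Hypothetical stubs replacing the LLM for this offline sketch.
KNOWN_WRONG = {"loop invariant i < 0;"}  # contradicts i starting at 0


def describe_stub(candidate):
    return f"The candidate claims: {candidate}"


def judge_stub(candidate, description):
    return candidate not in KNOWN_WRONG


VARIABLES = {"i", "n", "sum", "a"}
CANDIDATES = [
    "loop invariant 0 <= i <= n;",
    "loop invariant 0 <= k <= 10;",  # mentions no program variable
    "loop invariant i < 0;",         # rejected by the reasoning stub
    "loop assigns i, sum;",
    "loop variant n - i;",
]

surviving = logical_deletion(CANDIDATES, VARIABLES, describe_stub, judge_stub)
```

Keeping the Exclusion step deterministic and cheap means the more expensive reasoning calls are only spent on candidates that at least refer to the code under analysis.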
3. Experimental Evaluation and Key Results
SLD-Spec was benchmarked on two datasets:
- Frama-C-Problems: A standard set of 51 simple programs used in prior research.
- SLD-Spec verified five more programs than AutoSpec.
- Achieved a 23.73% reduction in overall runtime.
- Complex-Loop Benchmark: A manually constructed and categorized dataset with functions containing up to four complex loops, distributed across four categories:
- Parallel Single-path Loop
- Single Multi-path Loop
- Conditional Enhanced Single-path Loop
- Nested Loop
Experimental results on the Complex-Loop benchmark demonstrate:
| Framework | Assertion Pass Rate | Program Pass Rate |
|---|---|---|
| SLD-Spec | 95.1% | 90.91% |
| Baselines | (lower) | (lower) |
These results show that SLD-Spec achieves high assertion- and program-level verification pass rates on complex loops. The 23.73% runtime reduction reported above was measured on Frama-C-Problems.
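The two pass rates measure different things, and a short sketch may help separate them. It assumes, as a working definition not confirmed by the source, that a program passes only when all of its assertions verify, and that runtime reduction is computed relative to a baseline time. The data below are hypothetical toy values, not results from the paper.

```python
def pass_rates(results):
    """Compute (assertion pass rate, program pass rate).

    `results` maps a program name to a list of per-assertion outcomes.
    Assumption: a program passes only if every assertion verifies.
    """
    total_assertions = sum(len(v) for v in results.values())
    passed_assertions = sum(sum(v) for v in results.values())
    passed_programs = sum(all(v) for v in results.values())
    return (passed_assertions / total_assertions,
            passed_programs / len(results))


def runtime_reduction(baseline_seconds, new_seconds):
    """Relative runtime reduction with respect to a baseline."""
    return (baseline_seconds - new_seconds) / baseline_seconds


# Hypothetical toy outcomes, for illustration only.
toy_results = {
    "p1": [True, True],
    "p2": [True, False, True],
    "p3": [True],
}
assertion_rate, program_rate = pass_rates(toy_results)
toy_reduction = runtime_reduction(100.0, 80.0)
```

In the toy data one failing assertion lowers the assertion rate only slightly but fails its whole program, which is why the program-level rate is typically the stricter of the two.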
4. Dataset Construction
- Frama-C-Problems: Contains the 51 simple programs used in prior research, suitable for evaluating LLM-based specification systems.
- Complex-Loop Benchmark: Designed to test SLD-Spec under challenging conditions not present in standard datasets. Functions vary in number and type of loop, length, and internal complexity, with explicit partitioning into four categories for nuanced benchmarking of specification methods under various loop interaction scenarios.
5. Ablation Analysis
Ablation studies clarify the contributions of each phase:
| Scheme | Slicing | Logical Deletion | Correct Spec Rate | Irrelevant Spec Rate |
|---|---|---|---|---|
| SLD-Spec (full) | Yes | Yes | maximal | minimal |
| Slicing only | Yes | No | increased | increased |
| Baseline (none) | No | No | reduced | reduced |
Logical deletion is critical for eliminating irrelevant candidates and improving correctness; slicing increases overall completeness by ensuring that all relevant specifications are generated. Omitting both leads either to lost specifications or to spurious outputs that are retained because the verification tool fails to reject them.
6. Comparison to Existing Methods
Existing systems such as AutoSpec and LLM baselines (e.g., GPT-3.5-turbo, DeepSeek-V2.5) are inadequate for complex loop programs, frequently failing to verify or even generate relevant specifications. On the complex benchmark, SLD-Spec reaches a 95.1% assertion pass rate and a 90.91% program pass rate, evidence that it handles the interference, completeness, and ambiguity problems that other systems do not address.
7. Implications and Future Directions
The methodological advances of SLD-Spec demonstrate that robust specification generation for complex control-flow constructs is achievable via structured code analysis and LLM-guided reasoning. Potential directions include:
- Refinement of the chain-of-thought prompting strategies for increased logical deletion accuracy.
- Integration of specialized LLM training data for improved semantic judgment of code-specification relationships.
- Extension to other code constructs beyond loops (e.g., deep recursion, concurrency).
- Continued development of open-source resources to support reproducibility and future research in LLM-based formal methods.
SLD-Spec thus establishes a scalable, efficient, and highly accurate approach to LLM-assisted formal specification generation, with clear generalization to future tasks in specification mining, software verification, and program synthesis under complex control flows (Chen et al., 12 Sep 2025).