SLD-Spec: Automatic Formal Specification for Complex Loops

Updated 15 September 2025

The paper presents SLD-Spec, a framework that overcomes LLM limitations by generating complete formal specifications for functions with complex loop constructs.
It employs program slicing to reduce input complexity and logical deletion to filter out irrelevant candidates, thereby boosting correctness and efficiency.
Experimental evaluations show SLD-Spec achieves a 95.1% assertion pass rate and a 23.73% runtime reduction, outperforming existing auto-specification methods.

SLD-Spec is a framework for automatically generating formal specifications for program functions that contain complex loop constructs. It targets two weaknesses of existing LLM-based methods on such code: generated specifications are frequently incomplete or irrelevant because the loops interfere with one another during generation, and traditional verification tools cannot reliably discriminate valid candidates in edge cases. SLD-Spec extends the guess-and-verify paradigm with two phases, program slicing and logical deletion, which improve the correctness, relevance, and completeness of generated specifications for both simple and complex programs (Chen et al., 12 Sep 2025).

1. Motivation and Challenges

Automatic specification generation is integral to the software verification pipeline: it connects code to formal requirements and enables automated verification workflows. Prior LLM-based approaches struggle with programs that contain multiple complex loops. Common deficiencies include interference among loops during generation, excessive filtering of correct specifications by verification tools (especially when function preconditions are incomplete or absent), and failure to capture nuanced behaviors such as non-trivial loop variants and assigns clauses. SLD-Spec is designed to address these problems, aiming to generate specifications that are relevant to the code and still pass verification at a high rate.

2. Architecture: Dual-Phase Specification Generation

SLD-Spec incorporates two novel phases into the standard LLM specification generation pipeline:

A. Program Slicing

Purpose: Reduce input complexity and eliminate specification interference by decomposing each function into independent code slices that contain logically related or structurally independent loop constructs.
Mechanism: Uses static analysis to build abstract syntax trees (ASTs) and function call graphs. Slicing criteria are synthesized for each function, based on tuples $\langle p, V \rangle$, where $p$ is a program point and $V$ is a subset of relevant variables.
Algorithm:
- Checks for existence of a calling function, creating one if absent.
- Extracts candidate variables, inserts slicing criteria, and partitions the code.
- Applies greedy simplification to prune overlapping slices.
Effect: Focuses LLM attention narrowly on each slice's semantics, resulting in higher specification completeness.
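
As a rough illustration of the slicing step, the sketch below models a slicing criterion as the pair $\langle p, V \rangle$ and prunes overlapping slices greedily. The names `Criterion`, `Slice`, and `prune_overlapping` are hypothetical, and the pruning rule (drop a slice whose statements are already covered by larger slices kept earlier) is an assumption made for illustration; the paper's own algorithm may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Slicing criterion <p, V>: a program point and the variables of interest."""
    point: int            # p, encoded here as a line number (assumed encoding)
    variables: frozenset  # V, the subset of relevant variables


@dataclass(frozen=True)
class Slice:
    criterion: Criterion
    lines: frozenset      # line numbers of the statements kept in this slice


def prune_overlapping(slices):
    """Greedy pruning (assumed rule): visit slices from largest to smallest and
    keep a slice only if it contributes at least one line not yet covered."""
    kept, covered = [], set()
    for s in sorted(slices, key=lambda s: len(s.lines), reverse=True):
        if not s.lines <= covered:
            kept.append(s)
            covered |= s.lines
    return kept


# Toy function with two independent loops that share only the declaration on line 1.
sum_loop = Slice(Criterion(6, frozenset({"sum"})), frozenset({1, 3, 4, 5, 6}))
max_loop = Slice(Criterion(11, frozenset({"max"})), frozenset({1, 8, 9, 10, 11}))
sub_loop = Slice(Criterion(5, frozenset({"i"})), frozenset({3, 4, 5}))  # lies inside sum_loop

kept = prune_overlapping([sub_loop, sum_loop, max_loop])
print([sorted(s.criterion.variables) for s in kept])
```

In this toy input the slice for `i` is dropped because every one of its lines already belongs to the slice for `sum`, while the two loop slices survive and would each be sent to the LLM separately.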

B. Logical Deletion

Purpose: Work around verification-tool limitations by using LLM-based reasoning to filter out incorrect specification candidates that static or dynamic proof engines cannot properly validate.
Mechanism: For each generated candidate, an LLM executes a four-step chain-of-thought process:
1. Exclusion: Filters out candidates that do not reference key program variables.
2. Understanding: Generates a natural language description of the candidate specification.
3. Reasoning: Assesses candidate consistency with code semantics.
4. Output: Delivers a Boolean verdict on candidate validity.
Technical Basis: Error types (such as incorrect boundary conditions, invariant misalignments, and more) are explicitly annotated in prompts using examples of seven common erroneous forms.
Benefit: Retains valid specifications that formal verification might mistakenly reject and removes incorrect ones that verification cannot reliably detect.
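
The four steps can be sketched as a small filter. This is a minimal sketch under stated assumptions: `describe` and `is_consistent` are hypothetical callables standing in for the LLM calls, the exclusion rule is simplified to "mentions at least one key variable", and the stub judge below is a hard-coded rule used only to make the example runnable.

```python
import re


def mentions_key_variable(candidate, key_vars):
    """Step 1 (Exclusion): does the candidate name at least one key variable?"""
    identifiers = set(re.findall(r"[A-Za-z_]\w*", candidate))
    return bool(identifiers & set(key_vars))


def logical_deletion(candidates, key_vars, code, describe, is_consistent):
    """Return the candidates that survive all four steps.

    `describe` and `is_consistent` stand in for LLM calls (hypothetical);
    prompts and model choice are outside the scope of this sketch.
    """
    kept = []
    for cand in candidates:
        if not mentions_key_variable(cand, key_vars):      # 1. Exclusion
            continue
        description = describe(cand)                       # 2. Understanding
        verdict = is_consistent(code, cand, description)   # 3. Reasoning
        if verdict:                                        # 4. Output (Boolean)
            kept.append(cand)
    return kept


code = "for (i = 0; i < n; i++) sum += a[i];"
candidates = [
    "loop invariant 0 <= i <= n;",
    "loop invariant 0 <= i < n;",  # boundary error: fails when the loop exits with i == n
    "loop invariant k >= 0;",      # names no key variable, excluded in step 1
]

# Stubs replacing the LLM: the judge simply rejects the strict bound "i < n".
describe = lambda cand: f"The candidate states: {cand}"
is_consistent = lambda code, cand, desc: "i < n" not in cand

kept = logical_deletion(candidates, {"i", "n", "sum"}, code, describe, is_consistent)
print(kept)
```

Running the cheap exclusion check first means the judge is only called on candidates that could plausibly relate to the slice.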

3. Experimental Evaluation and Key Results

SLD-Spec was benchmarked on two datasets:

Frama-C-Problems: A standard set of 51 simple programs used in prior research.
- SLD-Spec verified five more programs than AutoSpec.
- Achieved a 23.73% reduction in overall runtime.
Complex-Loop Benchmark: A manually constructed and categorized dataset with functions containing up to four complex loops, distributed across four categories:
- Parallel Single-path Loop
- Single Multi-path Loop
- Conditional Enhanced Single-path Loop
- Nested Loop

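Headline results (pass rates are on the Complex-Loop benchmark; the runtime reduction is the Frama-C-Problems figure reported above):

Framework	Assertion Pass Rate (Complex-Loop)	Program Pass Rate (Complex-Loop)	Runtime Change (Frama-C-Problems)
SLD-Spec	95.1%	90.91%	-23.73%
Baselines	lower	lower	higher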

These results reveal that SLD-Spec delivers both high assertion- and program-level verification pass rates and substantial efficiency gains.
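
To make the two pass-rate metrics concrete, the sketch below computes both from per-assertion verification outcomes. The data is a toy example, not the paper's results, and the rule that a program passes only when all of its assertions verify is an assumption about how the program-level rate is defined.

```python
def pass_rates(results):
    """results maps a program name to a list of booleans, one per assertion.

    Returns (assertion pass rate, program pass rate). A program counts as
    passing only if every one of its assertions verified (assumed definition).
    """
    total = sum(len(outcomes) for outcomes in results.values())
    passed = sum(sum(outcomes) for outcomes in results.values())
    programs_ok = sum(all(outcomes) for outcomes in results.values())
    return passed / total, programs_ok / len(results)


# Toy outcomes for three hypothetical programs.
toy = {
    "prog_a": [True, True, True],
    "prog_b": [True, False],
    "prog_c": [True],
}
assertion_rate, program_rate = pass_rates(toy)
print(f"assertions: {assertion_rate:.2%}, programs: {program_rate:.2%}")
```

Under this definition the program-level rate can never exceed the fraction of programs with no failing assertion, so a single failed assertion lowers it more sharply than it lowers the assertion-level rate.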

4. Dataset Construction

Frama-C-Problems: Contains 51 functions suitable for evaluating LLM-based specification systems.
Complex-Loop Benchmark: Designed to test SLD-Spec under challenging conditions not present in standard datasets. Functions vary in number and type of loop, length, and internal complexity, with explicit partitioning into four categories for nuanced benchmarking of specification methods under various loop interaction scenarios.

5. Ablation Analysis

Ablation studies clarify the contributions of each phase:

Scheme	Slicing	Logical Deletion	Correct Spec Rate	Irrelevant Spec Rate
SLD-Spec (full)	Yes	Yes	maximal	minimal
Slicing only	Yes	No	increased	increased
Baseline (none)	No	No	reduced	reduced

Logical deletion is critical for eliminating irrelevant candidates and improving correctness; slicing increases overall completeness by ensuring that the relevant specifications are generated in the first place. Without both phases, correct specifications are lost or spurious ones are retained because the verifier fails to reject them.

6. Comparison to Existing Methods

Existing tools such as AutoSpec and direct LLM baselines (e.g., GPT-3.5-turbo, DeepSeek-V2.5) handle complex-loop programs poorly, frequently failing to generate relevant specifications or to verify the ones they produce. SLD-Spec reaches 95.1% assertion-level and 90.91% program-level verification on the complex benchmark, which indicates that it copes with the interference, completeness, and ambiguity problems that the other systems leave unaddressed.

7. Implications and Future Directions

The methodological advances of SLD-Spec demonstrate that robust specification generation for complex control-flow constructs is achievable via structured code analysis and LLM-guided reasoning. Potential directions include:

- Refinement of the chain-of-thought prompting strategies for increased logical deletion accuracy.
- Integration of specialized LLM training data for improved semantic judgment of code-specification relationships.
- Extension to other code constructs beyond loops (e.g., deep recursion, concurrency).
- Continued development of open-source resources to support reproducibility and future research in LLM-based formal methods.

SLD-Spec thus establishes a scalable, efficient, and highly accurate approach to LLM-assisted formal specification generation, with clear generalization to future tasks in specification mining, software verification, and program synthesis under complex control flows (Chen et al., 12 Sep 2025).

References (1)

SLD-Spec: Enhancement LLM-assisted Specification Generation for Complex Loop Functions via Program Slicing and Logical Deletion (2025)
