Leveraging Large Language Models for Automated Proof Synthesis in Rust (2311.03739v2)

Published 7 Nov 2023 in cs.FL and cs.AI

Abstract: Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, LLMs have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.


Summary

  • The paper presents a novel framework that combines GPT-4 with static analysis to automate proof synthesis in Rust.
  • It employs task decomposition and iterative error feedback to generate the invariants and postconditions needed to verify Rust code.
  • The approach reduces hand-written proof code by roughly 80% on vector-manipulating programs, a significant efficiency gain for formal verification.

Leveraging LLMs for Automated Proof Synthesis in Rust

The paper presents a methodology for integrating LLMs, specifically OpenAI's GPT-4, into the formal verification workflow of Verus, a Rust-based verification framework. The aim is to synthesize proof structures automatically, reducing the significant manual effort typically required in interactive formal verification.

Overview

Interactive formal verification is a powerful yet resource-intensive method for ensuring the correctness of system software: it often demands more lines of proof code than the program being verified. Recent advances in LLMs have shown potential in code analysis, prompting this exploration of their application to proof synthesis. The authors introduce a prototype that combines LLM capabilities with static analysis to automate the generation of invariants, assertions, and other proof structures.
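
To make the proof burden concrete, below is a minimal sketch of Verus-style proof code, loosely modeled on Verus's loop tutorial; exact syntax varies across Verus versions. The `ensures` and `invariant` clauses are the kind of annotations the paper asks GPT-4 to synthesize.

```rust
use vstd::prelude::*;

verus! {

// A trivial counting loop: the executable logic is a few lines, while the
// `ensures` clause and the loop `invariant`/`decreases` annotations are pure
// proof overhead that a developer (or, in this paper, GPT-4) must supply.
fn count_up(n: u64) -> (result: u64)
    ensures result == n,
{
    let mut i: u64 = 0;
    while i < n
        invariant i <= n,   // together with the exit condition, implies i == n
        decreases n - i,    // termination measure
    {
        i = i + 1;
    }
    i
}

} // verus!
```

Even in this toy case, verification hinges on an invariant strong enough that, combined with the negated loop condition, it implies the postcondition at loop exit.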

Methodology

The proposed system decomposes complex verification tasks into manageable segments and queries GPT-4 on each, allowing iterative analysis and synthesis:

  1. Task Decomposition: Large programs are segmented, enabling GPT-4 to focus on smaller sections. This helps circumvent limitations related to context retention, ensuring more accurate generation of postconditions and loop invariants.
  2. Combining LLM with Static Analysis: Lightweight static analysis identifies how variables interact within loops and supplies simple but crucial invariants. This hybrid approach compensates for the LLM's weakness at retaining and propagating context information across code segments.
  3. Iterative Feedback: When a proof attempt fails, the verifier's error messages are relayed back to GPT-4 for refinement, emulating a human developer's iterative debugging process (see the sketch after this list).
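
As a concrete illustration of step 3, here is a minimal sketch of the refinement loop, under stated assumptions: `query_llm` is a hypothetical placeholder for the prototype's GPT-4 API call, the `verus` binary is assumed to be on the PATH, and the developer-in-the-loop and static-analysis steps described above are omitted.

```rust
use std::process::Command;

// Hypothetical LLM call: in the actual prototype this would be an OpenAI
// API request; here it is only a placeholder signature.
fn query_llm(_prompt: &str) -> String {
    unimplemented!("send the prompt to GPT-4 and return the completion")
}

// Sketch of the refinement loop: run Verus on the candidate file, and if
// verification fails, feed the error output back to the model for repair.
fn refine(source_path: &str, max_rounds: usize) -> bool {
    for _ in 0..max_rounds {
        // Invoke the Verus verifier on the current candidate file.
        let output = Command::new("verus")
            .arg(source_path)
            .output()
            .expect("failed to launch verus");

        if output.status.success() {
            return true; // all proof obligations discharged
        }

        // Relay the verifier's diagnostics back to the model and
        // overwrite the file with the repaired candidate.
        let errors = String::from_utf8_lossy(&output.stderr);
        let current = std::fs::read_to_string(source_path).expect("read failed");
        let prompt = format!(
            "The following Verus program failed to verify.\n\
             Errors:\n{errors}\n\nProgram:\n{current}\n\
             Revise the invariants and assertions so that it verifies."
        );
        let candidate = query_llm(&prompt);
        std::fs::write(source_path, candidate).expect("write failed");
    }
    false // gave up after max_rounds attempts
}
```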

Numerical Results and Implications

The authors evaluate their prototype on 20 vector-manipulating programs and report a significant reduction in hand-written proof code, roughly 80% less than fully manual verification. This demonstrates the tool's ability to simplify the verification of entry-level programs and highlights a productive intersection between machine learning and formal methods.

Challenges and Future Directions

While promising, the approach faces challenges on longer and more complex programs: task decomposition helps, but it does not fully resolve context tracking and error handling for large codebases. The specificity required for non-intuitive invariant details (e.g., reasoning about vector sizes) also points to room for improvement in the underlying models. Moreover, while the approach is effective at synthesizing postconditions and invariants, it still struggles with nonlinear arithmetic reasoning and quantifier instantiation; the sketch below illustrates the former.
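
To make the nonlinear-arithmetic gap concrete, here is a small Verus-style sketch (syntax may vary across Verus versions): the default SMT configuration avoids nonlinear reasoning, so even an elementary nonlinear fact must be discharged in a dedicated proof block, which is exactly the kind of step the pipeline handles poorly.

```rust
use vstd::prelude::*;

verus! {

proof fn square_nonneg(x: int)
    ensures x * x >= 0,
{
    // The default solver configuration disables nonlinear arithmetic, so
    // this obligation is routed to a dedicated nonlinear solver query.
    assert(x * x >= 0) by (nonlinear_arith);
}

} // verus!
```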

Conclusion

This work illustrates a significant step toward automating formal verification tasks with LLMs. By reducing the human effort invested in the verification process, it paves the way for broader adoption of formal methods in software development. Future iterations may extend the approach to more complex data structures and multi-function programs, further broadening the potential applications of LLMs in automated verification. The iterative error-correction and segmentation strategies outlined here form a dynamic approach that can evolve alongside advances in AI and verification technology.