Leveraging Large Language Models for Automated Proof Synthesis in Rust (2311.03739v2)

Published 7 Nov 2023 in cs.FL and cs.AI

Abstract: Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, LLMs have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.


Summary

  • The paper presents a novel framework that combines GPT-4 with static analysis to automate proof synthesis in Rust.
  • It employs task decomposition and iterative error feedback to generate the invariants and postconditions needed to verify Rust code.
  • The approach reduces hand-written proof code by roughly 80% on vector-manipulating programs, a significant efficiency gain for formal verification.

Leveraging LLMs for Automated Proof Synthesis in Rust

The paper presents a methodology for integrating LLMs, specifically OpenAI's GPT-4, into the formal verification workflow of Verus, a Rust-based verification framework. The aim is to synthesize proof structures automatically, reducing the significant manual effort typically required in interactive formal verification.

Overview

Interactive formal verification is a powerful yet resource-intensive method for ensuring the correctness of system software: it often demands more lines of proof code than the program being verified. Recent advances in LLMs have shown potential in code analysis, prompting this exploration of their application to proof synthesis. The authors introduce a prototype that combines LLM capabilities with static analysis to automate the generation of invariants, assertions, and other proof structures.
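
To make the proof burden concrete, below is a minimal sketch of Verus-style proof code, loosely modeled on Verus's loop tutorial; exact syntax varies across Verus versions. The `ensures` and `invariant` clauses are the kind of annotations the paper asks GPT-4 to synthesize.

```rust
use vstd::prelude::*;

verus! {

// A trivial counting loop: the executable logic is a few lines, while the
// `ensures` clause and the loop `invariant`/`decreases` annotations are pure
// proof overhead that a developer (or, in this paper, GPT-4) must supply.
fn count_up(n: u64) -> (result: u64)
    ensures result == n,
{
    let mut i: u64 = 0;
    while i < n
        invariant i <= n,   // together with the exit condition, implies i == n
        decreases n - i,    // termination measure
    {
        i = i + 1;
    }
    i
}

} // verus!
```

Even in this toy case, verification hinges on an invariant strong enough that, combined with the negated loop condition, it implies the postcondition at loop exit.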

Methodology

The proposed system decomposes complex verification tasks into manageable segments and queries GPT-4 on each, allowing iterative analysis and synthesis:

  1. Task Decomposition: Large programs are segmented, enabling GPT-4 to focus on smaller sections. This helps circumvent limitations related to context retention, ensuring more accurate generation of postconditions and loop invariants.
  2. Combining LLM with Static Analysis: Lightweight static analysis identifies how variables interact within loops and supplies simple but crucial invariants. This hybrid approach compensates for the LLM's weakness at retaining and propagating context information across code segments.
  3. Iterative Feedback: When a proof attempt fails, the verifier's error messages are relayed back to GPT-4 for refinement, emulating a human developer's iterative debugging process (see the sketch after this list).
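
As a concrete illustration of step 3, here is a minimal sketch of the refinement loop, under stated assumptions: `query_llm` is a hypothetical placeholder for the prototype's GPT-4 API call, the `verus` binary is assumed to be on the PATH, and the developer-in-the-loop and static-analysis steps described above are omitted.

```rust
use std::process::Command;

// Hypothetical LLM call: in the actual prototype this would be an OpenAI
// API request; here it is only a placeholder signature.
fn query_llm(_prompt: &str) -> String {
    unimplemented!("send the prompt to GPT-4 and return the completion")
}

// Sketch of the refinement loop: run Verus on the candidate file, and if
// verification fails, feed the error output back to the model for repair.
fn refine(source_path: &str, max_rounds: usize) -> bool {
    for _ in 0..max_rounds {
        // Invoke the Verus verifier on the current candidate file.
        let output = Command::new("verus")
            .arg(source_path)
            .output()
            .expect("failed to launch verus");

        if output.status.success() {
            return true; // all proof obligations discharged
        }

        // Relay the verifier's diagnostics back to the model and
        // overwrite the file with the repaired candidate.
        let errors = String::from_utf8_lossy(&output.stderr);
        let current = std::fs::read_to_string(source_path).expect("read failed");
        let prompt = format!(
            "The following Verus program failed to verify.\n\
             Errors:\n{errors}\n\nProgram:\n{current}\n\
             Revise the invariants and assertions so that it verifies."
        );
        let candidate = query_llm(&prompt);
        std::fs::write(source_path, candidate).expect("write failed");
    }
    false // gave up after max_rounds attempts
}
```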

Numerical Results and Implications

The authors evaluate their prototype on 20 vector-manipulating programs and report a significant reduction in hand-written proof code, roughly 80% less than fully manual verification. This demonstrates the tool's ability to simplify the verification of entry-level programs and highlights a productive intersection between machine learning and formal methods.

Challenges and Future Directions

While promising, the approach faces challenges on longer and more complex programs: task decomposition helps, but it does not fully resolve context tracking and error handling for large codebases. The specificity required for non-intuitive invariant details (e.g., reasoning about vector sizes) also points to room for improvement in the underlying models. Moreover, while the approach is effective at synthesizing postconditions and invariants, it still struggles with nonlinear arithmetic reasoning and quantifier instantiation; the sketch below illustrates the former.
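
To make the nonlinear-arithmetic gap concrete, here is a small Verus-style sketch (syntax may vary across Verus versions): the default SMT configuration avoids nonlinear reasoning, so even an elementary nonlinear fact must be discharged in a dedicated proof block, which is exactly the kind of step the pipeline handles poorly.

```rust
use vstd::prelude::*;

verus! {

proof fn square_nonneg(x: int)
    ensures x * x >= 0,
{
    // The default solver configuration disables nonlinear arithmetic, so
    // this obligation is routed to a dedicated nonlinear solver query.
    assert(x * x >= 0) by (nonlinear_arith);
}

} // verus!
```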

Conclusion

This work illustrates a significant step toward automating formal verification tasks with LLMs. By reducing the human effort invested in the verification process, it paves the way for broader adoption of formal methods in software development. Future iterations may extend the approach to more complex data structures and multi-function programs, further broadening the potential applications of LLMs in automated verification. The iterative error-correction and segmentation strategies outlined here form a dynamic approach that can evolve alongside advances in AI and verification technology.