Leveraging Large Language Models for Automated Proof Synthesis in Rust (2311.03739v2)
Abstract: Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, LLMs have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.
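To make the terms "postcondition" and "loop invariant" concrete, the following is a minimal sketch in plain Rust of the kind of annotations a vector-manipulating program needs. It is not taken from the paper and does not use Verus syntax. The function `max_of` and the helper `holds_for_prefix` are hypothetical names chosen for illustration, and the properties are checked at run time with `assert!`, whereas a verifier such as Verus would aim to prove them statically.

```rust
/// Hypothetical helper: states the property "best is the maximum of v[..i]".
/// In a verifier this would be a specification; here it is an ordinary
/// boolean function so that it can be checked at run time.
fn holds_for_prefix(v: &[i32], i: usize, best: i32) -> bool {
    i >= 1
        && i <= v.len()
        && v[..i].contains(&best)
        && v[..i].iter().all(|&x| x <= best)
}

/// Returns the largest element of a non-empty slice.
fn max_of(v: &[i32]) -> i32 {
    // Precondition: the slice must be non-empty.
    assert!(!v.is_empty(), "precondition: v must be non-empty");

    let mut best = v[0];
    let mut i = 1;
    while i < v.len() {
        // Loop invariant: `best` is the maximum of the prefix v[..i].
        assert!(holds_for_prefix(v, i, best));
        if v[i] > best {
            best = v[i];
        }
        i += 1;
    }

    // Postcondition: `best` is the maximum of the whole slice.
    assert!(holds_for_prefix(v, v.len(), best));
    best
}
```

Calling `max_of(&[3, 1, 4, 1, 5])` walks the slice once and checks the invariant on every iteration. The invariant is the part that is typically hardest to write by hand, since it has to be strong enough to imply the postcondition when the loop exits.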