Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification

Published 18 Dec 2024 in cs.SE and cs.AI | (2412.14063v3)

Abstract: Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and LLMs. This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the-art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.


Summary

The paper introduces Rango, which adaptively integrates retrieval methods and LLM-driven proof synthesis to enhance automated theorem proving in Coq.
Rango uses BM25 and TF-IDF to retrieve relevant proofs and lemmas at every proof step; adding retrieved proofs to the context yields a 47% increase in the number of theorems proven.
On the CoqStoq benchmark, Rango proves 32.0% of theorems, 29% more than the prior state-of-the-art tool Tactician.

The paper presents Rango, a fully automated proof synthesis tool for the Coq proof assistant, aimed at reducing the manual effort of formal software verification. Rango pairs a fine-tuned LLM with an adaptive retrieval-augmented proving approach: at each proof step, it retrieves relevant lemmas and similar proofs from the current project and adds them to the model's context, so retrieval tracks the evolving proof state.

Methodological Advancements

Rango performs retrieval at every proof step, which lets it adapt both to the project and to the current proof state. The retrieved proofs and lemmas are added to the context of a fine-tuned LLM, which then predicts the next proof step. The core components of Rango are:

Proof Retriever: Uses the BM25 sparse retrieval technique to select the most relevant existing proofs at each proof step.
Lemma Retriever: Uses TF-IDF to select lemmas from the project that could help with the current proof state.
LLM: A decoder-only LLM fine-tuned on the CoqStoq dataset, which predicts the next proof step from the retrieved context.
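To make the proof retriever concrete, here is a minimal sketch of BM25 ranking in plain Python. It is not Rango's implementation: the tokenizer, the parameters `k1` and `b`, the function names, and the choice to compare the current goal against stored goal strings are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption): keep identifier-like tokens only.
    return re.findall(r"[A-Za-z0-9_']+", text)


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by descending BM25 score."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Hypothetical goals from earlier proofs in the project.
docs = [
    "forall n : nat, n + 0 = n",
    "forall l : list A, rev (rev l) = l",
    "forall a b : bool, andb a b = andb b a",
]
# Hypothetical current goal.
query = "forall l : list A, length (rev l) = length l"
ranking = bm25_rank(query, docs)
print(ranking)
```

The list-reversal goal ranks first because it shares rare terms (`l`, `list`, `rev`) with the query, while `forall` appears in every document and so contributes little. A TF-IDF lemma retriever could be sketched the same way with a different scoring function.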

Rango synthesizes proofs with a rollout search: it repeatedly samples candidate proof steps from the LLM until it produces a complete, valid proof or reaches a timeout. Sampling lets the search try alternative steps while still favoring those the model considers likely.
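The loop below is a rough sketch of a rollout-style search under stated assumptions, not the paper's algorithm. The callables `sample_step`, `apply_step`, and `is_complete` are hypothetical stand-ins for the LLM and the proof checker, and abandoning a rollout on the first rejected step is an assumption made for illustration.

```python
import time


def rollout_search(initial_state, sample_step, apply_step, is_complete,
                   timeout_s=1.0, max_steps=20):
    """Sample whole proof attempts until one completes or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state, proof = initial_state, []
        for _ in range(max_steps):
            step = sample_step(state)             # e.g. one tactic from the model
            next_state = apply_step(state, step)  # None means the step was rejected
            if next_state is None:
                break                             # abandon this rollout, start over
            state = next_state
            proof.append(step)
            if is_complete(state):
                return proof
    return None                                   # timed out without a proof


# Toy stand-ins: the state is the number of open goals,
# and the hypothetical tactic "auto" closes exactly one goal.
def toy_apply(state, step):
    return state - 1 if step == "auto" else None


def toy_done(state):
    return state == 0


found = rollout_search(2, lambda s: "auto", toy_apply, toy_done)
missed = rollout_search(2, lambda s: "bad", toy_apply, toy_done, timeout_s=0.05)
```

In a real setting, `sample_step` would also receive the retrieved proofs and lemmas for the current state, so that retrieval is refreshed before every sampled step.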

Dataset and Evaluation

The CoqStoq dataset, a major contribution of the paper, consists of 2,226 open-source Coq projects from GitHub with a total of 196,929 theorems. It provides both the training data for Rango and a curated evaluation benchmark of well-maintained projects, so the evaluation reflects real-world verification work.

In comparative evaluations, Rango outperforms prior proof synthesis tools, including Tactician, Proverbot9001, and Graph2Tac:

32.0% Theorems Proven: Rango proves 32.0% of the benchmark theorems, 29% more than Tactician and 66% more than Proverbot9001.
Proof Retriever's Contribution: Adding retrieved proofs to the context yields a 47% increase in the number of theorems proven, which indicates the value of retrieving similar proofs in addition to lemmas.
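The percentages above are relative gains, not differences in percentage points. As a back-of-the-envelope check, the sketch below inverts the relative gain to estimate the Tactician success rate implied by the 32.0% and 29% figures. The result is an inference from rounded numbers, not a value reported in this summary, and the function names are illustrative.

```python
def implied_baseline(rate, relative_gain):
    """Baseline rate such that `rate` is `relative_gain` higher than it."""
    return rate / (1.0 + relative_gain)


def relative_gain(new_rate, old_rate):
    """Relative improvement of new_rate over old_rate."""
    return (new_rate - old_rate) / old_rate


# 32.0% proven, stated as 29% more than Tactician (rounded inputs).
tactician_rate = implied_baseline(0.320, 0.29)
print(f"implied Tactician rate: {tactician_rate:.1%}")
print(f"round trip: {relative_gain(0.320, tactician_rate):.0%}")
```

Under these assumptions the implied baseline is roughly a quarter of the benchmark theorems, so the gap is about seven percentage points in absolute terms.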

Implications and Future Directions

In practical terms, Rango could lower the expertise and manual effort that formal verification demands. Making machine-checked proofs cheaper to produce would make verified, more reliable software more accessible to development teams.

For machine-learning-assisted theorem proving, the results suggest that retrieval-augmented generation is a viable way to tackle complex proofs by combining retrieval of similar past proofs with conventional premise retrieval. This dual-retrieval strategy could inspire similar designs in proof-synthesis systems beyond Coq.

Future research could refine the retrieval strategies, for example with hybrid methods that combine a growing offline database of proofs with real-time analysis of the current context. Extending Rango's framework to other proof assistants would carry its benefits to other formal verification platforms.
