
Cobblestone: Iterative Automation for Formal Verification

Published 25 Oct 2024 in cs.LO, cs.AI, and cs.PL (arXiv:2410.19940v1)

Abstract: Formal verification using proof assistants, such as Coq, is an effective way of improving software quality, but it is expensive. Writing proofs manually requires both significant effort and expertise. Recent research has used machine learning to automatically synthesize proofs, reducing verification effort, but these tools are able to prove only a fraction of the desired software properties. We introduce Cobblestone, a new proof-synthesis approach that improves on the state of the art by taking advantage of partial progress in proof synthesis attempts. Unlike prior tools, Cobblestone can produce multiple unsuccessful proofs using an LLM, identify the working portions of those proofs, and combine them into a single, successful proof, taking advantage of internal partial progress. We evaluate Cobblestone on two benchmarks of open-source Coq projects, controlling for training data leakage in LLM datasets. Fully automatically, Cobblestone can prove 48% of the theorems, while Proverbot9001, the previous state-of-the-art, learning-based, proof-synthesis tool, can prove 17%. Cobblestone establishes a new state of the art for fully automated proof synthesis tools for Coq. We also evaluate Cobblestone in a setting where it is given external partial proof progress from oracles, serving as proxies for a human proof engineer or another tool. When the theorem is broken down into a set of subgoals and Cobblestone is given a set of relevant lemmas already proven in the project, it can prove up to 58% of the theorems. We qualitatively study the theorems Cobblestone is and is not able to prove to outline potential future research directions to further improve proof synthesis, including developing interactive, semi-automated tools. Our research shows that tools can make better use of partial progress made during proof synthesis to more effectively automate formal verification.


Summary

  • The paper introduces Cobblestone, which iteratively synthesizes verified proofs by combining partial successes from multiple LLM-generated attempts.
  • It achieves state-of-the-art results, proving 48% of theorems on the CoqGym benchmark and 38% on coq-wigderson, outperforming tools like Proverbot9001 and CoqHammer.
  • The methodology incorporates external data, such as proven lemmas and subgoal decompositions, paving the way for interactive and efficient formal verification.

Overview of "Cobblestone: Iterative Automation for Formal Verification"

The paper "Cobblestone: Iterative Automation for Formal Verification" introduces Cobblestone, a novel approach for automating formal proof synthesis, specifically targeting the Coq proof assistant. In formal verification, manually constructing proofs is a labor-intensive process that requires considerable expertise. Machine learning has begun to automate proof synthesis for proof assistants, but existing tools can still prove only a fraction of the desired properties.

Key Contributions

  1. Partial Proof Synthesis with LLMs: Cobblestone uses an LLM to generate multiple proof attempts for a theorem. Its distinctiveness is that even when every attempt fails, it identifies the portions of each attempt that the proof assistant accepts and combines those working fragments into a single, successful proof.
  2. Evaluation against Benchmarks: Cobblestone was evaluated on two benchmarks: a subset of the CoqGym test set and the coq-wigderson project. CoqGym is widely used for evaluating proof-synthesis tools but risks training-data leakage because it likely appears in LLM pre-training datasets; coq-wigderson was created after those datasets were collected, mitigating that risk.
  3. State-of-the-Art Results: The evaluation shows that Cobblestone outperforms existing proof-synthesis tools and baselines, including Proverbot9001 and CoqHammer. It achieved a 48% success rate on CoqGym100, substantially exceeding prior methods, and a 38% success rate on Wigderson100.
  4. Incorporation of External Information: Cobblestone can also harness external partial progress, such as proven lemmas or subgoal decompositions supplied by a human proof engineer or another tool. With both kinds of oracle information, it proves up to 58% of theorems, a promising direction for interactive, semi-automated proof synthesis.
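The combination step described above can be illustrated with a minimal sketch. This is not the paper's actual implementation; all names (`combine_attempts`, `check`, the subgoal ids, and the tactic strings) are hypothetical, and the proof-assistant check is stubbed out with a lookup table:

```python
def combine_attempts(attempts, check):
    """Greedily assemble one combined proof from multiple failed attempts.

    attempts: list of dicts mapping subgoal id -> candidate proof script
    check:    callable(subgoal_id, script) -> bool, a stand-in for the
              proof assistant accepting the script for that subgoal
    """
    combined = {}
    for attempt in attempts:
        for subgoal, script in attempt.items():
            # Keep the first fragment the checker accepts for each subgoal.
            if subgoal not in combined and check(subgoal, script):
                combined[subgoal] = script
    return combined


# Toy example: two attempts that each fail overall, but together
# cover both subgoals of a theorem.
attempts = [
    {"goal_1": "auto.", "goal_2": "bad_tactic."},
    {"goal_1": "bad_tactic.", "goal_2": "lia."},
]
accepted = {("goal_1", "auto."), ("goal_2", "lia.")}
check = lambda goal, script: (goal, script) in accepted

print(combine_attempts(attempts, check))
# → {'goal_1': 'auto.', 'goal_2': 'lia.'}
```

The point of the sketch is the core insight: neither attempt succeeds on its own, yet a working proof for every subgoal exists across the set, so stitching the accepted fragments together closes the whole theorem.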

Implications and Future Directions

This research has substantial implications for both the practice and theory of automated verification:

  • Practical Implications: Tools like Cobblestone can reduce the time and expertise needed to formally verify software. Better automation may encourage broader industrial adoption of formal methods, improving software reliability and reducing the cost of software faults.
  • Theoretical Implications: Cobblestone applies LLMs beyond their traditional natural-language uses, demonstrating their effectiveness in formal verification settings. This interdisciplinary approach highlights opportunities for LLMs to solve complex, structured problems across software engineering.
  • Speculative Future Developments: Since Cobblestone demonstrates synergy between LLMs and formal verification, future work might integrate more diverse data sources and training paradigms to improve proof understanding and synthesis. Such advances could yield interactive verification environments in which software engineers and AI tools collaborate.

Conclusion

In conclusion, Cobblestone is a robust step forward in automating formal verification. Its methodology not only surpasses existing automated proof-synthesis tools in effectiveness but also paves the way for more interactive, user-friendly verification systems. By exploiting partial progress from state-of-the-art language models, Cobblestone could make formal verification more accessible and usable in the development of critical software systems.
