Effectiveness of LLMs in Automating Proofs with Proof Assistants

Determine the effectiveness of large language models in automating proofs for formal verification using proof assistants.

Background

The paper investigates whether LLMs can effectively automate proof generation within proof assistants. As motivation, the authors note that despite the promise of LLMs for verification, their actual effectiveness is unclear. To address this uncertainty, they conduct a case study on two mature Rocq (Coq) projects, hs-to-coq and Verdi, performing quantitative and qualitative analyses of LLM-generated proofs.

The study examines factors such as external dependencies, context within source files, proof sizes, and project-specific differences, and reports empirical findings on where LLMs succeed and fail. The open question stated in the abstract frames the need for systematic evaluation of LLM performance in real verification workflows.

References

However, it is unclear how effective LLMs are in this task.

— A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants (2508.18587 - Bayazıt et al., 26 Aug 2025) in Abstract (Page 1)
