CoqPilot, a plugin for LLM-based generation of proofs
Abstract: We present CoqPilot, a VS Code extension designed to help automate the writing of Coq proofs. The plugin collects the parts of proofs in a Coq file that are marked with the admit tactic, i.e., proof holes. It combines LLMs with non-machine-learning methods to generate proof candidates for these holes. CoqPilot then checks whether each proof candidate solves the given subgoal and, if it does, replaces the hole with it. The focus of CoqPilot is twofold. First, we want to let users seamlessly combine multiple Coq proof generation approaches and to provide a zero-setup experience for our tool. Second, we want to deliver a platform for LLM-based experiments on Coq proof generation. We developed a benchmarking system for Coq proof generation methods, available in the plugin, and used it to conduct an experiment that showcases the framework's capabilities. A demo of CoqPilot is available at: https://youtu.be/oB1Lx-So9Lo. The code is available at: https://github.com/JetBrains-Research/coqpilot
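The hole-filling loop described in the abstract (find `admit.` holes, generate candidates with several methods, check each candidate, replace the hole on success) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not CoqPilot's implementation, which is a VS Code extension. The names `fill_holes`, `fixed_tactics`, and `toy_check` are hypothetical. The "goal" at each hole is approximated by the file text preceding the `admit.`, and `toy_check` is a toy stand-in for actually running Coq on the candidate.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical interfaces (NOT CoqPilot's actual API):
# a generator maps the file text before a hole to candidate proof snippets,
# and a checker reports whether a candidate closes the goal at that hole.
Generator = Callable[[str], Iterable[str]]
Checker = Callable[[str, str], bool]

# A proof hole is an occurrence of the `admit.` tactic (case-sensitive,
# so `Admitted.` is not matched).
ADMIT = re.compile(r"\badmit\.")


def fill_holes(source: str, generators: List[Generator], check: Checker) -> str:
    """Replace each `admit.` with the first candidate the checker accepts."""
    pieces = []
    last = 0
    for match in ADMIT.finditer(source):
        start, end = match.span()
        context = source[:start]          # everything before the hole
        replacement = match.group(0)      # keep `admit.` if nothing succeeds
        for generate in generators:       # try methods in the given order
            accepted = next(
                (c for c in generate(context) if check(context, c)), None
            )
            if accepted is not None:
                replacement = accepted
                break
        pieces.append(source[last:start])
        pieces.append(replacement)
        last = end
    pieces.append(source[last:])
    return "".join(pieces)


def fixed_tactics(context: str) -> Iterable[str]:
    """Non-ML stand-in (assumption): propose a fixed list of simple tactics."""
    return ["auto.", "trivial.", "reflexivity."]


def toy_check(context: str, candidate: str) -> bool:
    """Toy stand-in for a real Coq checker, keyed on the last lemma statement."""
    statement = context.rsplit("Lemma", 1)[-1]
    if "True" in statement:
        return candidate == "trivial."
    if "=" in statement:
        return candidate == "reflexivity."
    return False


if __name__ == "__main__":
    src = (
        "Lemma t : True.\nProof. admit. Admitted.\n\n"
        "Lemma r : 1 = 1.\nProof. admit. Admitted.\n"
    )
    print(fill_holes(src, [fixed_tactics], toy_check))
```

Trying the generators in order and stopping at the first verified candidate is one possible design choice, assumed here: it lets cheap non-ML methods run before more expensive LLM queries. A hole is left as `admit.` when no candidate is accepted. In a real setting, an LLM-backed generator would be passed in the same `generators` list, and the checker would invoke Coq.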