
An In-Context Learning Agent for Formal Theorem-Proving (2310.04353v5)

Published 6 Oct 2023 in cs.LG, cs.AI, cs.LO, and cs.PL

Abstract: We present an in-context learning agent for formal theorem-proving in environments like Lean and Coq. Current state-of-the-art models for the problem are finetuned on environment-specific proof data. By contrast, our approach, called COPRA, repeatedly asks a high-capacity, general-purpose LLM (GPT-4) to propose tactic applications from within a stateful backtracking search. Proposed tactics are executed in the underlying proof environment. Feedback from the execution is used to build the prompt for the next model query, along with selected information from the search history and lemmas retrieved from an external database. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project. On these benchmarks, COPRA significantly outperforms few-shot invocations of GPT-4. It also compares favorably against finetuning-based approaches, outperforming ReProver, a state-of-the-art finetuned approach for Lean, in terms of the pass@1 metric. Our code and data are available at https://github.com/trishullab/copra.

Summary

  • The paper introduces Copra, an agent leveraging GPT-4's in-context learning combined with intelligent search and step-by-step environment interaction for formal theorem-proving.
  • Copra achieves competitive performance on Lean and Coq benchmarks, demonstrating that in-context learning with search can rival or surpass fine-tuned systems in efficiency.
  • This approach highlights the potential of general-purpose LLMs like GPT-4 to generalize across formal systems and bridge neural-symbolic methods for complex reasoning tasks.

An In-Context Learning Agent for Formal Theorem-Proving

The paper "An In-Context Learning Agent for Formal Theorem-Proving" explores a novel approach to automated theorem-proving that exploits the in-context learning capabilities of contemporary LLMs, particularly GPT-4. The researchers introduce a system named Copra, which contrasts with existing paradigms that depend heavily on models fine-tuned on formal proofs from specific environments like Lean or Coq.

Overview and Contributions

The heart of Copra lies in leveraging in-context learning, which uses the LLM's ability to adapt to different contexts provided by prompts to predict the next proof step. Here, GPT-4 is deployed from within a stateful search algorithm, in which each proposed tactic is executed in the theorem-proving environment. The failure or success of tactics informs subsequent moves, illustrating a dynamic interplay between LLM predictions and environment feedback.
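To make the prompt-building step concrete, here is a minimal sketch of how such a query might be assembled. The field names and format are illustrative assumptions, not the paper's actual prompt template:

```python
def build_prompt(goals, last_error, history, lemmas):
    """Assemble a hypothetical COPRA-style prompt from the current proof
    state, execution feedback, search history, and retrieved lemmas."""
    parts = ["You are proving a theorem. Current goals:"]
    parts += [f"  {g}" for g in goals]
    if last_error:
        # feedback from the proof environment on the last failed tactic
        parts.append(f"Previous tactic failed with: {last_error}")
    if history:
        # selected search history, so the model avoids repeating itself
        parts.append("Tactics already tried on this state: "
                     + ", ".join(t for t, _ in history))
    if lemmas:
        # lemmas retrieved from an external database
        parts.append("Possibly useful lemmas:")
        parts += [f"  {l}" for l in lemmas]
    parts.append("Propose the next tactic.")
    return "\n".join(parts)

print(build_prompt(["⊢ q ∧ p"], None, [("simp", "no progress")], ["and.intro"]))
```

Each model query thus carries the current goals plus whatever the environment and the search have learned so far, which is what distinguishes this loop from a single few-shot call.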

Copra's approach is characterized by several key features:

  1. Integration of Intelligent Search: By invoking the LLM repeatedly from within a backtracking search, Copra goes beyond one-shot few-shot inference. It keeps only tactics that make genuine progress, discarding those whose resulting proof states are no simpler than states already encountered.
  2. Retrieval-Augmented Contexts: The system utilizes external databases for lemma and theorem retrieval, thus reinforcing the model's capacity to deduce formal proofs without explicit environment-centric fine-tuning.
  3. Interactive Theorem Proving: Rather than pre-committing to an entire proof sequence at once, Copra interacts with the theorem-proving environment step-by-step, utilizing execution feedback to avoid pitfalls like cyclic dependencies or inefficient tactic sequences. This interactivity introduces robustness against hallucinations often associated with LLM outputs.
  4. General-Purpose LLM Utilization: Remarkably, the approach demonstrates the power of a black-box LLM when combined with intelligent prompting and environment interaction, achieving competitive results across differing theorem-proving frameworks without bespoke model training for each.
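The interplay of features 1 and 3 can be sketched as a toy backtracking loop. This is an illustrative reconstruction, not the authors' implementation: the GPT-4 call is mocked by `propose_tactics`, and `ToyEnv` stands in for Lean/Coq, so all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProofState:
    goals: tuple  # remaining proof obligations

class ToyEnv:
    """Stand-in proof environment: tactic 'close' discharges a goal
    ending in '!', anything else fails with an error message."""
    def run(self, tactic, state):
        if tactic == "close" and state.goals and state.goals[0].endswith("!"):
            return True, ProofState(state.goals[1:]), None
        return False, state, f"tactic {tactic!r} failed on {state.goals[0]!r}"

def propose_tactics(state, history):
    # In COPRA this is a GPT-4 query whose prompt includes the goals,
    # prior execution errors from `history`, and retrieved lemmas.
    return ["simp", "close"]  # a fixed ranking, for the sketch only

def search(env, state, seen, history, depth=0, max_depth=8):
    if not state.goals:
        return []                          # proof complete
    if depth >= max_depth:
        return None
    for tactic in propose_tactics(state, history):
        ok, nxt, err = env.run(tactic, state)
        if not ok:
            history.append((tactic, err))  # failure feedback -> next prompt
            continue
        if nxt.goals in seen:              # keep only genuinely new states
            continue
        seen.add(nxt.goals)
        sub = search(env, nxt, seen, history, depth + 1, max_depth)
        if sub is not None:
            return [tactic] + sub
        # otherwise backtrack and try the next proposed tactic
    return None

env = ToyEnv()
start = ProofState(("a!", "b!"))
proof = search(env, start, {start.goals}, [])
print(proof)  # -> ['close', 'close']
```

The key design point is that failed tactic executions are not discarded: they accumulate in `history` and shape the next model query, while the `seen` set prevents cycles through previously visited proof states.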

Results and Implications

Empirically, Copra's efficacy is tested on the miniF2F benchmark for Lean and on a set of Coq tasks drawn from the CompCert compiler-verification project. Notably, Copra surpasses few-shot GPT-4 invocations, illustrating that the synergy of search and LLM-guided step-taking is integral to its success. Against other state-of-the-art systems, the approach shows comparable performance, outperforming ReProver, a state-of-the-art finetuned Lean prover, on the pass@1 metric while requiring fewer queries per proof, indicating enhanced efficiency.
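For readers unfamiliar with tactic-based proving, here is a hypothetical example (not drawn from the paper's benchmarks) of the kind of Lean proof Copra searches for; each tactic line corresponds to one step executed and checked in the environment:

```lean
theorem and_comm' (p q : Prop) : p ∧ q → q ∧ p :=
begin
  intro h,            -- h : p ∧ q ⊢ q ∧ p
  cases h with hp hq, -- hp : p, hq : q ⊢ q ∧ p
  split,              -- two goals: ⊢ q and ⊢ p
  { exact hq },
  { exact hp }
end
```

After each tactic, the prover reports the updated goals (or an error), and it is exactly this feedback that Copra folds into its next GPT-4 query.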

Copra underscores the potential of LLMs like GPT-4 to generalize across multiple domains within theorem proving, suggesting advancements in AI applications from mathematics to software verification. The method bridges the gap between purely neural and purely symbolic approaches, leveraging strengths from each to navigate complex proof spaces effectively.

Speculation on Future Directions

Looking forward, this research opens multiple avenues for exploration, among them:

  • Fine-Tuning and Scaling: Testing whether LLMs smaller than or comparable to GPT-4, augmented with in-context learning and symbolically synthesized training data, can approach these tasks could yield cost-effective alternatives.
  • Cross-Domain Adaptability: Extending the methodology to other formal systems, including those with built-in decision procedures or tactic languages, could test the generality of the technique.

Through Copra, the paper demonstrates a significant stride in harnessing the contextual understanding of LLMs beyond routine NLP tasks, extending their applicability to domains such as formal theorem-proving that demand high precision and iterative, feedback-driven reasoning.
