Ax-Prover: Multi-Agent Theorem Prover

Updated 18 October 2025
  • Ax-Prover is a multi-agent system that integrates LLM creative reasoning with Lean's formal verification to automate theorem proving.
  • It employs an orchestrator, prover, and verifier coordinated via the Model Context Protocol (MCP) for iterative and error-free proof construction.
  • Benchmark results and a practical formalization use case demonstrate Ax-Prover's efficiency, adaptability, and collaborative potential across diverse scientific domains.

Ax-Prover is a multi-agent system for automated theorem proving in mathematics and quantum physics, implemented in the Lean proof assistant. It combines the creative reasoning capabilities of general-purpose LLMs with Lean’s rigor through a tool-mediated protocol, allowing both autonomous operation and human-machine collaborative proof construction. Notably, Ax-Prover demonstrates strong generalization across mathematical and scientific domains, competitive benchmark results, and effective practical support for expert-driven formalization tasks (Tredici et al., 14 Oct 2025).

1. Multi-Agent System Architecture

Ax-Prover’s architecture features three specialized agents coordinated over the Model Context Protocol (MCP):

  • Orchestrator: Schedules the proof task, distributes subtasks to the other agents, manages feedback, and maintains the refinement loop. It terminates when the proof is verified or resource limits are reached.
  • Prover: Uses a general-purpose LLM (e.g., Claude Sonnet 4) to synthesize a natural language proof sketch and incrementally translate it into Lean code. It calls Lean tools via MCP, including edit_file, lean_diagnostic_messages, lean_goal, lean_leansearch, and lean_loogle, so each step is checked against the Lean environment as the proof is built.
  • Verifier: Operates on diagnostics from Lean, using the MCP to ensure the proof is error-free and free of unproven placeholders such as sorry or admit.

Agents operate in a closed loop—problem dispatch, iterative construction, and verification—yielding formally validated Lean proofs.
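
The closed loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ax-Prover's implementation: the names `orchestrate`, `verify`, `toy_prover`, and `toy_diagnose` are hypothetical, the prover is a stub standing in for an LLM, and the diagnostics callback stands in for a Lean tool such as lean_diagnostic_messages.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Placeholders the Verifier must reject, as named in the text above.
PLACEHOLDERS = ("sorry", "admit")


@dataclass
class Verdict:
    ok: bool
    messages: List[str] = field(default_factory=list)


def verify(proof: str, diagnostics: List[str]) -> Verdict:
    """Accept only if there are no diagnostics and no placeholder tokens."""
    problems = list(diagnostics)
    for token in PLACEHOLDERS:
        if re.search(rf"\b{token}\b", proof):
            problems.append(f"unproven placeholder: {token}")
    return Verdict(ok=not problems, messages=problems)


def orchestrate(
    statement: str,
    prover: Callable[[str, List[str]], str],
    diagnose: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Dispatch, refine, and verify until success or the round budget ends."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        proof = prover(statement, feedback)          # Prover drafts or repairs
        verdict = verify(proof, diagnose(proof))     # Verifier checks the draft
        if verdict.ok:
            return proof, round_no
        feedback = verdict.messages                  # fed back to the Prover
    return None, max_rounds


def toy_prover(statement: str, feedback: List[str]) -> str:
    # Hypothetical stub: first draft leaves a gap, the second one fills it.
    body = "Nat.add_comm a b" if feedback else "sorry"
    return f"{statement} := by\n  exact {body}"


def toy_diagnose(proof: str) -> List[str]:
    # Hypothetical stub: pretend Lean reported no errors.
    return []


proof, rounds = orchestrate(
    "theorem t (a b : Nat) : a + b = b + a", toy_prover, toy_diagnose
)
```

In this toy run the first draft is rejected because it still contains `sorry`, and the second draft is accepted, which mirrors the refinement loop the Orchestrator maintains.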

2. Benchmark Performance and Comparative Results

Ax-Prover was evaluated on both existing and newly created Lean benchmarks:

Benchmark             | Easy Acc. | Interm. Acc. | Overall Acc. | Notable Results
NuminaMath-LEAN       | 81%       | 47%          | 51%          | Pass@1 on Unsolved: 26%
Abstract Algebra (AA) | 72%       | 56%          | 64%          | Outperforms Mathlib LLMs
QuantumTheorems (QT)  | 100%      | 92%          | 96%          | Full coverage (easy)
PutnamBench           | -         | -            | 14%          | Ranked 3rd, strong sample efficiency

Ax-Prover outperforms specialized provers (e.g., DeepSeek-Prover, Kimina) on the newly introduced abstract algebra and quantum theory benchmarks and is competitive on established ones. It often matches or exceeds specialist models in accuracy while using far less compute (e.g., reduced parallel sampling on PutnamBench).

3. Generalization and Domain Adaptability

Unlike systems trained solely on mathematical corpora or narrow domains, Ax-Prover draws on the broad-domain knowledge of general-purpose LLMs. Because Lean is accessed through MCP tools, the agents always work against the current Lean libraries, with no retraining required. This approach enables rapid adaptation to diverse disciplines such as algebra, quantum physics, and cryptography, regardless of local idiosyncrasies in formalization.

The modular multi-agent framework further supports component interchangeability and parallel development, facilitating the extension and improvement of individual agents without disrupting the overall system.

4. Formalization of Advanced Theorems: Practical Use Case

Ax-Prover demonstrated its collaborative capability by assisting in the Lean formalization of a cryptography theorem (Branch Number Computation for Non-Singular Matrices over Finite Fields). The formal statement addressed:

$$\mathcal{B}(M) = \min \{ w_h(x) + w_h(Mx) : x \in \mathbb{F}_q^n,\ x \neq 0 \}$$

with an alternate, computation-friendly version:

$$\mathcal{B}(M) = \min \left\{ \min \{ h(M, x), h(M^{-1}, x) \} : x \in \mathbb{F}_q^n,\ 1 \leq w_h(x) \leq \left\lfloor \frac{n+1}{2} \right\rfloor \right\}$$

where $h(M, x) = w_h(x) + w_h(Mx)$.
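
The two expressions for the branch number can be compared by brute force on a small case. The Python sketch below is illustrative only and is not taken from the formalization: it reads $w_h$ as Hamming weight, assumes $q$ is prime so that arithmetic modulo $q$ gives the field, and uses hypothetical helper names (`weight`, `mat_vec`, `branch_number_direct`, `branch_number_restricted`). The inverse map is obtained by tabulating $y \mapsto My$ rather than by inverting the matrix.

```python
from itertools import product


def weight(v):
    """Hamming weight: number of nonzero coordinates."""
    return sum(1 for c in v if c != 0)


def mat_vec(M, x, q):
    """Matrix-vector product over F_q (q assumed prime)."""
    return tuple(sum(m * c for m, c in zip(row, x)) % q for row in M)


def branch_number_direct(M, q):
    """Minimum of w_h(x) + w_h(Mx) over all nonzero x."""
    n = len(M)
    return min(
        weight(x) + weight(mat_vec(M, x, q))
        for x in product(range(q), repeat=n)
        if any(x)
    )


def branch_number_restricted(M, q):
    """Same quantity, searching only vectors of weight <= floor((n+1)/2)."""
    n = len(M)
    vectors = list(product(range(q), repeat=n))
    # Tabulate the inverse map: inverse[Mx] = x. Requires M non-singular.
    inverse = {mat_vec(M, y, q): y for y in vectors}
    if len(inverse) != len(vectors):
        raise ValueError("M is singular over F_q")
    bound = (n + 1) // 2
    return min(
        min(
            weight(x) + weight(mat_vec(M, x, q)),  # h(M, x)
            weight(x) + weight(inverse[x]),        # h(M^{-1}, x)
        )
        for x in vectors
        if 1 <= weight(x) <= bound
    )


# A non-singular 3x3 matrix over F_2, chosen only for illustration.
M = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]

direct = branch_number_direct(M, 2)
restricted = branch_number_restricted(M, 2)
```

For this matrix both functions return 3. The restricted search suffices because a weight-one vector already gives a value of at most $n + 1$, whereas any $x$ with both $w_h(x)$ and $w_h(Mx)$ above the bound gives a value strictly greater than $n + 1$.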

Ax-Prover supported the human expert by co-structuring the proof, verifying lemmas, and error-checking intermediate steps. It autonomously detected a misstep and enabled completion of the formalization on modest hardware in two working days, demonstrating practical usability in scientific research.

5. Technical Details and Protocol Integration

Proof generation in Ax-Prover uses iterative augmentation:

  • The Prover agent writes a Lean proof with natural language “have” statements or outlines.
  • Code is regularly compiled; diagnostic messages from Lean identify errors.
  • MCP orchestrates Lean tool usage, e.g., lean_goal for goal state queries, lean_loogle for theorem searching, and edit_file for code manipulation.
  • Placeholders like sorry/admit are phased out as proof fragments coalesce into a complete verification chain.

This tightly coupled LLM-logic system enables both creative exploration and strict protocol-driven validation.
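
As a small illustration of this workflow, the Lean 4 snippet below shows a draft whose intermediate `have` step is still a placeholder, followed by the completed version. The theorem and its names are made up for the example and are not drawn from the paper; only core `Nat` lemmas are used.

```lean
-- Draft: the `have` step records the plan; `sorry` marks the gap.
-- Lean accepts this with a warning, so a verifier must still reject it.
theorem swap_sum_draft (a b c : Nat) : a + b + c = c + (b + a) := by
  have h : a + b = b + a := by
    sorry
  rw [h]
  exact Nat.add_comm (b + a) c

-- Completed: the placeholder is replaced by an actual proof term.
theorem swap_sum (a b c : Nat) : a + b + c = c + (b + a) := by
  have h : a + b = b + a := by
    exact Nat.add_comm a b
  rw [h]
  exact Nat.add_comm (b + a) c
```

Querying the goal state after `rw [h]` (the role lean_goal plays in the list above) would show `b + a + c = c + (b + a)`, which is what the final `exact` step closes.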

6. Planned Extensions and Future Directions

Ongoing development includes:

  • Parallelization: Agents exploring distinct proof paths concurrently to increase coverage and solution rates for complex theorems.
  • Long-Term Memory Module: Persisting insights from previous proofs and human-agent interactions, fostering cumulative scientific knowledge and sustained collaboration.

These enhancements aim to evolve Ax-Prover into a continually learning, memory-augmented scientific assistant capable of reliable reasoning across formalizable domains.

7. Significance for Formal Scientific Discovery

Ax-Prover is an agentic framework for deep reasoning that integrates LLM-based creativity with Lean’s formal verification. Its multi-agent modularity, tool-assisted protocol, and strong cross-domain generalization make it an effective system for both autonomous proof discovery and human-centered collaboration. The framework addresses known limitations of specialized provers, enables rapid formalization in emerging scientific fields, and lays a foundation for verifiable scientific artificial intelligence (Tredici et al., 14 Oct 2025).
