AlphaProof Nexus: Autonomous Proofs
- AlphaProof Nexus is an autonomous system for generating and machine-checking Lean proofs using a robust generate–verify–refine pipeline.
- It employs an extensible multi-agent architecture with evolutionary search and Elo-rating to tackle complex mathematical problems like Erdős problems and OEIS conjectures.
- The system integrates cryptographically grounded attestation for verifiable provenance, supporting scalable research in combinatorics, optimization, and more.
AlphaProof Nexus is an autonomous system for the generation and machine-checking of mathematical proofs in Lean, equipped with an extensible, multi-agent architecture designed for large-scale, open-ended mathematical research. It operationalizes an end-to-end “generate–verify–refine” pipeline, employs advanced search and formalization strategies, and integrates cryptographically grounded attestation at the API level, enabling robust provenance across distributed LLM-driven workflows. AlphaProof Nexus has autonomously resolved open Erdős problems and OEIS conjectures, and is actively deployed across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics contexts (Tsoukalas et al., 21 May 2026).
1. System Architecture
At its core, AlphaProof Nexus is built upon an iterative, feedback-driven flow:
- Input: The user provides a Lean file (sketch) that contains theorem statements with `sorry` placeholders and markers such as EVOLVE-BLOCK and EVOLVE-VALUE.
- Generation: An LLM-based prover (notably Gemini 3.1 Pro) produces candidate proofs via chain-of-thought and search-replace edits to the Lean file.
- Verification: Lean v4.27 is employed in sandboxed mode to check candidate proofs for correctness, ensuring no new axioms and the elimination of `sorry`.
- Refinement: Lean's compiler feedback guides subsequent refinement iterations, terminating upon successful verification.
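The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the system's implementation: the `Prover` and `Validator` signatures, the function `refine_until_verified`, the iteration budget, and the toy stand-ins are all hypothetical, and the toy validator only looks for the string `sorry` rather than invoking Lean.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces: a prover maps (current file, last error log) to a new
# candidate file; a validator returns (ok, error_log) for a candidate file.
Prover = Callable[[str, str], str]
Validator = Callable[[str], Tuple[bool, str]]


def refine_until_verified(sketch: str, prover: Prover, validator: Validator,
                          max_iters: int = 10) -> Optional[str]:
    """Return a verified file, or None if the iteration budget runs out."""
    candidate, feedback = sketch, ""
    for _ in range(max_iters):
        candidate = prover(candidate, feedback)  # generation
        ok, feedback = validator(candidate)      # verification
        if ok:
            return candidate                     # terminate on success
    return None                                  # budget exhausted


# Toy stand-ins so the loop runs without an LLM or a Lean toolchain.
def toy_prover(text: str, feedback: str) -> str:
    # Replaces one placeholder per call; a real prover would use the feedback.
    return text.replace("sorry", "rfl", 1)


def toy_validator(text: str) -> Tuple[bool, str]:
    if "sorry" in text:
        return False, "error: declaration uses 'sorry'"
    return True, ""


result = refine_until_verified("theorem t : 1 + 1 = 2 := by sorry",
                               toy_prover, toy_validator)
```

In a real deployment the validator would compile the file with Lean in a sandbox and would also reject candidates that introduce new axioms.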
Key architectural modules and their functions include:
| Module | Functionality | Implementation Highlights |
|---|---|---|
| Prover subagent | LLM calls, search_replace | Gemini 3.1 Pro, Lean file diffs |
| Validator | Lean compilation & checking | No new axioms/sorry post-check |
| AlphaProof tool | Subgoal tree search | Returns: proof, disproof, or failure |
| Population database | Validated sketch storage/Elo rank | Used for evolutionary search |
| Rater subagents | Plackett–Luce pairwise Elo rating | Gemini 3.0 Flash-based |
| Controller | P-UCB sampler over sketch pool | Prompt assembly with inspiration |
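To make the rating row concrete, the following is a minimal sketch of a conventional pairwise Elo update. It is an assumption-laden illustration: the K-factor of 32 and the 400-point logistic scale are the customary chess defaults rather than values taken from the source, and the function names are hypothetical.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_wins: bool,
               k: float = 32.0) -> Tuple[float, float]:
    """Return updated ratings after one pairwise comparison (zero-sum)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_wins else 0.0
    delta = k * (s_a - e_a)   # surprise-weighted adjustment
    return r_a + delta, r_b - delta


# Two equally rated sketches; the rater prefers the first one.
new_a, new_b = elo_update(1500.0, 1500.0, a_wins=True)
```

Because the update is zero-sum, the total rating mass in the population stays constant across comparisons.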
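The generation and verification mechanisms are formalized as two maps:
- a generation map whose domain is the set of prompt-formatted Lean files and whose outputs are candidate proofs;
- a verification map that accepts a candidate iff Lean compiles it with no `sorry`.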
2. Formalization and Encoding Strategies
Translating open mathematical problems to Lean consists of:
- Core Definition Mapping: Mathematical entities (sets, sequences, graphs) are mapped to Lean types. For example, the property "A ⊆ ℕ avoids divisibility" is encoded as a predicate on subsets of ℕ.
- Statement Formalization: Theorems and conjectures leverage existing Mathlib lemmas. OEIS conjectures are auto-formalized, e.g., via `seq n = ...` and statements of the form `∀ n, test_lemma n → conjecture n`.
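As an illustration of the definition mapping described above, one possible Lean encoding of the divisibility example is sketched below. The name `AvoidsDivisibility` is hypothetical, and reading "avoids divisibility" as "no element divides a different element" is an assumption; the encoding used in the source may differ.

```lean
import Mathlib

/-- Hypothetical encoding: `A` avoids divisibility if no element of `A`
divides a different element of `A`. -/
def AvoidsDivisibility (A : Set ℕ) : Prop :=
  ∀ a ∈ A, ∀ b ∈ A, a ∣ b → a = b
```

Stating the property over `Set ℕ` keeps it applicable to infinite sets; a `Finset ℕ` variant would be the natural choice when cardinalities need to be computed.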
Best practices include annotating only modifiable regions using EVOLVE-BLOCK/EVOLVE-VALUE, isolating parameter values for schedule search, and supplementing with test lemmas for dataset sanity checks.
The system processed 353 Erdős problems (Lean-formalized) and 492 OEIS conjectures (selected from an initial set of 500 after manual correction).
3. Generation, Search Heuristics, and Agent Design
The system’s proof search is realized through a layered agent scaffold:
- Prompting: LLM contextualization (Lean code, error logs, prior sketches) with chain-of-thought heuristics.
- Diversity Parameters: Generation temperature set high ($0.7$–$1.0$) for provers, low ($0.2$) for raters.
- Actions: `search_replace` diffs, limited AlphaProof queries per episode.
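The `search_replace` action listed above can be pictured as a targeted text edit on the Lean file. The sketch below is an assumption about how such an edit might be applied: the function name `apply_search_replace` and the requirement that the search string match exactly once are illustrative choices, not details from the source.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one edit, refusing ambiguous or missing matches."""
    count = source.count(search)
    if count != 1:
        # Assumed policy: an edit must identify its target uniquely.
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(search, replace, 1)


lean_file = "theorem t : 2 + 2 = 4 := by\n  sorry\n"
patched = apply_search_replace(lean_file, "  sorry", "  norm_num")
```

Rejecting non-unique matches keeps each edit unambiguous, so a failed edit can be reported back to the prover instead of silently changing the wrong region.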
Agent designs:
- Agent A: Basic BFS with independent breadth search.
- Agent D: Heuristic-guided A* (full agent) employing evolutionary sampling, Elo-rated sketches, and P-UCB scoring over a top-64 sketch pool.
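In outline, each search episode samples a parent sketch from the pool by P-UCB score, prompts the prover for an edit, validates the result with Lean, and stores validated sketches in the population database, where the rater subagents update their Elo ratings.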
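The parent-selection step of such a search can be sketched as follows. This is an illustration under assumptions rather than the source's scoring rule: it uses the common PUCT-style form (value plus a prior-weighted exploration bonus), derives the prior from a softmax over Elo ratings, and picks the maximum score deterministically. The `Sketch` fields, the constant `c`, and `elo_scale` are hypothetical; only the pool size of 64 comes from the text.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sketch:
    name: str
    elo: float          # rating assigned by the rater subagents
    visits: int = 0     # how often this sketch has been chosen as a parent
    value: float = 0.0  # hypothetical running mean reward of its descendants


def pucb_scores(pool: List[Sketch], c: float = 1.0,
                elo_scale: float = 400.0) -> List[float]:
    """Score = value + c * prior * sqrt(total visits + 1) / (1 + visits)."""
    total = sum(s.visits for s in pool)
    best = max(s.elo for s in pool)
    # Assumed prior: softmax over Elo ratings, shifted by the max for stability.
    weights = [math.exp((s.elo - best) / elo_scale) for s in pool]
    norm = sum(weights)
    return [s.value + c * (w / norm) * math.sqrt(total + 1) / (1 + s.visits)
            for s, w in zip(pool, weights)]


def select_parent(population: List[Sketch], pool_size: int = 64) -> Sketch:
    """Keep the top-`pool_size` sketches by Elo, then take the best score."""
    pool = sorted(population, key=lambda s: s.elo, reverse=True)[:pool_size]
    scores = pucb_scores(pool)
    return pool[scores.index(max(scores))]


population = [Sketch("a", elo=1600.0), Sketch("b", elo=1500.0)]
first = select_parent(population)    # no visits yet: the higher prior wins
first.visits += 50                   # pretend "a" was expanded many times
second = select_parent(population)   # the exploration bonus now favours "b"
```

The visit count in the denominator is what lets a lower-rated sketch eventually be tried, which is the usual motivation for UCB-style selection over greedy ranking.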
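Two summary metrics are defined: the solve rate, i.e. the fraction of benchmark problems for which a verified proof is found, and the expected search time until a verified proof is obtained.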
4. Cost Model and Performance Metrics
The cost of autonomous proof generation is accounted for in LLM tokens consumed and reported in USD per proof.
Empirical outcomes:
| Benchmark | Solved | Avg. tokens | Cost/proof (USD) |
|---|---|---|---|
| Erdős | 9/353 | 12M | 200 |
| OEIS | 44/492 | 3.5M | 80 |
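As a quick arithmetic check on the table, the solved counts translate into solve rates of roughly 2.5% (Erdős) and 8.9% (OEIS). The snippet below only restates the table's figures; the dictionary layout and variable names are illustrative.

```python
# (solved, total) pairs copied from the table above.
benchmarks = {"Erdős": (9, 353), "OEIS": (44, 492)}

# Solve rate = solved / total, expressed as a percentage.
solve_rates = {name: 100.0 * solved / total
               for name, (solved, total) in benchmarks.items()}

for name, rate in solve_rates.items():
    print(f"{name}: {rate:.1f}% solved")
```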
Performance is also reported as:
- Solve-rate vs cost (with bootstrap error bars for Agents A/B, single runs for C/D).
- Wall-clock time: median 5–12 hours per proof with 10 parallel subagents.
5. Empirical Results and Benchmark Deployments
Key empirical milestones:
- Erdős Problems (353): 9 solutions, with specific identifiers including #12(i,ii), #125, #138, #152, #741(i,ii), #846, and #26. Deployed proof techniques include Chinese Remainder Theorem-based constructions, Diophantine thinning, greedy colorings, Sidon-set bounds, and combinatorial labelings.
- OEIS Conjectures (492): 44 conjectures solved, including asymptotic enumeration in Lean.
Agent D consistently outperformed the other agents in high-difficulty settings (e.g. solve-rate increase, 2–5× cost reduction), while Agents A/B performed comparably on medium-difficulty cases. Notably, the AlphaProof agent alone did not solve any problem in the Erdős set.
Failure modes were predominantly attributed to hallucinated `sorry` instances or unsubstantiated literature lemma inclusions.
6. Generalization, Provenance, and API-Level Attestation
AlphaProof Nexus has been generalized and deployed in:
- Combinatorics: Graph reconstruction theorems.
- Optimization: Convergence proofs for Anchored GDA, exploiting simultaneous EVOLVE-VALUE parameter-and-proof search.
- Graph Theory: Leaves-in-spanning-tree conjectures.
- Algebraic Geometry: Log-concavity in module sequences.
- Quantum Optics: Monochromatic quantum graphs for GHZ-state construction.
Best practices drawn from deployment include prioritization of mature domains within the Lean library, leveraging compiler feedback to avoid hallucinations, and budget scaling (300–1000 episodes) for high-variance problems. Elo-based evolutionary ranking accelerates difficult subgoals but can impede simple cases; the EVOLVE-VALUE mechanism supports new algorithmic discovery.
AEX-Based Attestation and Multi-Hop Provenance
The AEX protocol (Guan, 15 Mar 2026) is proposed as Nexus’s per-API attestation layer, providing signed, canonical JSON object commitments at each service boundary, with explicit transform receipts for trusted rewriting and streaming output hash-chains. This enables end-to-end, verifiable provenance across arbitrarily long LLM-based pipelines in Nexus, capturing every trusted mutation and transformation step. In the Nexus architecture, services can be policy-enforcing gateways, tool-calling orchestrators, or post-processors, acting as AEX issuers or transform-receipt issuers. The resulting global state is a provenance graph of request commitments, stream checkpoints, and output transforms, which can be composed with higher-level trust policies (e.g., TEE attestations, model-fingerprinting).
Limitations of AEX v1—including the mutual exclusion of checkpoints and full lineage within a single output chain—necessitate architectural decisions at each hop regarding proof granularity and provenance completeness.
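The two ingredients named above, commitments over canonical JSON and a hash-chain over streamed output, can be illustrated with the standard library. This sketch is not the AEX wire format: the function names are hypothetical, `json.dumps` with sorted keys is only an approximation of a full canonicalization scheme, and an HMAC with a shared key stands in for what would in practice be an asymmetric signature.

```python
import hashlib
import hmac
import json
from typing import Any, List


def canonical_bytes(obj: Any) -> bytes:
    """Approximate canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def commit(obj: Any, key: bytes) -> str:
    """Keyed commitment over the canonical encoding (HMAC as a stand-in)."""
    return hmac.new(key, canonical_bytes(obj), hashlib.sha256).hexdigest()


def chain_chunks(chunks: List[str], seed: bytes = b"") -> List[str]:
    """Hash-chain streamed chunks; each head covers all chunks so far."""
    head = hashlib.sha256(seed).digest()
    heads = []
    for chunk in chunks:
        head = hashlib.sha256(head + chunk.encode("utf-8")).digest()
        heads.append(head.hex())
    return heads


key = b"demo-key"  # illustrative only; never hard-code real keys
request_commitment = commit({"model": "prover", "prompt": "theorem t"}, key)
stream_heads = chain_chunks(["by ", "norm_num"])
```

Each chain head depends on every earlier chunk, so a verifier holding any intermediate head can detect reordering or modification of the prefix it covers.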
7. Outlook and Future Directions
Planned extensions and open research directions for AlphaProof Nexus include:
- Formal method expansion to domains such as differential geometry and homotopy theory.
- Integration of constrained-domain heuristics, such as Gröbner bases.
- Adaptive computational budgets based on a priori difficulty prediction.
- Development of interactive user interfaces for proof-sketch diagnostics and human guidance.
A plausible implication is that, as the Lean library matures and attestation/provenance standards like AEX are adopted, the breadth and reliability of AI-driven theorem proving platforms such as AlphaProof Nexus will broaden, enabling complex, multi-stage research workflows with robust end-to-end trust (Tsoukalas et al., 21 May 2026, Guan, 15 Mar 2026).