AlphaProof Nexus: Autonomous Proofs

Updated 24 May 2026

AlphaProof Nexus is an autonomous system for generating and machine-checking Lean proofs using a robust generate–verify–refine pipeline.
It employs an extensible multi-agent architecture with evolutionary search and Elo-rating to tackle complex mathematical problems like Erdős problems and OEIS conjectures.
The system integrates cryptographically grounded attestation for verifiable provenance, supporting scalable research in combinatorics, optimization, and more.

AlphaProof Nexus is an autonomous system for the generation and machine-checking of mathematical proofs in Lean, equipped with an extensible, multi-agent architecture designed for large-scale, open-ended mathematical research. It operationalizes an end-to-end “generate–verify–refine” pipeline, employs advanced search and formalization strategies, and integrates cryptographically grounded attestation at the API level, enabling robust provenance across distributed LLM-driven workflows. AlphaProof Nexus has autonomously resolved open Erdős problems and OEIS conjectures, and is actively deployed across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics contexts (Tsoukalas et al., 21 May 2026).

1. System Architecture

At its core, AlphaProof Nexus is built upon an iterative, feedback-driven flow:

Input: User provides a Lean file (sketch) that contains theorem statements with sorry placeholders and markers such as EVOLVE-BLOCK and EVOLVE-VALUE.
Generation: An LLM-based prover (notably Gemini 3.1 Pro) produces candidate proofs via chain-of-thought and search-replace edits to the Lean file.
Verification: Lean v4.27 runs in a sandbox to check each candidate proof, confirming that it introduces no new axioms and that every sorry has been eliminated.
Refinement: Lean’s compiler feedback guides subsequent refinement iterations, terminating upon successful verification.
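The four steps above can be sketched as a small feedback loop. This is a minimal illustration, not the system's implementation: `refine_loop`, `CheckResult`, `toy_generate`, and `toy_verify` are hypothetical names, and the toy functions stand in for the LLM prover and the Lean checker so the loop runs without either.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CheckResult:
    ok: bool        # True when the checker accepts the file
    feedback: str   # compiler-style message handed back to the generator


def refine_loop(
    sketch: str,
    generate: Callable[[str, str], str],
    verify: Callable[[str], CheckResult],
    max_iters: int = 8,
) -> Tuple[Optional[str], int]:
    """Return (verified_file, iterations_used), or (None, max_iters) on failure."""
    candidate, feedback = sketch, ""
    for i in range(1, max_iters + 1):
        candidate = generate(candidate, feedback)   # Generation
        result = verify(candidate)                  # Verification
        if result.ok:
            return candidate, i                     # terminate on success
        feedback = result.feedback                  # Refinement input
    return None, max_iters


# Toy stand-ins (assumptions): no LLM and no Lean toolchain are involved.
def toy_generate(current: str, feedback: str) -> str:
    if not feedback:
        return current.replace("sorry", "rfl'")     # first attempt is wrong
    return current.replace("rfl'", "rfl")           # repaired after feedback


def toy_verify(candidate: str) -> CheckResult:
    if "sorry" in candidate:
        return CheckResult(False, "declaration uses 'sorry'")
    if "rfl'" in candidate:
        return CheckResult(False, "unknown identifier rfl'")
    return CheckResult(True, "")


proof, iters = refine_loop(
    "theorem t : 2 + 2 = 4 := sorry", toy_generate, toy_verify
)
```

In this toy run the first candidate is rejected, the checker's message is fed back, and the second candidate is accepted.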

Key architectural modules and their functions include:

Module	Functionality	Implementation Highlights
Prover subagent	LLM calls, search_replace	Gemini 3.1 Pro, Lean file diffs
Validator	Lean compilation & checking	No new axioms/`sorry` post-check
AlphaProof tool	Subgoal tree search	Returns: proof, disproof, or failure
Population database	Validated sketch storage/Elo rank	Used for evolutionary search
Rater subagents	Plackett–Luce pairwise Elo rating	Gemini 3.0 Flash-based
Controller	P-UCB sampler over sketch pool	Prompt assembly with inspiration
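To make the rater row concrete, the following sketch shows how a single pairwise judgment could move two sketch ratings. It uses the standard Elo update as a stand-in; the source describes Plackett–Luce-based rating, whose exact fitting procedure is not given here, so the function names, the K-factor of 32, and the starting rating of 1500 are all assumptions.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(
    r_a: float, r_b: float, a_wins: bool, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated (r_a, r_b) after one pairwise comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_wins else 0.0
    delta = k * (s_a - e_a)          # zero-sum transfer of rating points
    return r_a + delta, r_b - delta


# Hypothetical sketch identifiers with equal starting ratings.
ratings = {"sketch_1": 1500.0, "sketch_2": 1500.0}
ratings["sketch_1"], ratings["sketch_2"] = elo_update(
    ratings["sketch_1"], ratings["sketch_2"], a_wins=True
)
```

With equal starting ratings the expectation is 0.5, so a win transfers half of K from the loser to the winner.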

The generation and verification mechanisms are formalized as:

$G: \Sigma^* \to Proof$ where $\Sigma^*$ is the set of prompt-formatted Lean files.
$V: Proof \to \{0,1\}$ where $V(\pi) = 1$ iff Lean compiles $\pi$ with no sorry and no new axioms.
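One way to observe the condition behind $V$ in plain Lean 4 is the built-in `#print axioms` command, which lists the axioms a declaration depends on. Whether the Nexus validator uses this command is an assumption; the two theorem names below are illustrative only.

```lean
-- A fully proved statement: the proof term contains no `sorry`.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A sketch that still contains a placeholder (compiles with a warning).
theorem unfinished : ∀ n : Nat, n + 0 = n := by
  sorry

#print axioms two_add_two  -- no `sorryAx` in the report
#print axioms unfinished   -- the report includes `sorryAx`
```

A checker in this style would accept a declaration only if its axiom report contains neither `sorryAx` nor any axiom outside an allowed list.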

2. Formalization and Encoding Strategies

Translating open mathematical problems into Lean involves two steps:

Core Definition Mapping: Mathematical entities (sets, sequences, graphs) are mapped to Lean types. For example, the property "A ⊆ ℕ avoids divisibility" is encoded as a Lean predicate on subsets of ℕ.
Statement Formalization: Theorems and conjectures leverage existing Mathlib lemmas. OEIS conjectures are auto-formalized, e.g., via seq n = ... and statements of the form ∀ n, test_lemma n → conjecture n.
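The two steps above can be illustrated with a small Lean sketch. Everything in it is an assumption made for illustration: the name `AvoidsDivisibility`, the reading of "avoids divisibility" as "no element divides a different element", and the sample theorem are not taken from the source, and the EVOLVE-BLOCK marker syntax is omitted because the source does not specify it.

```lean
import Mathlib

/-- Hypothetical encoding: no element of `A` divides a different element of `A`. -/
def AvoidsDivisibility (A : Set ℕ) : Prop :=
  ∀ a ∈ A, ∀ b ∈ A, a ∣ b → a = b

/-- A sketch-style statement whose `sorry` is left for the prover to replace. -/
theorem exists_avoiding_of_card (n : ℕ) :
    ∃ A : Finset ℕ, A.card = n ∧ (∀ x ∈ A, 1 ≤ x ∧ x ≤ 2 * n) ∧
      AvoidsDivisibility (A : Set ℕ) := by
  sorry
```

The definition plays the role of the core mapping, and the theorem with its `sorry` placeholder is the kind of statement a user would hand to the system as input.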

Best practices include annotating only modifiable regions using EVOLVE-BLOCK/EVOLVE-VALUE, isolating parameter values for schedule search, and supplementing with test lemmas for dataset sanity checks.

The system processed 353 Erdős problems (Lean-formalized) and 492 OEIS conjectures (selected from an initial set of 500 after manual correction).

3. Generation, Search Heuristics, and Agent Design

The system’s proof search is realized through a layered agent scaffold:

Prompting: LLM contextualization (Lean code, error logs, prior sketches) with chain-of-thought heuristics.
Diversity Parameters: Generation temperature set high ($0.7$–$1.0$) for provers, low ($0.2$) for raters.
Actions: search_replace diffs and a limited number of AlphaProof queries per episode.

Agent designs:

Agent A: Basic BFS with independent breadth search.
Agent D: Heuristic-guided A* (full agent) employing evolutionary sampling, Elo-rated sketches, and P-UCB scoring:

$\text{P-UCB score} = q + c \cdot \frac{\sqrt{\sum_i V_i}}{v + 1}$

with $c = 0.2$ and a top-64 sketch pool.
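As a worked example of the score, the sketch below evaluates it for a three-element pool and picks the maximizer. The symbol reading is an assumption, since the source does not define the terms: `q` is taken to be a normalized quality value for a sketch, `v` its own visit count, and the sum runs over the visit counts $V_i$ of all sketches in the pool. The function and sketch names are hypothetical.

```python
import math
from typing import List, Tuple


def p_ucb_score(q: float, visits: int, total_visits: int, c: float = 0.2) -> float:
    """q + c * sqrt(sum_i V_i) / (v + 1), with c = 0.2 as in the text."""
    return q + c * math.sqrt(total_visits) / (visits + 1)


def select_sketch(pool: List[Tuple[str, float, int]], c: float = 0.2) -> str:
    """pool entries are (name, q, visits); return the name with the top score."""
    total = sum(v for _, _, v in pool)
    return max(pool, key=lambda s: p_ucb_score(s[1], s[2], total, c))[0]


# Hypothetical pool: total visits = 16, so sqrt(total) = 4.
pool = [("well_explored", 0.60, 15), ("fresh", 0.50, 0), ("weak", 0.30, 1)]
chosen = select_sketch(pool)
# well_explored: 0.60 + 0.2 * 4 / 16 = 0.65
# fresh:         0.50 + 0.2 * 4 / 1  = 1.30
# weak:          0.30 + 0.2 * 4 / 2  = 0.70
```

Under this reading, an unvisited sketch receives the largest exploration bonus, so it is sampled ahead of a better-rated sketch that has already been tried many times.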

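In outline, the search loop samples a parent sketch from the pool by P-UCB score, asks the prover to edit it, validates the result in Lean, and stores validated sketches in the population database for rating.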

Two metrics summarize search performance: the solve rate and the expected search time.

4. Cost Model and Performance Metrics

The cost of autonomous proof generation is tracked per proof, in tokens consumed and in USD.

Empirical outcomes:

Benchmark	Solved	Avg. tokens	Cost/proof (USD)
Erdős	9/353	12M	200
OEIS	44/492	3.5M	80
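The table's figures can be turned into solve rates and an implied price per million tokens with simple arithmetic. The sketch below is a back-of-envelope calculation: the dictionary layout and the `summarize` helper are hypothetical, and the implied price assumes that the token and cost columns are measured on the same basis, which the source does not state.

```python
from typing import Dict

# Figures copied from the table above (tokens in millions, cost in USD).
benchmarks: Dict[str, Dict[str, float]] = {
    "Erdős": {"solved": 9, "total": 353, "avg_tokens_m": 12.0, "cost_usd": 200.0},
    "OEIS": {"solved": 44, "total": 492, "avg_tokens_m": 3.5, "cost_usd": 80.0},
}


def summarize(row: Dict[str, float]) -> Dict[str, float]:
    """Derive solve rate and implied USD per million tokens for one benchmark."""
    return {
        "solve_rate": row["solved"] / row["total"],
        "usd_per_m_tokens": row["cost_usd"] / row["avg_tokens_m"],
    }


summary = {name: summarize(row) for name, row in benchmarks.items()}

for name, stats in summary.items():
    print(
        f"{name}: solve rate {stats['solve_rate']:.2%}, "
        f"implied {stats['usd_per_m_tokens']:.2f} USD per million tokens"
    )
```

On these figures the Erdős solve rate is about 2.5% and the OEIS solve rate about 8.9%.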

Performance is also reported as:

Solve-rate vs cost (with bootstrap error bars for Agents A/B, single runs for C/D).
Wall-clock time: median 5–12 hours per proof with 10 parallel subagents.

5. Empirical Results and Benchmark Deployments

Key empirical milestones:

Erdős Problems (353): 9 solutions—specific identifiers include #12(i,ii), #125, #138, #152, #741(i,ii), #846, #26. Deployed proof techniques include Chinese Remainder Theorem-based constructions, Diophantine thinning, greedy colorings, Sidon-set bounds, and combinatorial labelings.
OEIS Conjectures (492): 44 conjectures solved, including asymptotic enumeration in Lean.

Agent D consistently outperformed the other agents on high-difficulty problems (a higher solve rate at 2–5× lower cost), while Agents A and B performed comparably to each other on medium-difficulty cases. Notably, the AlphaProof agent on its own did not solve any problem in the Erdős set.

Failures were predominantly attributed to candidate proofs that still contained hallucinated sorry placeholders or that relied on lemmas cited from the literature without formal justification.

6. Generalization, Provenance, and API-Level Attestation

AlphaProof Nexus has been generalized and deployed in:

Combinatorics: Graph reconstruction theorems.
Optimization: Convergence proofs for Anchored GDA, exploiting simultaneous EVOLVE-VALUE parameter-and-proof search.
Graph Theory: Leaves-in-spanning-tree conjectures.
Algebraic Geometry: Log-concavity in module sequences.
Quantum Optics: Monochromatic quantum graphs for GHZ-state construction.

Best practices drawn from deployment include prioritization of mature domains within the Lean library, leveraging compiler feedback to avoid hallucinations, and budget scaling (300–1000 episodes) for high-variance problems. Elo-based evolutionary ranking accelerates difficult subgoals but can impede simple cases; the EVOLVE-VALUE mechanism supports new algorithmic discovery.

AEX-Based Attestation and Multi-Hop Provenance

The AEX protocol (Guan, 15 Mar 2026) is proposed as Nexus’s per-API attestation layer, providing signed, canonical JSON object commitments at each service boundary, with explicit transform receipts for trusted rewriting and streaming output hash-chains. This enables end-to-end, verifiable provenance across arbitrarily long LLM-based pipelines in Nexus, capturing every trusted mutation and transformation step. In the Nexus architecture, services can be policy-enforcing gateways, tool-calling orchestrators, or post-processors, acting as AEX issuers or transform-receipt issuers. The resulting global state is a provenance graph of request commitments, stream checkpoints, and output transforms, which can be composed with higher-level trust policies (e.g., TEE attestations, model-fingerprinting).
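The ideas of a canonical JSON commitment and a streaming output hash-chain can be sketched with the Python standard library. This is not the AEX wire format, which is not specified in this article: the field names, the use of SHA-256, and the HMAC tag (a symmetric stand-in for a real digital signature) are all assumptions made for illustration.

```python
import hashlib
import hmac
import json
from typing import Dict, Iterable, List


def canonical_json(obj: Dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal objects hash equally."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def commit_request(request: Dict, key: bytes) -> Dict[str, str]:
    """Hash the canonical request and tag the digest (HMAC stands in for a signature)."""
    digest = hashlib.sha256(canonical_json(request)).hexdigest()
    tag = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return {"request_sha256": digest, "tag": tag}


def stream_hash_chain(chunks: Iterable[str], seed: str) -> List[str]:
    """Each link hashes the previous link together with the next output chunk."""
    links, prev = [], seed
    for chunk in chunks:
        prev = hashlib.sha256((prev + chunk).encode("utf-8")).hexdigest()
        links.append(prev)
    return links


def verify_chain(chunks: Iterable[str], seed: str, links: List[str]) -> bool:
    """Recompute the chain from the chunks and compare it with the claimed links."""
    return stream_hash_chain(chunks, seed) == links


# Hypothetical request and streamed output.
key = b"demo-key"
request = {"model": "prover", "prompt": "theorem t : 2 + 2 = 4 := sorry"}
commitment = commit_request(request, key)
chunks = ["theorem t : 2 + 2 = 4 ", ":= rfl"]
links = stream_hash_chain(chunks, commitment["request_sha256"])
```

Seeding the chain with the request digest ties the streamed output to the request it answers, so changing any chunk or the request changes every later link.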

Limitations of AEX v1—including the mutual exclusion of checkpoints and full lineage within a single output chain—necessitate architectural decisions at each hop regarding proof granularity and provenance completeness.

7. Outlook and Future Directions

Planned extensions and open research directions for AlphaProof Nexus include:

Formal method expansion to domains such as differential geometry and homotopy theory.
Integration of constrained-domain heuristics, such as Gröbner bases.
Adaptive computational budgets based on a priori difficulty prediction.
Development of interactive user interfaces for proof-sketch diagnostics and human guidance.

A plausible implication is that, as the Lean library matures and attestation and provenance standards such as AEX are adopted, AI-driven theorem-proving platforms such as AlphaProof Nexus will grow in scope and reliability, enabling complex, multi-stage research workflows with robust end-to-end trust (Tsoukalas et al., 21 May 2026, Guan, 15 Mar 2026).


References (2)

Advancing Mathematics Research with AI-Driven Formal Proof Search (2026)

AEX: Non-Intrusive Multi-Hop Attestation and Provenance for LLM APIs (2026)
