Automated Conjecture Resolution with Formal Verification

Published 4 Apr 2026 in cs.LG and cs.AI | (2604.03789v1)

Abstract: Recent advances in LLMs have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains challenging due to the inherent ambiguity of natural language reasoning. In this paper, we propose an automated framework for tackling research-level mathematical problems that integrates natural language reasoning with formal verification, enabling end-to-end problem solving with minimal human intervention. Our framework consists of two components: an informal reasoning agent, Rethlas, and a formal verification agent, Archon. Rethlas mimics the workflow of human mathematicians by combining reasoning primitives with our theorem search engine, Matlas, to explore solution strategies and construct candidate proofs. Archon, equipped with our formal theorem search engine LeanSearch, translates informal arguments into formalized Lean 4 projects through structured task decomposition, iterative refinement, and automated proof synthesis, ensuring machine-checkable correctness. Using this framework, we automatically resolve an open problem in commutative algebra and formally verify the resulting proof in Lean 4 with essentially no human involvement. Our experiments demonstrate that strong theorem retrieval tools enable the discovery and application of cross-domain mathematical techniques, while the formal agent is capable of autonomously filling nontrivial gaps in informal arguments. More broadly, our work illustrates a promising paradigm for mathematical research in which informal and formal reasoning systems, equipped with theorem retrieval tools, operate in tandem to produce verifiable results, substantially reduce human effort, and offer a concrete instantiation of human-AI collaborative mathematical research.

Summary

  • The paper demonstrates a dual-agent approach combining informal strategy generation and formal Lean 4 verification to tackle open mathematical problems.
  • It uses theorem retrieval through Matlas (informal statements) and LeanSearch (formal Lean 4 statements) to support proof search and formalization for a conjecture in commutative algebra.
  • The framework's success in resolving Anderson’s open problem underscores its potential to automate rigorous mathematical proofs at scale.

Automated Conjecture Resolution with Formal Verification: A Technical Analysis

Motivation and Context

LLMs have recently demonstrated notable progress in mathematical reasoning, advancing from elementary tasks to increasingly complex research-level challenges. However, for research mathematics, reliable solution generation and verification remain unsolved, primarily due to ambiguities in natural language and the requirement for absolute rigor in mathematical argumentation. The paper "Automated Conjecture Resolution with Formal Verification" (2604.03789) addresses these issues by proposing a framework that combines informal natural language reasoning and formal, machine-checkable verification, thereby enabling end-to-end automated mathematical problem solving.

Framework Architecture

The proposed system comprises two cooperating agents: Rethlas (informal reasoning agent) and Archon (formal verification agent), each backed by its own retrieval tool (Matlas and LeanSearch, respectively).

Figure 1: Overview of the framework pipeline integrating informal agent Rethlas and formal agent Archon for automated conjecture resolution and verification.

Informal Agent: Rethlas

Rethlas emulates the workflow of expert mathematicians using a skills-and-tools approach. Its design is modular, comprising a reasoning generator and a verifier subagent (Figure 2). The core capabilities of Rethlas include:

  • Example/counterexample construction: To test the robustness or limitations of conjectures and hypotheses.
  • Subgoal decomposition: Partitioning complex problems into tractable intermediates.
  • Semantic theorem search (Matlas): Fast, structured retrieval from a database of ~13.6 million mathematical statements, enabling efficient access to relevant theorems, examples, and literature.
  • Recursive and critical failure analysis: Iterative plan evolution and identification of repeated failures to optimize search strategy.
  • Artifact memory: Persistent storage and querying of intermediate objects and plans.

Figure 2: Architecture of the Rethlas agent, showing generation and verification subagents and their interactions with the theorem search engine and working memory.
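
The semantic theorem search capability listed above can be illustrated with a toy retrieval routine. This is only a sketch under loud assumptions: the summary does not describe how Matlas embeds or ranks statements, so the bag-of-words `embed`, the `cosine` scoring, the `search` function, and the three-entry `CORPUS` are all hypothetical stand-ins for a learned embedding model over roughly 13.6 million statements.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a learned embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, corpus, k=2):
    """Return the k statements most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

# Hypothetical mini-corpus; the real index is far larger and not reproduced here.
CORPUS = [
    "every complete noetherian local ring is quasi-complete",
    "a noetherian local domain with trivial generic formal fiber",
    "the fundamental group of the circle is infinite cyclic",
]

if __name__ == "__main__":
    for hit in search("generic formal fiber of a local domain", CORPUS, k=1):
        print(hit)
```

In a production system the embedding would presumably come from a trained model and the ranking from an approximate nearest-neighbour index; the interface (query in, ranked statements out) is the part this sketch is meant to convey.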

The generator synthesizes informal proofs, while the verifier rigorously inspects proofs for errors, skipped steps, and inapplicable citations, using both Matlas and external searches for cross-domain consistency.
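
The generator/verifier interplay can be sketched as a simple feedback loop. The following is an assumed control flow, not the paper's implementation: `generate_and_verify`, `Verdict`, and the toy generator and verifier are hypothetical names, and in the real system both roles would be played by LLM subagents with access to Matlas.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    ok: bool
    issues: list = field(default_factory=list)

def generate_and_verify(generate, verify, max_rounds=5):
    """Alternate generator and verifier until a draft passes or the budget ends.

    Returns (draft, rounds_used); draft is None if no draft was accepted.
    """
    feedback = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        verdict = verify(draft)
        if verdict.ok:
            return draft, round_no
        # Carry every reported issue forward so the next draft can address it.
        feedback = feedback + verdict.issues
    return None, max_rounds

# Toy stand-ins (hypothetical): the verifier rejects an unjustified citation.
def toy_generate(feedback):
    steps = ["state claim", "cite lemma"]
    if "missing step: justify citation" in feedback:
        steps.append("justify citation")
    return steps

def toy_verify(draft):
    if "justify citation" not in draft:
        return Verdict(False, ["missing step: justify citation"])
    return Verdict(True)

if __name__ == "__main__":
    print(generate_and_verify(toy_generate, toy_verify))
```

Separating the two roles means the verifier never has to trust the generator's own assessment of its draft, which matches the division of labour the summary describes.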

Formal Agent: Archon

Archon is dedicated to translating informal, human-readable proofs into fully formalized Lean 4 projects, capable of passing machine verification without human oversight. Its architecture is dual-agent:

  • Plan Agent: Performs structured task decomposition and provides targeted guidance.
  • Lean Agent: Executes formalization, employing LeanSearch for efficient theorem and lemma retrieval from Mathlib.

Noteworthy technical components:

  • Persistent memory and reviewing: Session summaries and global status documents facilitate high-level progress tracking, context management, and proactive stall detection.
  • Tool orchestration: Archon integrates CLI scripting, Lean LSP protocols, structured skill guidelines, and a reference management subsystem for scalable project-scale formalization.
  • Autonomous problem-solving: Fills logical gaps left implicit in informal arguments, diagnoses and corrects failed strategies, and independently discovers alternative proof strategies where library support is lacking.

Archon's workflow comprises three stages: (1) scaffolding with problem decomposition, (2) iterative proving through alternating Plan Agent and Lean Agent cycles, and (3) verification and codebase quality passes.
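
The three stages, together with the stall detection mentioned earlier, might be organized roughly as follows. This is a minimal sketch under assumptions: `run_formalization`, `toy_prove`, and the `DEPS` table are hypothetical, and a real "prove" step would invoke Lean rather than check a dependency table.

```python
def run_formalization(goals, try_prove, max_cycles=10, stall_limit=2):
    """Sketch of a scaffold / prove / verify loop with stall detection."""
    # Stage 1 (scaffolding): every decomposed goal starts out open.
    status = {g: "open" for g in goals}
    stalled = 0
    # Stage 2 (iterative proving): plan which goals to attempt, then attempt them.
    for _ in range(max_cycles):
        open_goals = [g for g in goals if status[g] == "open"]
        if not open_goals:
            break
        progress = False
        for g in open_goals:
            if try_prove(g, status):
                status[g] = "proved"
                progress = True
        # Stall detection: stop after several cycles without any progress.
        stalled = 0 if progress else stalled + 1
        if stalled >= stall_limit:
            break
    # Stage 3 (verification): accept only if nothing is left open.
    verified = all(s == "proved" for s in status.values())
    return status, verified

# Hypothetical dependency table: a goal is provable once its lemmas are proved.
DEPS = {"main": ["lemma_a", "lemma_b"], "lemma_a": [], "lemma_b": ["lemma_a"]}

def toy_prove(goal, status):
    return all(status.get(d) == "proved" for d in DEPS.get(goal, []))

if __name__ == "__main__":
    print(run_formalization(["main", "lemma_a", "lemma_b"], toy_prove))
```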

Case Study: Automated Resolution of Anderson's Open Problem

To evaluate the efficacy of this framework, the authors targeted the Anderson (2014) open problem in commutative algebra:

Does weak quasi-completeness imply quasi-completeness for Noetherian local rings?

The question concerns the adic topology of local rings, specifically how descending chains of ideals behave relative to powers of the maximal ideal.
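
For reference, the two notions are usually stated as below. The summary does not reproduce the definitions, so this formulation is recalled from Anderson's work and should be treated as an assumption to check against the paper; the symbols $R$, $\mathfrak{m}$, $A_n$, and $s_k$ are notation chosen here.

```latex
% (R, \mathfrak{m}) is a Noetherian local ring and
% A_1 \supseteq A_2 \supseteq \cdots is any descending chain of ideals of R.

\textbf{Weakly quasi-complete:}\quad
\bigcap_{n} A_n = 0 \;\Longrightarrow\;
\forall k \ge 1\ \ \exists s_k :\ A_{s_k} \subseteq \mathfrak{m}^k .

\textbf{Quasi-complete:}\quad
\forall k \ge 1\ \ \exists s_k :\
A_{s_k} \subseteq \Bigl(\bigcap_{n} A_n\Bigr) + \mathfrak{m}^k .
```

Under these definitions, quasi-completeness implies weak quasi-completeness (take chains with zero intersection), and $R$ is quasi-complete exactly when every quotient $R/I$ is weakly quasi-complete: a chain in $R$ with intersection $B$ becomes a chain with zero intersection in $R/B$. A single quotient that fails to be weakly quasi-complete therefore shows that the ring is not quasi-complete.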

Problem Reduction and Strategy Formation

Via literature search and reductions (enabled by Matlas), Rethlas recast the problem in terms of the existence of a Noetherian local ring $A$ that is weakly quasi-complete but has a quotient $A/aA$ that is not weakly quasi-complete. Key external results, such as Farley's criterion on generic formal fibers and Anderson's criterion on analytic irreducibility, facilitated this reduction.

Rethlas' exploration trajectory is presented in Figure 3.

Figure 3: Rethlas' exploration trajectory, showing search, plan generation, and iterative strategy refinement leading to the counterexample.

The successful approach involved leveraging Jensen's (2006) result characterizing completions of UFDs with prescribed formal fibers. Using this, Rethlas identified the ring $T = \mathbb{C}[[x, y, z]]/(x^2 - yz)$ with a nonprincipal height-one prime and constructed a local UFD $A$ such that:

  • $A$ is weakly quasi-complete (trivial generic formal fiber).
  • $A/aA$ fails to be analytically irreducible (hence not weakly quasi-complete).
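
To see why $T$ has a nonprincipal height-one prime, one standard candidate is $P = (x, y)$. The summary does not say which prime the paper uses, so the choice of $P$ here is an assumption made for illustration; the argument itself is a routine check.

```latex
% T = \mathbb{C}[[x,y,z]]/(x^2 - yz), with maximal ideal \mathfrak{m} = (x,y,z) and \dim T = 2.

% P = (x,y) is prime of height one:
T/(x,y) \;\cong\; \mathbb{C}[[x,y,z]]/(x,\; y,\; x^2 - yz)
        \;=\; \mathbb{C}[[x,y,z]]/(x,\; y)
        \;\cong\; \mathbb{C}[[z]],
% a one-dimensional domain, so \operatorname{ht} P = \dim T - \dim T/P = 2 - 1 = 1.

% P is not principal:
% x^2 - yz \in (x,y,z)^2, so the classes of x, y, z still form a basis of \mathfrak{m}/\mathfrak{m}^2.
% If P = (f), then x = af and y = bf, so modulo \mathfrak{m}^2 both x and y
% would be scalar multiples of the class of f, contradicting their linear independence.
```

In particular $T$ is not a UFD, since in a Noetherian UFD every height-one prime is principal. This is what makes it notable that $T$ can nonetheless arise as the completion of a local UFD $A$.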

Proof Formalization and Verification

Archon formalized the entire proof pipeline, including the main theorem and all supporting lemmas and constructions. Key observations:

  • Autonomous gap-filling: Archon automatically generated rigorous proofs for steps omitted in the informal outline, including explicit isomorphism arguments, cardinality identities, and detailed transfinite recursions.
  • Non-trivial project scale: The resulting formalization comprised approximately 19,000 lines of Lean 4 code, verified via both lake build and the Comparator tool for semantic equivalence with human-readable specifications.
  • Zero mathematical human input: The only human involvement was procuring paywalled references; no mathematical decision-making or proof steering was necessary.
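
A successful build does not by itself rule out unfinished proofs, because Lean reports `sorry` placeholders only as warnings. A project-level audit of the following kind is one common complement. It is a hypothetical sketch and not part of the paper's described pipeline: the `audit` function and the sample sources are invented for illustration, and the comment stripping is deliberately crude (it ignores nested block comments and string literals).

```python
import re

BLOCK_COMMENT = re.compile(r"/-.*?-/", re.DOTALL)  # does not handle nesting
LINE_COMMENT = re.compile(r"--.*")
SORRY = re.compile(r"\bsorry\b")
AXIOM = re.compile(r"^\s*axiom\b", re.MULTILINE)

def audit(sources):
    """Map file name -> counts of `sorry` and `axiom` outside comments.

    `sources` maps file names to Lean source text; clean files are omitted.
    """
    report = {}
    for name, text in sources.items():
        code = LINE_COMMENT.sub("", BLOCK_COMMENT.sub("", text))
        counts = {
            "sorry": len(SORRY.findall(code)),
            "axiom": len(AXIOM.findall(code)),
        }
        if any(counts.values()):
            report[name] = counts
    return report

# Hypothetical sample sources for demonstration only.
SAMPLE = {
    "Good.lean": "theorem t : 1 + 1 = 2 := rfl -- no sorry here\n",
    "Bad.lean": "/- sorry in a comment -/\n"
                "theorem u : 2 = 2 := by\n  sorry\n"
                "axiom cheat : False\n",
}

if __name__ == "__main__":
    print(audit(SAMPLE))
```

A textual scan like this is only a heuristic; checking which axioms the final theorem actually depends on inside Lean is the more reliable test.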

The system thus provided a negative answer to Anderson's question by constructing a weakly quasi-complete Noetherian local ring that is not quasi-complete and fully formalizing the proof in Lean 4.

Performance Evaluation and Limitations

Strengths:

  • Cross-domain retrieval: Rethlas, via Matlas, rapidly assimilated and applied advanced results from adjacent mathematical domains.
  • Formalization cost and scale: Archon produced what would otherwise be several person-months' worth of formal proof in approximately 80 hours of runtime and at moderate monetary cost.
  • Autonomous strategy adaptation: When infrastructure (e.g., Krull domain theory) was unavailable, Archon independently discovered and utilized alternative lemma characterizations, such as Kaplansky's criterion for UFDs.

Limitations:

  • Inefficiencies in ambiguous or underspecified proof steps: Archon sometimes explored tangential routes or deferred obligations strategically, underscoring the need for more targeted prompt engineering or optional minimal expert intervention.
  • Non-idiomatic Lean code generation: While correct, the output Lean code is verbose, lacks Mathlib-standard naming and structuring, and would require human refactoring for upstream contribution.
  • Brittleness at larger project scale: In more complex formalization tasks, compounding inefficiencies and overthinking in gap regions could significantly slow progress.

A controlled ablation study confirmed that brief, blueprint-style human mathematical guidance can improve throughput (about 70% time reduction in key bottlenecks), without fundamentally altering the agent's chosen proof routes.

Theoretical and Practical Implications

Theoretical

  • Demonstrates that LLM-based agents, when equipped with robust retrieval and verification subsystems, can autonomously resolve open problems in research mathematics, including intricate counterexample constructions.
  • Establishes a paradigm for future integrations of informal (LLM-driven) and formal (mechanical checking) reasoning, where the informal agent explores and drafts plausible proof strategies and the formal agent rigorously fills all logical details.

Practical

  • Substantially accelerates the pace of research-level mathematical discovery and verification.
  • Shifts the role of human mathematicians to oversight and strategic guidance, analogous to PhD supervision rather than detailed proof checking.
  • Offers a route to reliable, completely checkable proofs for new results, thus reducing the risk of error propagation in mathematical literature.

This agentic, dual-phase structure for mathematical research is highly generalizable and suggests imminent opportunities for fully-automated formalization pipelines across mathematics and theoretical computer science.

Conclusion

The framework introduced in "Automated Conjecture Resolution with Formal Verification" showcases a powerful agentic approach to automated mathematical problem solving, seamlessly integrating natural language reasoning, semantic search, and formal proof synthesis (2604.03789). By autonomously resolving and formally verifying an open problem in commutative algebra, the authors validate the effectiveness and generalizability of such dual-agent architectures. This work substantially reduces the need for human involvement in both conjecture resolution and formal verification, heralding significant shifts in the methodology of mathematical research and its automation.
