Rethlas: Automated Math Reasoning System
- Rethlas is a natural-language automated reasoning system that generates candidate proofs and detects errors in research mathematics.
- It utilizes an iterative propose–verify–revise cycle and leverages a large theorem retrieval engine, Matlas, to guide proof discovery.
- The system functions as both an informal proof-discovery agent and a verification tool within a broader workflow that includes formal checks by Archon.
Rethlas is a natural-language automated reasoning system for research mathematics that is presented, across several 2026 papers, as both an informal proof-discovery agent and, in broader usage, part of an automated workflow for verification and correction of AI-generated arguments. In its most explicit architectural description, Rethlas is the informal reasoning component of an automated conjecture-resolution framework paired with the formal verification agent Archon; in subsequent mathematical papers it is credited with generating candidate proofs, discovering counterexamples, supplying proof architectures, detecting errors, and assisting in corrections for results in commutative algebra and algebraic geometry (Ju et al., 4 Apr 2026, Liu, 20 May 2026, Liu et al., 21 May 2026, Jiang et al., 24 May 2026).
1. Definition and scope
Rethlas is described as a natural-language automated reasoning system designed to solve mathematical problems directly from problem statements stated in ordinary mathematical prose and to produce proofs or counterexamples that can be checked by experts; in at least one case, a resulting proof was also machine-verified in Lean 4 (Jiang et al., 24 May 2026). In the automated conjecture-resolution framework, it is the informal reasoning agent, while Archon is the formal verification agent responsible for Lean 4 formalization and machine-checkable correctness (Ju et al., 4 Apr 2026).
The system is explicitly not presented merely as a narrow theorem-proving assistant that outputs formal proof scripts. Rather, it is said to “mimic the workflow of human mathematicians” by exploring examples, searching for relevant theorems, proposing decompositions into subgoals, revising failed approaches, and assembling candidate proofs in natural language (Ju et al., 4 Apr 2026). This positioning is reinforced by reports of its use on open problems drawn from published lists in commutative algebra and related areas, where it is credited with self-contained proofs produced “with no human intervention” and subsequently verified by human experts (Jiang et al., 24 May 2026).
In applied papers, the term “Rethlas system” is sometimes used more broadly than in the two-agent framework. One paper, for example, describes feeding a ChatGPT-generated proof into the “Rethlas system’s verifier,” which detected an error and then helped obtain a corrected argument (Liu, 20 May 2026). This suggests that in later usage the name can refer not only to the informal reasoning agent in isolation but also to a surrounding workflow for checking and repair.
2. System architecture and computational workflow
The paper "Automated Conjecture Resolution with Formal Verification" presents a two-stage framework. The first stage is Rethlas, which performs informal reasoning and proof discovery; the second is Archon, which turns the informal argument into a formalized Lean 4 project through structured task decomposition, iterative refinement, and automated proof synthesis (Ju et al., 4 Apr 2026). The stated aim is end-to-end problem solving with minimal human intervention.
Within that framework, Rethlas contains a generation agent and a verification agent. The generation agent proposes informal proofs, while the verification agent checks them and returns feedback when verification fails. The system also maintains a working memory of intermediate artifacts, including examples, counterexamples, decomposition plans, and partial insights, and later queries this memory to preserve coherence across iterations (Ju et al., 4 Apr 2026). The resulting control pattern is an iterative propose–verify–revise loop.
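The control pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `WorkingMemory`, `propose_verify_revise`, and `max_rounds` are hypothetical, and the toy generator and verifier are stand-ins for the two agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class WorkingMemory:
    """Hypothetical store for intermediate artifacts such as verifier feedback."""
    notes: List[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)


def propose_verify_revise(
    propose: Callable[[str, WorkingMemory], str],
    verify: Callable[[str], Tuple[bool, str]],
    problem: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None if the budget runs out."""
    memory = WorkingMemory()
    for _ in range(max_rounds):
        candidate = propose(problem, memory)  # generation agent (assumed interface)
        ok, feedback = verify(candidate)      # verification agent (assumed interface)
        if ok:
            return candidate
        memory.add(feedback)                  # feedback informs the next proposal
    return None


# Toy stand-ins: a candidate is accepted once it addresses every requested fix.
def toy_propose(problem: str, memory: WorkingMemory) -> str:
    return problem + "".join(f" [{note}]" for note in memory.notes)


def toy_verify(candidate: str) -> Tuple[bool, str]:
    for required in ("handle base case", "justify limit"):
        if required not in candidate:
            return False, required
    return True, ""


result = propose_verify_revise(toy_propose, toy_verify, "claim")
print(result)
```

In this toy run the verifier rejects the first two candidates, and the third succeeds because the memory has accumulated both pieces of feedback. A bounded round budget is one plausible way to stop a loop of this kind; the source does not state how Rethlas terminates.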
Rethlas is tightly coupled to theorem retrieval. Its central retrieval component is Matlas, a semantic theorem search engine over mathematical statements. In the version used for the reported experiments, Matlas indexed about 13.6 million statements extracted from arXiv papers, including definitions, propositions, theorems, corollaries, examples, and remarks; statements are embedded in a vector database, and retrieval is performed by nearest-neighbor search using cosine similarity (Ju et al., 4 Apr 2026). When Rethlas applies its “search relevant results” skill, it first queries Matlas and may then use web search for additional context or terminology.
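Nearest-neighbor retrieval by cosine similarity can be illustrated with a brute-force scan over a tiny index. The sketch below is an assumption-laden toy, not Matlas itself: the function names, the statement labels, and the three-dimensional vectors are all hypothetical, and the source does not say which embedding model or index structure Matlas uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_statements(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k statements whose embeddings are most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones would have far more dimensions.
toy_index = {
    "theorem_on_completions": [1.0, 0.0, 0.0],
    "lemma_on_formal_fibers": [0.6, 0.8, 0.0],
    "remark_on_graphs": [0.0, 0.0, 1.0],
}
hits = nearest_statements([1.0, 0.1, 0.0], toy_index, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

An exhaustive scan like this is linear in the size of the index. At the scale of millions of statements, an approximate nearest-neighbor index would be a natural alternative, though whether Matlas uses one is not stated in the source.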
Archon, by contrast, is formal rather than informal. It works in Lean 4, uses LeanSearch for Mathlib retrieval, and is organized around a Plan Agent and a Lean Agent with a scaffolding / proving / verification-polish workflow (Ju et al., 4 Apr 2026). The division of labor is therefore explicit: Rethlas is responsible for mathematical insight and informal proof discovery, whereas Archon is responsible for formal synthesis and machine-checked verification.
3. Reasoning model and mathematical heuristics
Rethlas’s generation agent is described as using a set of “reasoning primitives distilled from the processes mathematicians employ.” The primitives named in the framework paper are:
- Construct toy examples
- Construct counterexamples
- Search relevant results
- Propose subgoal decomposition plans
- Direct proving
- Recursive proving
- Identify key failures (Ju et al., 4 Apr 2026)
These operations are not treated as a rigid linear pipeline. Instead, the system is instructed to assess the current state of the problem and choose appropriate actions dynamically. The paper describes a human-like workflow: try examples or counterexamples to build intuition, search the literature for relevant results, decompose the problem into subgoals, attempt direct proofs, and when that fails, recursively refine the plan with multiple parallel attempts; after several failures, recurring obstacles are summarized to guide the next iteration (Ju et al., 4 Apr 2026).
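One way to picture state-dependent action selection and the summarizing of recurring obstacles is the sketch below. It is a hypothetical rule-based policy written for illustration only: the source says the system chooses actions dynamically but does not give the decision rule, so `choose_action`, `recurring_obstacles`, and the thresholds are assumptions.

```python
from collections import Counter
from typing import List


def choose_action(
    has_examples: bool,
    has_references: bool,
    has_plan: bool,
    failed_attempts: int,
    failure_threshold: int = 3,
) -> str:
    """Pick the next reasoning primitive from the current state (hypothetical policy)."""
    if failed_attempts >= failure_threshold:
        return "identify_key_failures"
    if not has_examples:
        return "construct_toy_examples"
    if not has_references:
        return "search_relevant_results"
    if not has_plan:
        return "propose_subgoal_decomposition"
    if failed_attempts == 0:
        return "direct_proving"
    return "recursive_proving"


def recurring_obstacles(failure_log: List[str], min_count: int = 2) -> List[str]:
    """Return obstacles seen at least min_count times, most frequent first."""
    counts = Counter(failure_log)
    return [obstacle for obstacle, n in counts.most_common() if n >= min_count]


# Toy failure log with one obstacle that recurs across attempts.
log = ["completion not a domain", "missing lemma", "completion not a domain"]
summary = recurring_obstacles(log)
print(choose_action(True, True, True, failed_attempts=3), summary)
```

A fixed rule table is used here only to keep the example deterministic and testable; in the system as described, the choice is made by the agent's own assessment of the problem state.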
This model of reasoning is retrieval-augmented rather than purely generative. In the reported experiments, Matlas was crucial for surfacing a theorem of Jensen that was not obvious from the original commutative-algebra problem statement and that became central to the eventual solution strategy (Ju et al., 4 Apr 2026). The framework therefore treats theorem retrieval not as a bibliographic convenience but as an operational component of proof search.
The same paper emphasizes that Rethlas is strongest when it can explore and prune multiple candidate plans. In the commutative-algebra case study, it drafted several attack plans, discarded unsuccessful ones after repeated failures, then reformulated the problem around a newly retrieved theorem. That behavior fits the paper’s description of Rethlas as an informal proof-search agent rather than a single-pass text generator (Ju et al., 4 Apr 2026).
4. Benchmark case: automated conjecture resolution in commutative algebra
The framework paper’s principal demonstration concerns Anderson’s open problem asking whether weak quasi-completeness implies quasi-completeness for Noetherian local rings. The reported outcome is a counterexample showing that there exists a weakly quasi-complete ring that is not quasi-complete (Ju et al., 4 Apr 2026).
The discovery trajectory is described in unusual detail. Rethlas first performed a broad search on the original statement and its references; it then executed a focused search that led toward literature on completions and generic formal fibers, especially work of Fleming et al. It next drafted three plans—Plan A, Plan B, and Plan C—each based on a different structural route to a counterexample. After multiple attempts, Plans A and C failed. Continued retrieval then surfaced Jensen’s theorem on completions of UFDs with semi-local formal fibers, leading to a new Plan D (Ju et al., 4 Apr 2026).
The final construction uses a complete local domain, a nonprincipal height-one prime, and a local UFD with trivial generic formal fiber. The argument then shows that the constructed ring is weakly quasi-complete by Farley’s criterion, while a quotient is not weakly quasi-complete because its completion is not a domain; the proof uses the equivalence that a 1-dimensional Noetherian local domain is weakly quasi-complete if and only if it is analytically irreducible (Ju et al., 4 Apr 2026). The paper states that Archon then formalized the resulting argument in Lean 4 with essentially no human involvement.
A later paper reports the same solution as one of several problems resolved by Rethlas and explicitly notes that this solution “has been formalized and machine-verified in Lean 4” (Jiang et al., 24 May 2026). Within the published record, this problem therefore functions as the clearest end-to-end example of Rethlas-guided discovery followed by formal verification.
5. Documented roles in subsequent mathematical papers
Several 2026 papers explicitly credit Rethlas in the production, correction, or organization of research-level arguments in algebraic geometry and commutative algebra (Liu, 20 May 2026, Liu et al., 21 May 2026, Liu, 21 May 2026, Jiang et al., 24 May 2026).
| Paper | Mathematical result | Stated role of Rethlas |
|---|---|---|
| (Liu, 20 May 2026) | Negative answer on effective divisors of positive self-intersection on smooth projective surfaces | Verifier detected an error in a ChatGPT-generated proof and helped obtain the corrected version |
| (Liu et al., 21 May 2026) | Boundedness of total Cartier indices for varieties with rational singularities in bounded families | Originated the overall proof structure, especially the surface / higher-dimensional split |
| (Liu, 21 May 2026) | Negative answers to Mauri–Moraga Question 9.9 on log Calabi–Yau pairs | Supplied a lengthy proof of a critical singularity-theoretic step after ChatGPT failed on that step |
| (Jiang et al., 24 May 2026) | Seven problems in commutative algebra and related areas | Produced self-contained proofs with no human intervention; experts later verified them |
In "An example of a very non-movable effective divisor," the author states that ChatGPT 5.5 pro generated the example and an initial proof, after which the proof was put into the Rethlas system’s verifier. The verifier flagged that the proof of one claim, concerning the absence of sections, was wrong; the author then reran the workflow through Rethlas and obtained the corrected argument that appears in the paper (Liu, 20 May 2026). The text also says that the main result was obtained by generative AI, “particularly Chatgpt 5.5 pro and the Rethlas system,” and the acknowledgements name the Rethlas team as Haocheng Ju, Jiedong Jiang, Shurui Liu, Guoxiong Gao, Yuefeng Wang, Zeming Sun, Bin Wu, Liang Xiao, and Bin Dong, noting that a customized version of Rethlas was used for the problem (Liu, 20 May 2026).
In "Boundedness of total Cartier indices for rational singularities in families," the authors state that the overall structure of the proof was “originated by generative AI, particularly the Rethlas system,” with the first prompt generated by ChatGPT Pro 5.5 and the rest handled by Rethlas (Liu et al., 21 May 2026). The proof blueprint attributed to the system includes localizing the problem, separating the surface case from the higher-dimensional case, bounding residual Cartier indices via topology of links, using semialgebraic geometry for uniformity, and then globalizing and descending the result. The same paper also emphasizes that the first draft contained many errors, that the proof of the semialgebraic torsion lemma was “essentially wrong” in the AI-generated draft, and that substantial elaboration, verification, and rewriting were carried out by hand (Liu et al., 21 May 2026).
In "On a question of Mauri and Moraga," both counterexamples are said to have been obtained by generative AI, but Rethlas played a more concentrated role in the first one (Liu, 21 May 2026). ChatGPT 5.5 pro produced an almost-complete threefold construction but failed at the step of proving that the resulting singularity is not a quotient singularity. The paper reports that the problem was then forwarded to the Rethlas system, which “solved the problem via a very lengthy proof”; the final paper does not reproduce that proof and instead cites Reid’s classification of threefold cDV quotient singularities for the non-quotient conclusion (Liu, 21 May 2026). By contrast, the second counterexample, a surface, was described as a one-shot output of ChatGPT 5.5 pro.
The paper "On some open problems in commutative algebra resolved by Rethlas" presents the broadest claims about the system’s autonomous capabilities (Jiang et al., 24 May 2026). It reports seven resolved problems or questions, with five negative and two positive answers, and states that the proofs were produced by Rethlas with no human intervention and later verified by human experts. The problems span finite conductor and quasi-coherent ascent, weak quasi-completeness, integer-valued polynomial biring structures, finite 1-character in AGCD domains, local Jaffardness of David’s factorial domain, and two nonrealizability results in Boij–Söderberg theory (Jiang et al., 24 May 2026).
6. Reliability, verification, and methodological significance
The published accounts do not present Rethlas as uniformly reliable in the sense of producing final, publication-ready proofs without further scrutiny. Instead, they document a range of workflows, from nearly autonomous resolution with formal verification to AI-generated drafts that required substantial correction. The framework paper reports a fully automated resolution of an open commutative-algebra problem with formal verification in Lean 4 and essentially no human involvement (Ju et al., 4 Apr 2026). The commutative-algebra collection paper likewise reports self-contained proofs produced with no human intervention and later checked by experts (Jiang et al., 24 May 2026).
By contrast, the algebraic-geometry papers present a more heterogeneous picture. One paper credits Rethlas with catching an error in a ChatGPT-generated proof and helping produce the corrected argument that made the final theorem rigorous (Liu, 20 May 2026). Another attributes to Rethlas the overall proof architecture while stressing that the AI draft contained many errors and that one key lemma was essentially wrong and had to be repaired by hand (Liu et al., 21 May 2026). A third says that Rethlas supplied a lengthy proof of a crucial singularity-theoretic step, but the final exposition replaced that argument with a citation to existing classification theory (Liu, 21 May 2026).
These reports suggest that Rethlas occupies a methodological position between heuristic large-language-model generation and strict formal proof assistants. It is used to explore proof space, retrieve non-obvious theorems, propose high-level proof skeletons, and in some cases generate complete proofs; but the degree of downstream validation ranges from expert checking to Lean 4 formalization, and several papers explicitly emphasize customized versions of the system, verification stages, or substantial human rewriting (Liu, 20 May 2026, Liu et al., 21 May 2026). In that sense, Rethlas is best characterized not as a single theorem-proving modality but as a research-mathematics reasoning system whose published uses span discovery, correction, synthesis, and collaboration.