AutoformBot System Architecture
- AutoformBot is a scalable multi-agent system that translates informal math texts into formal Lean 4 code via a persistent task dependency graph.
- It integrates rigorous formal verification pipelines with git-based collaborative workflows to ensure robust error isolation and efficient parallel processing.
- The architecture uses layered orchestration and modular agents; across 26 textbooks it formalized 2,855 of 4,007 targets (71.3%).
AutoformBot is a multi-agent system for large-scale autoformalization of mathematics textbooks, designed to translate informal prose into machine-checked Lean 4 definitions and proofs using thousands of collaborating LLM agents and formal verification tools. The system’s architecture is distinguished by rigorous dependency-aware task scheduling, formal proof-checking pipelines, and git-based collaborative workflows, enabling robust, modular, and scalable production of validated mathematical libraries (Rammal et al., 28 May 2026).
1. System Architecture Overview
AutoformBot has a layered, modular architecture in which distributed agents coordinate through a persistent dependency graph of formalization tasks. At the top level, an Orchestrator ingests textbooks (PDFs), segments them into fine-grained formalization targets, and constructs a task dependency DAG reflecting logical and exposition-imposed relationships among statements. This DAG serves as the backbone for all parallelization and scheduling.
The architecture comprises six principal agent types with well-defined responsibilities:
- Orchestrator: Parses, segments, and plans across the global task DAG.
- Runner/Scheduler: Polls the DAG for ready (all dependencies satisfied) tasks, dispatches jobs, and manages pools of workers and reviewers.
- Worker: Short-lived coding agents that generate or repair Lean 4 code for an assigned task.
- Reviewer: Short-lived agents that review submissions for correctness and adherence to formalization protocols.
- Supervisor: Handles target-level global checking after merges, triages failures, and spawns remediation subtasks.
- Trace Analyzer: Conducts per-task failure analysis and provides detailed feedback to guide recovery.
Agents interact via the Model Context Protocol (MCP), a JSON-RPC-based method-call interface to Lean, git, filesystem, and DAG infrastructure. The system runs a very large number of agents concurrently while guaranteeing that their work progresses in an order consistent with the acyclic global dependency structure.
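
To illustrate the method-call style, the sketch below builds a JSON-RPC 2.0 request in the shape MCP uses for tool invocation (`tools/call` with a tool `name` and `arguments`). The tool name `lean_check_file` and its `path` argument are hypothetical; the source does not list AutoformBot's actual tool names.

```python
import json
from itertools import count

_request_ids = count(1)  # monotonically increasing JSON-RPC request ids


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP-style `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and argument, for illustration only.
message = make_tool_call("lean_check_file", {"path": "Chapter1/Basic.lean"})
```

Because every tool is reached through the same message envelope, an agent needs no tool-specific transport code, which fits the protocol-bound modularity described in this article.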
2. Multi-Agent Orchestration and Communication
AutoformBot’s agent orchestration layer is designed for both massive parallelism and precise error recovery. The Orchestrator segments textbooks into individual “tasks,” each corresponding to a formalization unit such as a definition, a theorem statement, or a fix. Each task is added to a persistent DAG store that encodes its logical or exposition-driven dependencies.
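
A task record of this kind might look like the following sketch. The field names (`task_id`, `kind`, `source_ref`, `depends_on`, `status`) and the status values are assumptions chosen for illustration, not the schema used by AutoformBot.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class Status(str, Enum):
    """Hypothetical task lifecycle states."""
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    task_id: str
    kind: str                      # e.g. "definition", "theorem_statement", "fix"
    source_ref: str                # where the statement appears in the textbook
    depends_on: list[str] = field(default_factory=list)
    status: Status = Status.PENDING


def dump_tasks(tasks: list[Task]) -> str:
    """Serialize tasks to JSON so the DAG store can persist them."""
    return json.dumps([asdict(task) for task in tasks], indent=2)
```

Storing dependencies as a list of parent ids on each task keeps the record self-describing; the scheduler can derive whatever in-memory index it needs from these records.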
Workers and Reviewers are scheduled against this task DAG:
- Workers: Multiple LLM-based workers may compete (“race”) to complete the same formalization task; the first properly validated completion is accepted, and all other attempts are canceled. Each worker operates within an isolated git worktree, editing or generating Lean files.
- Reviewers: Once a submission passes the mechanical gates (Lean reports no errors), it is reviewed for semantic integrity. Dependency-graph analysis is used to detect hidden axioms and “sorry” placeholders that stand in for missing proofs.
- Supervisory feedback: The Supervisor and Trace Analyzer continuously monitor outcomes, triggering immediate triaging and generating targeted remediation subtasks for failed or substandard results.
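
The racing behavior described for Workers can be sketched with `asyncio`. This is a minimal illustration under stated assumptions: `race_workers`, the `attempts` callables, and the `is_valid` predicate are hypothetical stand-ins for LLM workers and the validation gates, and the real system's cancellation mechanics are not described in the source.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def race_workers(
    attempts: list[Callable[[], Awaitable[str]]],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Run all attempts concurrently; return the first result that validates."""
    pending = {asyncio.create_task(attempt()) for attempt in attempts}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for finished in done:
                # Skip attempts that crashed or produced an invalid submission.
                if finished.exception() is None and is_valid(finished.result()):
                    return finished.result()
        return None  # every attempt failed validation
    finally:
        for task in pending:
            task.cancel()  # cancel the attempts that lost the race
        await asyncio.gather(*pending, return_exceptions=True)
```

Note that an attempt finishing first does not win automatically: an invalid submission is discarded and the race continues until some attempt validates or all are exhausted.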
3. Formal Verification Pipeline Integration
AutoformBot’s operation critically depends on tight integration with Lean 4’s formal verification toolchain:
- Lean REPL and LSP: Workers interact with Lean via the MCP “tool servers,” including a persistent Lean REPL for incremental compilation and goal inspection, and a Lean LSP server for rapid diagnostics.
- Proof-checking Loop: The worker-modified Lean files are checked for compilation errors. Passing code is escalated to reviewers who cross-verify using Lean’s diagnostic tools and dependency-graph tracing, enforcing strict exclusion of “sorry” and unauthorized axioms or hooks.
- Evaluation Harness: After batch merges, a three-stage evaluation is applied:
  - Mechanical (Lean) gates.
  - A matcher for statement-to-declaration alignment.
  - Three independent LLM-based judges, assessing faithfulness, proof integrity, and code quality, all leveraging the Lean dependency graph for axiomatic validation.
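
The exclusion of “sorry” and unauthorized axioms can be illustrated with Lean 4's built-in `#print axioms` command, which reports the axioms a declaration transitively depends on. This is only an illustration of the kind of dependency tracing described above, not AutoformBot's actual checker; the theorem names are made up.

```lean
-- A proof that hides a gap behind `sorry` still compiles (with a warning),
-- so an error-only mechanical gate would not reject it.
theorem incomplete (n : Nat) : n + 0 = n := by
  sorry

-- A complete proof of the same statement.
theorem complete (n : Nat) : n + 0 = n := by
  rfl

-- `#print axioms` lists transitive axiom dependencies:
-- `incomplete` is reported as depending on `sorryAx`,
-- while `complete` does not depend on any axioms.
#print axioms incomplete
#print axioms complete
```

Because the check is transitive, a `sorry` buried in a helper lemma is also reported for every theorem that uses that lemma.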
4. Dependency-Aware Scheduling and Collaborative Version Control
Central to AutoformBot’s scalability is its dependency-aware task dispatch and robust versioning. Tasks are organized in a persistent DAG (JSON adjacency list on-disk or in-memory), with nodes for formalization targets and directed edges for dependencies. The Scheduler employs a Kahn-style topological sorting algorithm, always dispatching tasks whose dependencies (“parents”) are complete.
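
The Kahn-style dispatch rule can be sketched as follows. The JSON layout (task id mapped to a list of parent ids) and the task names are assumptions for illustration, and appending to `order` stands in for dispatching a task and waiting for its completion.

```python
import json
from collections import deque

# Hypothetical on-disk format: task id -> list of parent task ids.
DAG_JSON = """
{
  "def_group": [],
  "def_subgroup": ["def_group"],
  "thm_lagrange": ["def_group", "def_subgroup"],
  "def_ring": []
}
"""


def dispatch_order(parents: dict[str, list[str]]) -> list[str]:
    """Kahn-style traversal: a task becomes ready once all parents are complete."""
    in_degree = {task: len(deps) for task, deps in parents.items()}
    children: dict[str, list[str]] = {task: [] for task in parents}
    for task, deps in parents.items():
        for dep in deps:
            children[dep].append(task)

    ready = deque(task for task, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # stand-in for "dispatch and wait for completion"
        for child in children[task]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("dependency cycle detected")
    return order


order = dispatch_order(json.loads(DAG_JSON))
```

In this example `def_group` and `def_ring` are ready immediately and could be worked on in parallel, while `thm_lagrange` is held back until both of its parents are complete.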
Each agent instance operates on a short-lived git worktree branched off origin/main. The merging process is managed through a batched merge queue inspired by bors-ng:
- Batch Merging: Approved branches are rebased and built in a batch; if the build fails, the queue is bisected to isolate the faulty commit.
- Conflict Resolution: Bisection-based diagnosis marks the offending DAG node as failed; remediation is initiated by the Trace Analyzer.
- Isolation: This strategy mitigates build breaks and allows parallel agent activity without cross-contamination, providing deterministic, reproducible merges.
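
The bisection step can be sketched as a binary search over batch prefixes. This is one simple way to bisect, assuming a single offending commit and a hypothetical `builds_ok` callback that rebuilds a candidate prefix; the source does not specify the exact strategy, and bors-ng itself splits and retries batches rather than searching prefixes.

```python
from typing import Callable, Sequence


def find_first_bad(
    batch: Sequence[str],
    builds_ok: Callable[[Sequence[str]], bool],
) -> str:
    """Return the first commit whose inclusion breaks the build.

    Assumes the empty prefix builds, the full batch fails, and failures are
    monotone: once a prefix fails, every longer prefix fails too.
    """
    lo, hi = 0, len(batch)  # invariant: batch[:lo] builds, batch[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if builds_ok(batch[:mid]):
            lo = mid
        else:
            hi = mid
    return batch[hi - 1]
```

Under these assumptions, a batch of n commits needs about log2(n) rebuilds to isolate the failure, instead of one rebuild per commit.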
5. Core Algorithms, Data Structures, and Performance
AutoformBot’s orchestration is underpinned by optimized core data structures and dispatch algorithms:
- Task DAG: Encoded as JSON, with dual arrays (out_edges, in_degree) in memory for efficient ready-task lookup.
- Queues: The architecture separates readyQueue (tasks ready to dispatch), inProgress (active tasks), and mergeQueue (approved for main branch).
- Worker Pooling: AsyncIO semaphores and thread pools manage resource allocation for LLM and Lean sessions.
- Dispatch Logic:
  - At each iteration, up to available_slots ready tasks from the DAG are assigned to workers.
  - When workers race, only the first successful submission is accepted.
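
The queue separation and slot-limited dispatch can be sketched as below. The function names and the `max_workers` parameter are hypothetical, and the sketch collapses review into a single acceptance step for brevity.

```python
from collections import deque


def dispatch_step(ready_queue: deque, in_progress: set, max_workers: int) -> list:
    """Move up to `available_slots` tasks from ready_queue to in_progress."""
    available_slots = max_workers - len(in_progress)
    dispatched = []
    while available_slots > 0 and ready_queue:
        task = ready_queue.popleft()
        in_progress.add(task)
        dispatched.append(task)
        available_slots -= 1
    return dispatched


def accept_submission(task, in_progress: set, merge_queue: deque) -> bool:
    """Accept only the first successful submission for a task; drop later ones."""
    if task not in in_progress:
        return False  # a racing worker already won this task
    in_progress.remove(task)
    merge_queue.append(task)
    return True
```

A short usage example: with three slots and four ready tasks, the first call to `dispatch_step` assigns three tasks, and the fourth is assigned only after `accept_submission` frees a slot.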
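The idealized throughput is given by
$$\Theta \approx \frac{W}{\mathbb{E}[T_w]},$$
where $W$ is the number of parallel workers and $\mathbb{E}[T_w]$ is the expected worker execution time. Scheduling latency per task amortizes as $O(1)$. The overall makespan is approximated by
$$T_{\text{makespan}} \approx \frac{T_{\text{seq}}}{W} + O(N),$$
with $T_{\text{seq}}$ as the sum of sequential task times and $N$ the number of DAG nodes.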
6. Metrics, Benchmarks, and System Robustness
AutoformBot has been evaluated at scale (Rammal et al., 28 May 2026). Across 26 mathematics textbooks totaling 4,007 formalization targets:
- 2,855 targets were formalized (71.3%).
- The system generated 483,918 lines of Lean code, consuming 183.2 million tokens.
Controlled ablations reveal the importance of each subsystem. On a 39-target Algebraic Combinatorics set (600M tokens):
| Pipeline configuration | Completion rate (39 targets) |
|---|---|
| Full pipeline | 77% |
| No orchestrator | 64% |
| No supervisor | 51% |
| No trace analyzer | 57% |
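
The ablation results are easier to compare as percentage-point drops relative to the full pipeline. The short calculation below uses only the figures from the table; the dictionary keys are labels chosen for this example.

```python
# Completion rates from the ablation table, in percent.
completion = {
    "full": 77,
    "no_orchestrator": 64,
    "no_supervisor": 51,
    "no_trace_analyzer": 57,
}

baseline = completion["full"]
# Percentage-point drop caused by removing each component.
drops = {
    config: baseline - rate
    for config, rate in completion.items()
    if config != "full"
}
largest = max(drops, key=drops.get)
```

The drops are 13 points without the Orchestrator, 26 without the Supervisor, and 20 without the Trace Analyzer, so on this set the Supervisor accounts for the largest single loss.
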
Racing more workers per task shortens wall-clock time up to a point: with one, three, and five workers per task, wall-clock times were approximately 8 hours, 4 hours, and 4 hours, respectively, so the speedup saturates beyond three workers, while completion rates increased with each additional worker.
Model comparisons on the same set, also at 600M tokens, yield 92% completion with Claude 4.6 and 46% with Gemini 3.1 Pro.
7. Design Principles and Implications
AutoformBot embodies several best practices for scalable, verifiable automation:
- Layered Orchestration: Decoupling orchestration, verification, and review enables parallel agent activity and rapid recovery from fault.
- Fault Isolation: Short-lived git worktrees and merge batching guarantee deterministic merges and rapid localization of errors.
- Task-Driven Dependency Management: The universal task DAG facilitates fine-grained scheduling, parallelism, and systematic error isolation.
- Feedback Loops: Supervisory and tracing agents provide immediate root-cause analysis, enabling rapid re-planning and system resilience.
- Modularity: Each agent and tool is protocol-bound and independently replaceable.
These design principles have enabled the large-scale formalization of core mathematics content, making it technically and economically feasible to build and continuously maintain mechanically checked mathematical libraries at scale. A plausible implication is that such architectures generalize to other domains requiring verified, distributed multi-agent collaboration on dependency-structured corpora (Rammal et al., 28 May 2026).