
LeanArchitect: Automating Blueprint Generation for Humans and AI

Published 30 Jan 2026 in cs.LO | (2601.22554v1)

Abstract: Large-scale formalization projects in Lean rely on blueprints: structured dependency graphs linking informal mathematical exposition to formal declarations. While blueprints are central to human collaboration, existing tooling treats the informal ($\LaTeX$) and formal (Lean) components as largely decoupled artifacts, leading to maintenance overhead and limiting integration with AI automation. We present LeanArchitect, a Lean package for extracting, managing, and exporting blueprint data directly from Lean code. LeanArchitect introduces a declarative annotation mechanism that associates formal declarations with blueprint metadata, automatically infers dependency information, and generates $\LaTeX$ blueprint content synchronized with the Lean development. This design eliminates duplication between formal and informal representations and eases fine-grained progress tracking for both human contributors and AI-based theorem provers. We demonstrate the practicality of LeanArchitect through the automated conversion of several large existing blueprint-driven projects, and through a human--AI collaboration case study formalizing a multivariate Taylor theorem. Our results show that LeanArchitect improves maintainability, exposes latent inconsistencies in existing blueprints, and provides an effective interface for integrating AI tools into real-world formalization workflows.

Summary

  • The paper demonstrates a novel Lean-native, annotation-driven middleware that automates blueprint generation by inferring dependencies and proof statuses.
  • It integrates with the Lake build system to generate synchronized LaTeX artifacts, reducing manual metadata maintenance and eliminating blueprint drift.
  • Empirical results reveal significant improvements in project management, error detection, and effective collaboration between human and AI theorem provers.

Automating Lean Blueprints: The LeanArchitect Framework

Introduction and Motivation

Large Lean formalization projects struggle to keep human-readable blueprints (dependency graphs linking informal mathematical exposition to formal declarations) synchronized with the evolving Lean code. Traditional blueprints duplicate metadata between LaTeX and Lean, which creates substantial maintenance overhead, lets the blueprint drift from the codebase, and hinders integration of AI-based theorem provers, which rely on fine-grained, structured representations for automation and progress tracking. "LeanArchitect: Automating Blueprint Generation for Humans and AI" (2601.22554) addresses these challenges with an annotation-driven, Lean-native extraction and synchronization middleware that unifies, automates, and extends blueprinting workflows.

LeanArchitect System Design

LeanArchitect implements a declarative annotation mechanism centered on a @[blueprint] attribute. The attribute tags a Lean declaration (a theorem or definition) with blueprint-specific data such as its LaTeX label, title, natural-language statement and proof, and project management annotations. LeanArchitect then performs static analysis on each tagged declaration to infer both its dependency edges (by collecting the constants required by its type and body) and its formalization status (by propagating sorry placeholders), and stores the resulting blueprint node graph in a persistent environment extension.
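The status inference described above can be illustrated with a small toy model. The Python sketch below is not LeanArchitect's implementation (which runs inside Lean). The dictionary layout and the function name sorry_free are hypothetical, and the sketch assumes the dependency graph is acyclic, as it is for Lean declarations.

```python
from typing import Dict

# Hypothetical toy data: each declaration records whether its own body
# contains `sorry` and which other declarations it refers to.
DECLS: Dict[str, dict] = {
    "lemma_a": {"has_sorry": False, "uses": set()},
    "lemma_b": {"has_sorry": True, "uses": {"lemma_a"}},
    "main_thm": {"has_sorry": False, "uses": {"lemma_a", "lemma_b"}},
}


def sorry_free(decls: Dict[str, dict]) -> Dict[str, bool]:
    """Return, for every declaration, whether it is complete.

    A declaration counts as complete only if it contains no `sorry`
    itself and everything it depends on is complete as well.
    """
    memo: Dict[str, bool] = {}

    def visit(name: str) -> bool:
        if name not in memo:
            info = decls[name]
            memo[name] = (not info["has_sorry"]) and all(
                visit(dep) for dep in info["uses"]
            )
        return memo[name]

    return {name: visit(name) for name in decls}


status = sorry_free(DECLS)
```

In this toy project, main_thm contains no sorry of its own but is still reported as incomplete, because it depends on lemma_b.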

This architectural choice makes Lean the single source of truth: LaTeX blueprint fragments are emitted with structure and proof status programmatically synchronized with their Lean origins, which eliminates redundancy. The system integrates directly with the Lake build infrastructure, enabling deterministic, incremental, and reproducible LaTeX artifact generation suitable for consumption by existing tools like leanblueprint.

Figure 1: Blueprint generation workflow without LeanArchitect, highlighting duplicated manual specification and synchronization challenges.

Blueprint Attribute Annotation and Dependency Inference

The blueprint attribute exposes field-level options for node annotation: the LaTeX label (latexLabel), statement/proof documentation, dependency overrides (uses, proofUses, excludes), project state flags (notReady, discussion), and export controls (latexEnv, title). This metadata is collated at export time into blueprint nodes, whose dependency edges and status are determined as follows:

  • For statements: all blueprint-tagged constants transitively referenced in the type/body.
  • For proofs: constants used specifically in the proof term.
  • Status: Infers leanok/mathlibok via detection of unresolved axioms or Mathlib sources.
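The dependency rules listed above can be modeled as a graph walk. The Python sketch below is an illustration under stated assumptions, not the tool's actual algorithm: the name tagged_deps and the toy data are hypothetical, and it assumes the walk expands untagged constants and stops at the first blueprint-tagged constant on each path.

```python
from typing import Dict, Iterable, Set


def tagged_deps(
    start: Iterable[str], refs: Dict[str, Set[str]], tagged: Set[str]
) -> Set[str]:
    """Collect the blueprint-tagged constants reachable from `start`.

    Untagged constants are expanded into the constants they reference;
    tagged constants are recorded and not expanded further (assumption).
    """
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        const = stack.pop()
        if const in seen:
            continue
        seen.add(const)
        if const in tagged:
            found.add(const)
        else:
            stack.extend(refs.get(const, ()))
    return found


# Hypothetical toy project: `helper` is an untagged auxiliary lemma.
refs = {
    "helper": {"lemma_a", "Nat.add"},
    "lemma_a": {"Nat.add"},
    "Nat.add": set(),
}
tagged = {"my_def", "lemma_a", "lemma_b"}

# Constants appearing in the theorem's type versus in its proof term.
statement_deps = tagged_deps({"my_def"}, refs, tagged)
proof_deps = tagged_deps({"helper", "lemma_b"}, refs, tagged)
```

The proof uses lemma_a only indirectly through helper, yet the edge to lemma_a is still recovered, which is the kind of dependency that is easy to miss when edges are written by hand.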

The exported LaTeX markup uses this metadata, via macros such as \inputleannode, to insert the synchronized node representation into the blueprint document. Lean proof status and dependencies are embedded directly, so changes in the formalization propagate to the readable plan.

Practical Usage and Empirical Validation

LeanArchitect’s effectiveness is demonstrated at both micro (conversion of individual projects) and macro scales (case studies of extensive community projects):

  1. PrimeNumberTheoremAnd Migration: A major collaborative formalization migrated from a custom leanblueprint-extract workflow to full LeanArchitect usage with minimal manual intervention, leveraging automated annotation conversion and synchronization. The maintainers report immediate reduction in metadata maintenance, systematic elimination of blueprint drift, and improved project management interfaces. Importantly, LeanArchitect surfaced latent semantic errors in the human-authored blueprints by virtue of its inferential dependency model.
  2. Batch Project Conversion: Bulk migration scripts were successfully deployed on Carleson, Brownian Motion, Infinity Cosmos, and Fermat’s Last Theorem blueprints, automatically harmonizing informal and formal layers and exposing oversights such as unused nodes, incomplete dependency annotation, and disjointed Mathlib status marking.
  3. Autoformalization Pipeline Case Study: LeanArchitect enabled a blueprint-structured human-AI loop for the formalization of a multivariate Taylor theorem. GPT-5 Pro generated a draft proof outline and a Lean module annotated with blueprint attributes; Aristotle discharged the tractable proof subgoals; humans handled the failures that the automated tools could not repair. Because the blueprint structure makes progress visible and localizes failures to individual nodes, it facilitated efficient division of labor and debugging.
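The inconsistencies surfaced during the migrations above amount to differences between hand-written and inferred dependency edges. The Python sketch below shows one way such a comparison could be expressed. The function name diff_edges and the toy data are hypothetical, and the paper's conversion scripts may work differently.

```python
from typing import Dict, List, Set


def diff_edges(
    manual: Dict[str, Set[str]], inferred: Dict[str, Set[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Compare hand-written dependency edges with inferred ones.

    For each node, report edges the formal code requires but the manual
    blueprint omits ("missing") and edges the manual blueprint lists but
    the formal code does not exhibit ("not_inferred").
    """
    report: Dict[str, Dict[str, List[str]]] = {}
    for node in sorted(set(manual) | set(inferred)):
        m = manual.get(node, set())
        i = inferred.get(node, set())
        missing, extra = i - m, m - i
        if missing or extra:
            report[node] = {
                "missing": sorted(missing),
                "not_inferred": sorted(extra),
            }
    return report


# Hypothetical blueprint: the manual edges for `main_thm` are out of date.
manual = {"main_thm": {"lemma_a", "old_lemma"}, "lemma_b": {"lemma_a"}}
inferred = {"main_thm": {"lemma_a", "lemma_b"}, "lemma_b": {"lemma_a"}}
report = diff_edges(manual, inferred)
```

A "not_inferred" edge is not necessarily an error, since an informal argument may cite a result that the formal proof avoids, so a report like this is best read as a list of candidates for human review.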

Figure 2: A visualization of the initial blueprint as produced from an AI-generated Lean module by GPT-5 Pro.

Figure 3: A graphical representation of the generated blueprint document showing the automatically inferred dependencies and proof statuses.

Technical Implementation and Features

LeanArchitect internals are realized as a Lean 4 library, accessible via standard Lake extensibility. The key implementation elements include:

  • Node and NodePart Structures: Each @[blueprint] declaration produces a Node object encapsulating formal/informal text, dependency annotation, LaTeX export options, and project metadata.
  • Persistent Environment Extension: Registered blueprint nodes are globally indexed and efficiently available for dependency graph construction and export.
  • Attribute Syntax and Tactic Instrumentation: The attribute supports rich configuration; docstring-overloaded tactics (and sorry_using) further aid proof metadata collation.
  • Export Mechanism: Blueprint LaTeX artifacts are deterministically generated per module, supporting incremental, reproducible builds, and macro-based import for composite blueprints.
  • Conversion Automation: Scripts enable rapid assimilation of legacy blueprints with customizable heuristics for docstring migration, Mathlib tagging, and incremental updates.
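To make the node structure and export step above concrete, the Python sketch below models a node with a statement part and an optional proof part and renders it as LaTeX. The \lean, \leanok, and \uses macros are the ones leanblueprint provides. The class fields, the render function, and the exact layout emitted are assumptions for illustration and do not reproduce LeanArchitect's actual output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodePart:
    """One half of a node: the statement or the proof (hypothetical fields)."""
    text: str
    uses: List[str] = field(default_factory=list)
    lean_ok: bool = False


@dataclass
class Node:
    lean_name: str
    latex_label: str
    latex_env: str
    statement: NodePart
    proof: Optional[NodePart] = None


def _render_part(env: str, part: NodePart, header: List[str]) -> List[str]:
    lines = ["\\begin{" + env + "}"] + header
    if part.lean_ok:
        lines.append("\\leanok")
    if part.uses:
        lines.append("\\uses{" + ", ".join(part.uses) + "}")
    lines.append(part.text)
    lines.append("\\end{" + env + "}")
    return lines


def render(node: Node) -> str:
    """Render a node as a LaTeX environment followed by an optional proof."""
    header = [
        "\\label{" + node.latex_label + "}",
        "\\lean{" + node.lean_name + "}",
    ]
    out = _render_part(node.latex_env, node.statement, header)
    if node.proof is not None:
        out += _render_part("proof", node.proof, [])
    return "\n".join(out)


# Hypothetical node: the statement is formalized, the proof is not yet.
node = Node(
    lean_name="MyProject.main_thm",
    latex_label="thm:main",
    latex_env="theorem",
    statement=NodePart("Every $x$ satisfies $P(x)$.", ["def:P"], lean_ok=True),
    proof=NodePart("Apply Lemma A.", ["lem:a"], lean_ok=False),
)
latex = render(node)
```

Keeping the statement and proof as separate parts lets each carry its own dependency list and status, which matches the distinction the summary draws between statement dependencies and proof dependencies.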

Implications and Future Directions

LeanArchitect significantly lowers the barrier to blueprint-driven project management in formal mathematics and substantially improves the interface for partial automation, enabling granular progress tracking, rigorous dependency maintenance, and robust integration of AI-based provers in live distributed collaboration. By exposing the full blueprint structure and status directly in Lean, it replaces ad hoc manual synchronization with a programmatic, up-to-date reflection of both the informal exposition and the mechanized development.

These claims are supported by the successful retrofitting of large collaborative projects, the automatic detection and correction of blueprint inconsistencies, and the reported reduction in authoring and maintenance cost. The system also clarifies the boundary between human and automated work via actionable, dependency-localized feedback, directly supporting modern neural theorem proving pipelines. No prior system provides such tight linkage between blueprint modeling, proof automation, and human collaboration in the Lean ecosystem.

Nevertheless, current ergonomic limitations around LaTeX editing in Lean IDEs, handling of many-to-many declaration-node relationships, and increased build pipeline complexity are acknowledged bottlenecks for maximal adoption. These reflect tooling and infrastructure challenges rather than intrinsic barriers of the LeanArchitect blueprint model.

Conclusion

LeanArchitect introduces a principled, annotation-driven infrastructure for blueprint generation in Lean, automating inference of dependency graphs, synchronization of proof status, and structured LaTeX export. Empirical results in both individual and large-scale projects confirm substantial improvements in maintainability, accuracy, and human-AI collaborative workflows. The system sets a new standard for integrating formal project management, dependency-tracking, and automated theorem proving within interactive theorem proving environments, and presents a compelling foundation for future advances in blueprint-driven formalization ecosystems.
