SITA: Structure-to-Instance Autoformalization
- The paper introduces SITA, an automated framework that transforms abstract mathematical structures into concrete Lean instances via template-based formalization.
- SITA leverages LLM-generated code skeletons and iterative error-fix procedures to ensure both syntactic correctness and semantic faithfulness in formal proofs.
- By integrating Lean’s typeclass mechanism and structural fidelity as seen in ProofFlow and SITA-R1, the framework enhances throughput and accuracy in formalized theorem instantiation.
Structure-to-Instance Theorem Autoformalization (SITA) is an automated framework designed for rigorously formalizing the instantiation of abstract mathematical theories in concrete settings, specifically within the Lean proof assistant. SITA transforms template-like formal modules—consisting of definitions, assumptions, operations, and theorems—into verified formalizations of problem-specific instances, capitalizing on both LLMs and feedback-rich refinement procedures to achieve syntactic and semantic correctness. Different instantiations of SITA have emerged, notably with ProofFlow for structural fidelity in stepwise proof autoformalization (Cabral et al., 13 Oct 2025) and the generic SITA-R1 pipeline for algorithmic theorem instantiation in optimization (Li et al., 13 Nov 2025).
1. Formalization of Abstract Structures as Templates
Mathematical theories often package “structure” as modular templates that can be reused and instantiated. In SITA, an abstract structure is encoded as a four-tuple $\mathcal{S} = (\mathcal{D}, \mathcal{O}, \mathcal{C}, \mathcal{T})$, where:
- $\mathcal{D}$ (“Definitions”): primitive objects and axioms,
- $\mathcal{O}$ (“Operations”): algorithms or maps built on $\mathcal{D}$,
- $\mathcal{C}$ (“Conditions”): assumptions (e.g., convexity, Lipschitz regularity),
- $\mathcal{T}$ (“Theorems”): propositions derived under $\mathcal{C}$.
In Lean, these are encoded as type classes. For example, a gradient descent convergence theorem on a composite objective is implemented as:
```lean
class composite_pro (f h : E → ℝ)
class pg (pro : composite_pro f h) (x₀ : E)
theorem pg_converge ... := by sorry
```
Here the definitions are carried by `composite_pro`, the operations by `pg`, and the theorem by `pg_converge`, with the assumptions encoded as theorem hypotheses.
2. Instance Generation and Typeclass Integration
SITA operationalizes the instantiation process by prompting an LLM to output Lean definitions for the concrete problem, its parameterization, instance declarations linking it to the abstract template, and all algorithmic constructs. For example, instantiating to Lasso yields:
```lean
class Lasso_pro (A : Matrix ...) (b : Fin m → ℝ) (μ : ℝ) ...
def Lasso_pro.f ... -- squared error
def Lasso_pro.g ... -- ℓ₁ term
instance (pro : Lasso_pro ...) : composite_pro pro.f pro.g := {}
instance (alg : pg_Lasso pro x₀) : pg ... := ...
```
Theorems proved for the abstract template (e.g., `pg_converge`) are applicable to these instances provided all conditions are discharged. Verified instantiation requires proof-of-assumption lemmas for each side condition, such as convexity and differentiability:
```lean
lemma Lasso_pro.ConvexOn_f (pro : Lasso_pro ...) : ConvexOn ℝ univ pro.f := ...
lemma Lasso_pro.Lipschitz_f ...
```
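The template-to-instance pattern described above can be illustrated on a deliberately tiny analogue. The sketch below uses only core Lean 4 (no Mathlib) and is not taken from SITA: the names `Problem`, `RightNeutral`, `Problem.twice`, and `nat_rightNeutral` are hypothetical, and the "theory" is a trivial algebraic fact rather than an optimization result. It is meant only to show the shape: a class for the structure, a condition stated as a proposition, a theorem proved once under that condition, and a concrete instance that discharges the condition with its own lemma.

```lean
-- Toy analogue of the template pattern; core Lean 4 only, all names hypothetical.

-- Definitions and operations: an abstract structure packaged as a type class.
class Problem (α : Type) where
  op : α → α → α
  neutral : α

-- Condition: a side assumption stated as a proposition about the structure.
def RightNeutral (α : Type) [Problem α] : Prop :=
  ∀ a : α, Problem.op a Problem.neutral = a

-- Theorem: proved once for the template, with the condition as a hypothesis.
theorem Problem.twice {α : Type} [Problem α] (h : RightNeutral α) (a : α) :
    Problem.op (Problem.op a Problem.neutral) Problem.neutral = a :=
  Eq.trans (h (Problem.op a Problem.neutral)) (h a)

-- Instance: a concrete problem linked to the template.
instance : Problem Nat where
  op := fun a b => a + b
  neutral := 0

-- Proof-of-assumption lemma discharging the side condition for the instance.
theorem nat_rightNeutral : RightNeutral Nat :=
  fun a => Nat.add_zero a

-- The abstract theorem now applies to the concrete instance.
example (n : Nat) :
    Problem.op (Problem.op n Problem.neutral) Problem.neutral = n :=
  Problem.twice nat_rightNeutral n
```

In this toy, `nat_rightNeutral` plays the role that lemmas such as `Lasso_pro.ConvexOn_f` play for the Lasso instance: once it is proved, the template theorem transfers without being reproved.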
3. LLM-Based Autoformalization and Feedback Refinement
The SITA pipeline proceeds through structured stages:
- Skeleton Construction: LLM generates outline code from the template and instance description.
- Error-Fix and Proof Refinement: Deterministic syntax corrections, cache-driven error repair (using a knowledge base that maps error messages to fixes), and iterative proof synthesis for goals marked `sorry`, cycling until type-checking and correctness are achieved.
- Postprocessing: Any remaining `sorry` is replaced with a placeholder or a minimal well-typed stub, and a natural-language back-translation is produced for documentation.
High-level pseudocode for the workflow:
```
Generate skeleton via LLM
Lean.check
Apply ErrorFix
For each sorry: LLMProof, Lean.check_proof (loop with retry)
Final type-check
Return file
```
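The control flow of this workflow can be sketched in Python. This is a minimal illustration under stated assumptions, not SITA's implementation: the checker, the knowledge base, and the proof proposer are hypothetical callables supplied by the caller (standing in for Lean, the error-fix cache, and the LLM), and the source file is treated as a plain string.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: a checker returns error messages (empty list = success),
# a fixer rewrites the source, and a prover proposes a proof for one `sorry`.
Checker = Callable[[str], List[str]]
Fixer = Callable[[str], str]
Prover = Callable[[str, int], str]

PLACEHOLDER = "sorry"


def apply_error_fixes(source: str, check: Checker,
                      knowledge_base: Dict[str, Fixer],
                      max_rounds: int = 5) -> str:
    """Repeatedly apply cached fixes for error messages found in the knowledge base."""
    for _ in range(max_rounds):
        known = [err for err in check(source) if err in knowledge_base]
        if not known:
            break
        for err in known:
            source = knowledge_base[err](source)
    return source


def fill_placeholders(source: str, check: Checker, propose: Prover,
                      max_retries: int = 3) -> str:
    """Try to replace each `sorry` with a proposed proof that passes the checker."""
    position = source.find(PLACEHOLDER)
    while position != -1:
        advance = len(PLACEHOLDER)  # skip this placeholder if no proposal is accepted
        for attempt in range(max_retries):
            candidate = propose(source, attempt)
            trial = (source[:position] + candidate
                     + source[position + len(PLACEHOLDER):])
            if not check(trial):  # no errors: keep the candidate proof
                source, advance = trial, len(candidate)
                break
        position = source.find(PLACEHOLDER, position + advance)
    return source


def refine(skeleton: str, check: Checker, knowledge_base: Dict[str, Fixer],
           propose: Prover) -> Tuple[str, bool]:
    """Skeleton -> error repair -> proof filling -> final check."""
    repaired = apply_error_fixes(skeleton, check, knowledge_base)
    final = fill_placeholders(repaired, check, propose)
    return final, not check(final)
```

A placeholder for which no proposal passes the checker is left in place, which matches the postprocessing stage that handles any remaining `sorry`.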
4. Structure-Preserving Proof Mapping: ProofFlow and Beyond
The SITA philosophy is also realized in ProofFlow (Cabral et al., 13 Oct 2025), which emphasizes structural fidelity in stepwise proof autoformalization. It models the proof as a directed acyclic graph (DAG) whose nodes are theorem conditions, definitions, intermediate lemmas, and theorem solutions, and whose edges encode logical dependencies. Each natural-language step maps to an intermediate lemma together with its explicit set of dependencies, with self-containment and faithfulness enforced through targeted LLM prompting and iterative type-checking. This structure keeps each lemma’s context minimal yet sufficient, reducing search complexity and preventing shortcut tactics.
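The dependency-graph idea can be sketched with Python's standard `graphlib` module. This is an illustrative assumption about how such a graph might be processed, not ProofFlow's code: the node names and the `statements` mapping are hypothetical, and each node is mapped to the set of nodes it directly depends on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def formalization_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order nodes so every node appears after the nodes it depends on.

    Raises graphlib.CycleError if the dependency graph is not acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


def lemma_context(node: str, dependencies: Dict[str, Set[str]],
                  statements: Dict[str, str]) -> List[str]:
    """Return only the statements a node directly depends on (its minimal context)."""
    return [statements[dep] for dep in sorted(dependencies.get(node, set()))]


# Hypothetical proof: two conditions, one definition, two lemmas, one solution.
dependencies = {
    "cond1": set(),
    "cond2": set(),
    "def1": set(),
    "lemma1": {"cond1", "def1"},
    "lemma2": {"lemma1", "cond2"},
    "solution": {"lemma2"},
}
statements = {name: f"<statement of {name}>" for name in dependencies}
```

Calling `lemma_context("lemma2", dependencies, statements)` yields only the statements of `cond2` and `lemma1`, so a prover working on `lemma2` never sees `cond1` or `def1` directly. That restriction is what keeps the per-lemma context small.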
5. Evaluation Metrics and Benchmarking
Outcomes of SITA autoformalization are measured by multi-axis metrics. Notably in ProofFlow:
- Syntactic correctness: binary compilation success per node,
- Semantic faithfulness: LLM-judged preservation of meaning,
- Structural fidelity: correctness of dependency structure.

These are combined into a composite metric, ProofScore, which rewards full alignment of all three criteria. In (Cabral et al., 13 Oct 2025), ProofFlow achieves a ProofScore of 0.545, substantially improving on full-proof (0.123) and step-proof (0.072) baselines. In SITA-R1 (Li et al., 13 Nov 2025), rates of file-level success (57.14%) and definition/theorem/instance correctness (>93%) materially surpass direct generation, with ablation studies confirming the necessity of error repair and iterative refinement.
| System | ProofScore | File-level Success (%) | Def/Thm/Instance (%) |
|---|---|---|---|
| ProofFlow DAG | 0.545 | 37.5 | 93.9 / N/A / N/A |
| ProofFlow noDAG | 0.417 | 35.3 | N/A |
| Full-Proof | 0.123 | 14.1 | N/A |
| SITA-R1 | N/A | 57.14 | 93.8/95.6/95.4 |
A plausible implication is that structural decomposition and template-based generation offer critical advantages for both correctness and coverage.
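To make the idea of a composite metric concrete, the following Python sketch shows one hypothetical way to combine the three axes so that a node earns credit only when it compiles and is then scaled by its faithfulness and structural scores. This aggregation is an assumption chosen for illustration; it is not the ProofScore definition from the paper, and the names `NodeJudgement` and `toy_composite` are invented here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeJudgement:
    compiles: bool      # syntactic correctness (binary)
    faithful: float     # semantic faithfulness in [0, 1], e.g. LLM-judged
    structural: float   # structural fidelity in [0, 1]


def toy_composite(nodes: Iterable[NodeJudgement]) -> float:
    """Average per-node credit; a node that fails to compile contributes zero."""
    nodes = list(nodes)
    if not nodes:
        return 0.0
    credit = sum(
        (1.0 if n.compiles else 0.0) * n.faithful * n.structural for n in nodes
    )
    return credit / len(nodes)
```

Under this toy rule, a proof in which every node compiles but half the nodes are judged unfaithful scores no better than one in which half the nodes fail to compile, which is the sense in which a multiplicative composite rewards joint alignment.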
6. Limitations, Challenges, and Prospects
Current SITA implementations face several technical barriers:
- Complex type coercions: LLMs struggle with advanced Lean type translations (e.g., matrices to `ContinuousLinearMap`).
- Hard side-conditions: Properties like the Kurdyka–Łojasiewicz inequality require symbolic reasoning typically infeasible for end-to-end generation.
- Scaling with proof complexity: Large DAGs or templates with intricate sublemma hierarchies tax prompting and topological reasoning.
- Semantic slips: Over 35% of step failures in ProofFlow are due to misinterpretation of dependencies or missed assumptions.
Potential avenues for strengthening SITA include reinforcement learning for semantic preservation, integrated tactic/proof search, human-in-the-loop disambiguation, broader template libraries, and expansion to domains beyond convex optimization (e.g., graph theory, algebraic structures). Extending the error knowledge base through continual learning and augmenting with domain-specific heuristics is anticipated to advance full-file success rates above 80%.
7. Significance and Comparative Context
SITA distinguishes itself from tactic-first and linear step-proof strategies by maintaining both minimal and precise context per lemma, systematically enforcing structural fidelity. Template instantiation enables high-throughput derivation of verified concrete theorems, reducing manual intervention in large families of problems—a prevalent task in both mathematical research and formal-methods software engineering. SITA’s pipeline demonstrates scalable integration of LLM automation with interactive theorem proving, setting a reference model for future automated formalization frameworks.