Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1

Published 19 Apr 2026 in cs.AI | (2604.17229v1)

Abstract: Project Yanasse presents a method for discovering new proofs of theorems in one area of mathematics by transferring proof strategy patterns (e.g., Lean 4 tactic invocation patterns) from a structurally distant area. The system extracts tactic usage distributions across 27 top-level areas of Mathlib (217,133 proof states), computes z-scores to identify tactics that are heavily used in a source area but rare or absent in a target area, matches source and target proof states via GPU-accelerated NP-hard analogy matching (running on a MacBook Air via Apple's MPS backend), and then asks an AI reasoning agent to semantically adapt, rather than symbol-substitute, the source tactic invocation pattern to the target theorem. In this first part of the study, the method is applied to the pair Probability -> Representation Theory, producing 4 Lean-verified new proofs out of 10 attempts (40%). The proofs compile with zero sorry declarations. The key finding is that tactic schemas decompose into a head (domain-gated, rarely transfers) and a modifier (domain-general, often transfers): filter_upwards's head fails in representation theory (no Filter structure), but its [LIST] with ω modifier transfers cleanly as ext1 + simp [LIST] + rfl. Crucially, the underlying matching engine, deep_vision_lib.py, is entirely domain-independent: the same optimization code for NP-hard matching that matches chess positions by analogy also matches Lean proof states by analogy, without knowing which domain it is processing. Only the relation extractor is domain-specific.


Authors (1)

Alexandre Linhares

Summary

The paper presents a computational pipeline that transfers formal proof tactics between diverse math domains using relational analogies.
It employs GPU-accelerated NP-hard analogy matching and a semantic adaptation loop to achieve a 40% success rate (4 of 10 attempts) with Lean-verified proofs.
The study establishes a taxonomy of tactic transferability, distinguishing between domain-gated, domain-general, and homogeneity-sensitive strategies.

Cross-Area Tactic Transfer via Deep Relational Analogy: An Analysis of Yanasse

Overview and Methodological Framework

The paper "Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1" (2604.17229) introduces a computational pipeline for discovering new proofs in formalized mathematics by transferring tactic usage patterns across distant mathematical domains. Building on the Deep Vision framework, which operates at the level of relational analogy, the system extracts tactic schemas from Mathlib's Lean proofs, identifies statistical outliers in tactic usage across 27 areas of mathematics, and matches proof states structurally using GPU-accelerated heuristics for an NP-hard analogy-matching problem. An AI reasoning agent then runs a semantic adaptation loop that carries the transferred tactic's operational intent over to the new context.

Applied to the domain pair Probability Theory $\rightarrow$ Representation Theory, the method yielded 4 Lean-verified alternative proofs out of 10 schema transfer attempts (40% success). This gives a quantitative baseline for cross-area tactic transfer, and the successes and failures together begin to map the reach and the limits of analogical reasoning in formal mathematics.

Proof State Representation and Analogical Matching

At the heart of the approach is the domain-independent, relation-centric Deep Vision matcher, previously developed for high-level cognitive tasks such as chess analogy-making and ARC-AGI problem solving. In the Lean setting, proof states are encoded as relational networks: the entities are hypotheses, goals, and types, and the relations span 14 types, including rewrite, head-match, and equality. The matcher is agnostic to domain features; only the relation extractor encodes domain-specific concepts.
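Since only the relation extractor is domain-specific, it helps to see what its output might look like. The sketch below is hypothetical: the function name extract_relations, the relation names goal_head, hyp_head, and shares_symbol, and the whitespace-based parsing are illustrative assumptions, not the paper's 14 relation types. It turns a toy proof state into (relation, entity, entity) triples that a domain-independent matcher could consume.

```python
import re

IDENT = re.compile(r"[A-Za-z_][\w.']*")  # crude identifier pattern (assumption)

def extract_relations(hyps: dict[str, str], goal: str) -> list[tuple[str, str, str]]:
    """Hypothetical extractor: encode a proof state as (relation, a, b) triples.

    Entities are the goal, hypothesis names, and symbols; the three relation
    names used here are illustrative, not the paper's 14 relation types.
    """
    triples = [("goal_head", "goal", goal.split()[0])]
    goal_syms = set(IDENT.findall(goal))
    for name, typ in hyps.items():
        triples.append(("hyp_head", name, typ.split()[0]))
        # Link a hypothesis to every symbol it shares with the goal.
        for sym in sorted(goal_syms & set(IDENT.findall(typ))):
            triples.append(("shares_symbol", name, sym))
    return triples

state = extract_relations({"hf": "Continuous f", "hg": "Continuous g"},
                          "Continuous (f + g)")
for t in state:
    print(t)
```

A chess or ARC extractor would emit triples of the same shape from a completely different input, which is what lets the matcher stay domain-blind.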

For each candidate transfer, the matcher computes analogy scores between source and target proof states. Finding the entity correspondence that maximizes this score is NP-hard, so the matcher approximates it with GPU-accelerated augment-swap heuristics, fully tensorized and run on commodity hardware. The experiment suggests that this domain-general optimization engine can detect deep structural commonalities between mathematical proof states, much as it does between chess positions and on cognitive benchmarks.
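The matching step can be illustrated with a small pure-Python sketch. It is one plausible reading of an augment-swap heuristic, assumed rather than taken from the paper's code: augment greedily extends a partial entity correspondence, and swap exchanges the images of two source entities whenever that preserves more relations. The score (number of source triples whose image is a target triple) and all function names are assumptions, and the sketch makes no attempt at the tensorized GPU version the paper describes.

```python
from itertools import combinations

Triple = tuple[str, str, str]  # (relation, entity_a, entity_b)

def entities(triples: list[Triple]) -> list[str]:
    """Entity names in order of first appearance (keeps the search deterministic)."""
    seen: list[str] = []
    for _, a, b in triples:
        for e in (a, b):
            if e not in seen:
                seen.append(e)
    return seen

def score(mapping: dict[str, str], src: list[Triple], tgt: set[Triple]) -> int:
    """Number of source relations whose image under `mapping` is a target relation."""
    return sum(1 for r, a, b in src
               if a in mapping and b in mapping and (r, mapping[a], mapping[b]) in tgt)

def augment_swap_match(src: list[Triple], tgt_list: list[Triple]):
    tgt = set(tgt_list)
    tgt_ents = entities(tgt_list)
    mapping: dict[str, str] = {}
    # Augment: give each source entity the unused target entity that scores best.
    for s in entities(src):
        unused = [t for t in tgt_ents if t not in mapping.values()]
        if not unused:
            break
        mapping[s] = max(unused, key=lambda t: score({**mapping, s: t}, src, tgt))
    # Swap: exchange two source entities' images while that raises the score.
    improved = True
    while improved:
        improved = False
        current = score(mapping, src, tgt)
        for x, y in combinations(list(mapping), 2):
            trial = {**mapping, x: mapping[y], y: mapping[x]}
            if score(trial, src, tgt) > current:
                mapping, improved = trial, True
                break
    return mapping, score(mapping, src, tgt)

source = [("rewrites", "h1", "g"), ("head_match", "g", "T"), ("has_type", "h1", "T")]
target = [("has_type", "k", "U"), ("rewrites", "k", "goal"), ("head_match", "goal", "U")]
print(augment_swap_match(source, target))
# -> ({'h1': 'k', 'g': 'goal', 'T': 'U'}, 3)
```

Because the matcher sees only relation names and entity labels, the same function would run unchanged on triples produced by a chess-position extractor, which mirrors the domain-independence claim.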

Extraction, Ranking, and Adaptation Pipeline

The pipeline proceeds in five steps:

Distribution Extraction: Parse Mathlib's Lean corpus (217,133 proof states across 27 top-level areas) and extract tactic usage schemas, characterized as (head, arity, modifier, lemma usage).
Statistical Ranking: Compute $z$-scores for each (area, schema) pair and keep the schemas that are statistical outliers, overrepresented in the source area and underrepresented or absent in the target area. These are the transfer candidates.
Analogy Matching: For each transfer candidate, source proofs are structurally matched to all target proof states via the Deep Vision matcher.
Semantic Adaptation: An AI reasoning agent interprets the source proof and its intent, analyzes the matched target theorem, and constructs an adapted Lean tactic sequence that targets the genuine mathematical analog rather than performing direct symbol substitution.
Verification: The adapted tactic sequence is compiled by Lean; a transfer counts as successful only if the proof closes with no sorry declarations.
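The extraction and ranking steps can be sketched as follows. Everything here is a simplifying assumption: parse_schema reduces a tactic string to (head, arity, modifier), taking arity as the number of bracketed terms and the modifier as the first keyword from a small hypothetical set, and it ignores lemma usage. The z-score standardizes a schema's relative frequency in one area against its frequencies across all areas; the paper's exact normalization and thresholds may differ, so hi, lo, and the toy corpus are illustrative.

```python
import re
from collections import Counter
from statistics import mean, pstdev

MODIFIER_KEYWORDS = {"with", "at", "using"}  # hypothetical modifier vocabulary

def parse_schema(tactic: str) -> tuple[str, int, str]:
    """Reduce a tactic string to an assumed (head, arity, modifier) schema."""
    words = tactic.split()
    bracket = re.search(r"\[([^\]]*)\]", tactic)
    arity = len([x for x in bracket.group(1).split(",") if x.strip()]) if bracket else 0
    modifier = next((w for w in words[1:] if w in MODIFIER_KEYWORDS), "")
    return (words[0], arity, modifier)

def z_scores(tactics_by_area: dict[str, list[str]]) -> dict:
    """z-score of each schema's relative frequency in an area, standardized
    across all areas (an assumed normalization)."""
    freqs = {}
    for area, tactics in tactics_by_area.items():
        counts = Counter(parse_schema(t) for t in tactics)
        total = sum(counts.values())
        freqs[area] = {s: c / total for s, c in counts.items()}
    schemas = {s for f in freqs.values() for s in f}
    z = {}
    for s in schemas:
        values = [freqs[a].get(s, 0.0) for a in freqs]
        mu, sigma = mean(values), pstdev(values)
        for a in freqs:
            z[(a, s)] = 0.0 if sigma == 0 else (freqs[a].get(s, 0.0) - mu) / sigma
    return z

def transfer_candidates(z: dict, source: str, target: str, hi=1.0, lo=0.0):
    """Schemas over-represented in the source (z >= hi) and rare in the target (z <= lo)."""
    return sorted(s for (a, s), v in z.items()
                  if a == source and v >= hi and z[(target, s)] <= lo)

corpus = {  # toy, hypothetical corpus
    "Probability": ["filter_upwards [h1, h2] with ω hω", "filter_upwards [h] with x hx", "simp"],
    "RepresentationTheory": ["simp", "ext1", "rfl"],
    "Algebra": ["simp", "rfl", "ring"],
}
print(transfer_candidates(z_scores(corpus), "Probability", "RepresentationTheory"))
# -> [('filter_upwards', 1, 'with'), ('filter_upwards', 2, 'with')]
```

simp appears at the same rate in every area, so its spread is zero and it is never flagged, while both filter_upwards schemas stand out in Probability and are absent from the target.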

The ranking step restricts proposals to proof strategies that are rare or absent in the target area, and the verification step ensures that every claimed proof actually closes in the target domain.
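The verification criterion can be made concrete with a small check. The sketch assumes the candidate proof has already been compiled (for example with lake env lean on the file) and that the compiler's diagnostic text is passed in. proof_is_closed is a hypothetical helper that rejects a proof if any sorry or admit token survives comment stripping, or if the output reports an error or Lean's "declaration uses 'sorry'" warning. The comment stripper does not handle nested block comments or string literals.

```python
import re

def strip_lean_comments(src: str) -> str:
    """Drop Lean block comments (/- ... -/, assumed non-nested) and line comments (--)."""
    src = re.sub(r"/-.*?-/", "", src, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", "", src)

def proof_is_closed(src: str, compiler_output: str) -> bool:
    """Hypothetical acceptance check: no sorry/admit in the code, and the compiler
    output contains neither an error nor Lean's "declaration uses 'sorry'" warning."""
    if re.search(r"\b(sorry|admit)\b", strip_lean_comments(src)):
        return False
    out = compiler_output.lower()
    return "error" not in out and "declaration uses 'sorry'" not in out

good = "theorem t : 1 = 1 := by\n  rfl -- no sorry needed"
bad = "theorem t : 1 = 1 := by\n  sorry"
print(proof_is_closed(good, ""), proof_is_closed(bad, ""))  # True False
```

Checking both the source and the compiler output is deliberately redundant: the token scan catches placeholders directly, and the warning check catches any that reach Lean indirectly.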

Key Results: Transferred Proof Patterns and Schematic Decomposition

The four successful alternative proofs illustrate the power and limitations of schematic tactic transfer:

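filter_upwards $\to$ ext1 + simp [LIST] + rfl: The filter_upwards head does not transfer, because Representation Theory has no Filter structure, but its [LIST] with ω modifier, which combines a list of hypotheses and then reasons about an arbitrary element, reappears as extensionality followed by simplification with the same list. This confirms a structural analogy at the level of per-element reduction.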
congr with variable $\to$ span_le.2 + rintro: Pointwise congruence arguments from measurability contexts map to spanning arguments in module-theoretic settings, where span_le.2 reduces a span inclusion to a statement about the generators and rintro introduces them explicitly.
any_goals $\to$ any_goals rfl: The domain-general dispatch tactic for handling trivial subgoals translates directly, closing algebraically trivial cases in complex targets.
by_cases $\to$ case split on morphism: Case splitting carries over even when the dichotomy is mathematically unnecessary in the target (e.g., splitting on whether a morphism is zero), reflecting the domain independence of this proof strategy.
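As a generic illustration of the module-theoretic side of the congr with variable transfer (not one of the paper's four proofs), the span_le.2 + rintro pattern reduces a span inclusion to a membership check on each generator. The statement below is a hypothetical example and assumes a recent Mathlib, where Submodule.span_le states span R s ≤ p ↔ s ⊆ p.

```lean
import Mathlib

-- Hypothetical example of the span_le.2 + rintro pattern (not from the paper).
example {R M : Type*} [Semiring R] [AddCommMonoid M] [Module R M]
    (s : Set M) (p : Submodule R M) (h : ∀ x ∈ s, x ∈ p) :
    Submodule.span R s ≤ p := by
  -- Reduce `span R s ≤ p` to `s ⊆ p`.
  apply Submodule.span_le.2
  -- Introduce an arbitrary generator and its membership proof.
  rintro x hx
  exact h x hx
```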

The (head, modifier) decomposition of tactic schemas emerges as the core theoretical insight. Heads (filter_upwards, congr, lift) are often domain-gated, since they require specific goal shapes or type-class instances, and so resist direct transfer. Modifiers (with-clauses, combinatoric arguments) encode structural reasoning patterns that are domain-general and transfer more readily. Both the successful transfers and the diagnoses of the rejected attempts support this finding, although the evidence comes from 10 attempts on a single area pair.
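A toy version of the (head, modifier) split makes the gating argument concrete. The table HEAD_REQUIRES and the fallback ext1 / simp [LIST] / rfl skeleton are hypothetical stand-ins: in the paper an AI reasoning agent performs the adaptation semantically, whereas this fixed rule only mimics the filter_upwards example. It keeps the head when the target provides the structure it needs and otherwise salvages the modifier's lemma list.

```python
import re

# Hypothetical gating table: the structure each head needs in the target area.
HEAD_REQUIRES = {
    "filter_upwards": "Filter",
    "measurability": "MeasurableSpace",
    "lift": "CanLift",
}

def split_head_modifier(tactic: str) -> tuple[str, str]:
    """Split a tactic invocation into its head and the remaining modifier text."""
    head, _, modifier = tactic.partition(" ")
    return head, modifier.strip()

def transfer(tactic: str, target_structures: set[str]) -> list[str]:
    """Keep a supported head; otherwise salvage the modifier's lemma list (toy rule)."""
    head, modifier = split_head_modifier(tactic)
    needed = HEAD_REQUIRES.get(head)
    if needed is None or needed in target_structures:
        return [tactic]  # head is not gated, or the target provides its structure
    lemmas = re.search(r"\[[^\]]*\]", modifier)
    if lemmas is None:
        return []  # gated head and no domain-general modifier: nothing transfers
    return ["ext1", f"simp {lemmas.group(0)}", "rfl"]

print(transfer("filter_upwards [h1, h2] with ω hω", {"Module"}))
# -> ['ext1', 'simp [h1, h2]', 'rfl']
```

A bare gated head such as measurability returns an empty list, which corresponds to the "heads rarely port directly" half of the finding.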

Empirical Taxonomy of Schematic Transferability

The observed outcomes motivate a threefold taxonomy:

Domain-Gated Heads: Tactics requiring domain-specific goal structures or instances (e.g., filter_upwards, lift, measurability). Modifiers may salvage transferability, but heads rarely port directly.
Domain-General Combinators: Tactics such as any_goals, by_cases, and in some cases congr operate at a structural or syntactic level and show strong transfer potential.
Homogeneity-Sensitive Combinators: Tactics such as all_goals require uniform sub-goal shapes and fail when the proof context generates heterogeneous goals.
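The difference between domain-general and homogeneity-sensitive combinators shows up in a simplified model of their semantics. The sketch assumes a goal is just a shape label and a tactic either returns its remaining subgoals or None on failure; the names Goal, Tactic, and the toy rfl are illustrative. The failure behaviour is modelled on Lean 4's documented semantics, where all_goals fails if its tactic fails on any goal and any_goals succeeds if the tactic succeeds on at least one.

```python
from typing import Callable, Optional

Goal = str  # a goal reduced to a shape label (simplifying assumption)
Tactic = Callable[[Goal], Optional[list[Goal]]]  # remaining subgoals, or None on failure

def all_goals(tac: Tactic, goals: list[Goal]) -> Optional[list[Goal]]:
    """Run tac on every goal; a single failure makes the whole combinator fail."""
    remaining: list[Goal] = []
    for g in goals:
        result = tac(g)
        if result is None:
            return None
        remaining.extend(result)
    return remaining

def any_goals(tac: Tactic, goals: list[Goal]) -> Optional[list[Goal]]:
    """Run tac on every goal; succeed if it worked on at least one, keeping the rest."""
    remaining: list[Goal] = []
    progressed = False
    for g in goals:
        result = tac(g)
        if result is None:
            remaining.append(g)  # leave goals the tactic cannot handle untouched
        else:
            remaining.extend(result)
            progressed = True
    return remaining if progressed else None

def rfl(goal: Goal) -> Optional[list[Goal]]:
    """Toy rfl: closes only goals whose shape is a reflexive equation."""
    return [] if goal == "refl_eq" else None

mixed = ["refl_eq", "le"]  # heterogeneous subgoals
print(all_goals(rfl, mixed), any_goals(rfl, mixed))  # None ['le']
```

On homogeneous goals the two combinators agree; a single goal of a different shape is enough to make all_goals fail while any_goals still makes progress, which is why only the latter transfers reliably.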

Failures trace either to an intrinsic incompatibility between the areas' structures (for example, in their category-theoretic structure) or to missing type-class instances in the Lean proof environment.

Diagnostics of Negative Results

Each of the six failed transfer attempts is traced to a principled cause: domain mismatch, missing type-class instances, structural incompatibility in extensionality, or lack of operator harmonization. The paper's why-reports go beyond surface error logs to give a mathematical diagnosis of each failure, a rigorous way to document negative results.

Comparison with Neural Approaches and Practical Considerations

The Deep Vision pipeline is contrasted with neural theorem-proving systems (e.g., Lean Copilot, AlphaProof) on explainability, resource efficiency, and adaptability. As a symbolic, relation-driven engine, Deep Vision allows full introspection into its analogical mappings, runs efficiently on commodity hardware, and builds dynamic, emergent representations; the paper argues that static, data-hungry deep learning approaches lack these properties.

Resource consumption is modest: analogy matching completed in minutes on a laptop GPU (a MacBook Air using Apple's MPS backend), and the full 10-attempt pipeline is inexpensive to run. The design suggests that the approach could scale and remain accessible to the wider mathematical and formal reasoning communities.

Theoretical and Practical Implications

The practical implication is that nontrivial alternative proofs across domain boundaries can be discovered systematically through structural analogy, not only through statistical or language-model-based next-step prediction. This supports the cognitive-historical thesis that analogical reasoning, augmented by computation, can drive cross-pollination between mathematical domains. Theoretically, the observed (head, modifier) separation points to a deeper structure in mathematical reasoning that invites further formalization and exploitation.

From a research trajectory perspective, the work signals several promising avenues:

Refinement of schema parsing to maximize modifier extraction and quantify transferability.
Broadening to additional (source, target) area pairs to map the geometry of tactic transfer more fully.
Targeting conjectures and unproved statements to probe creative mathematical generation.
Investigating bidirectionality and symmetry of transfer between areas.

Conclusion

"Yanasse: Finding New Proofs from Deep Vision's Analogies" provides a substantive, computationally grounded demonstration that relational analogy is a promising mechanism for transferring mathematical proof strategy across structurally distant domains. Its results clarify both the practical limits and the theoretical underpinnings of such transfers. The findings make a case for explainable, relation-driven AI systems, capable of both supplementing and formalizing analogical reasoning, in mathematical discovery and automated reasoning. The anticipated extensions point to a rapidly evolving intersection of formal methods, cognitive science, and AI, with broad implications for both research and mathematical practice.
