
TheoremGraph: Formalizing Mathematical Proofs

Updated 1 July 2026
  • TheoremGraph is a framework that represents mathematical statements and proofs as nodes and edges to capture deduction, citation, and dependency relationships.
  • It integrates various graph models—from fine-grained logical DAGs to large-scale citation networks—to support both automated theorem proving and research evaluation.
  • It employs neural embeddings and graph neural networks alongside PageRank-style metrics to enhance retrieval accuracy and cross-formal matching.

A TheoremGraph is a mathematical or computational structure in which the nodes represent mathematical statements (such as theorems, definitions, lemmas, or geometric objects) and the edges encode relations of dependency, deduction, or reference among these statements. The TheoremGraph paradigm formalizes proofs, mathematical knowledge, and the flow of ideas as graphs, providing a unified perspective for formal logic, geometry, algebraic/analytic dependencies, and even research evaluation. Over the last decade, TheoremGraph frameworks have been instantiated at levels ranging from fine-grained logical DAGs for higher-order logic, through object–inference graphs for geometry, to large-scale statement citation graphs across formal and informal mathematics, and hierarchical meta-graphs for impact analysis.

1. Mathematical and Logical Foundations

A TheoremGraph arises wherever mathematical results, their proofs, and their logical or inferential connections are cast as explicit graph structures. In logical and combinatorial form, a TheoremGraph $G = (V, E)$ may encode:

  • Nodes $V$: Statements (theorems, lemmas, definitions), subformulas (subterms in logic), or geometric/mathematical entities (points, lines, segments).
  • Edges $E$: Deductive dependencies (a proof of $u$ uses $v$), applications of rules or axioms, citations, or relations in geometry.
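The node and edge roles above can be illustrated with a minimal sketch. The class name `TheoremGraph`, its methods, and the statement labels are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
from collections import defaultdict

class TheoremGraph:
    """Directed graph in which an edge u -> v means 'the proof of u uses v'."""

    def __init__(self):
        self.nodes = set()
        self.uses = defaultdict(set)  # u -> statements that u depends on directly

    def add_dependency(self, u, v):
        self.nodes.update((u, v))
        self.uses[u].add(v)

    def transitive_dependencies(self, u):
        """All statements reachable from u along 'uses' edges."""
        seen, stack = set(), [u]
        while stack:
            node = stack.pop()
            for dep in self.uses[node]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

g = TheoremGraph()
g.add_dependency("Theorem 1", "Lemma 2")
g.add_dependency("Lemma 2", "Definition 3")
deps = g.transitive_dependencies("Theorem 1")
print(sorted(deps))
```

Storing only the direct "uses" relation and computing the transitive closure on demand keeps the structure small; a real system would likely also index the reverse direction to answer "what depends on this lemma?".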

In synthetic graph-theoretic settings, TheoremGraph also refers to a finite simple graph endowed with structures such as orientations, cliques, and vertex/edge data, so that discrete geometric or topological theorems (e.g., Gauss–Bonnet, Green–Stokes, Poincaré–Hopf) can be formulated and proved using only combinatorial data (Knill, 2012).

This general abstraction underlies modern approaches to automated theorem proving, mathematical search, structural meta-analyses, and networks of proofs in both formal and informal mathematics.

2. Statement-Level Dependency Graphs: Informal and Formal Mathematics

TheoremGraph infrastructure at scale—particularly as implemented in "TheoremGraph: Bridging Formal and Informal Mathematics"—organizes mathematical knowledge as a unified statement-level dependency graph across arXiv and major formal libraries (Kurgan et al., 24 Jun 2026). The two principal subgraphs are:

  • Informal graph: Extracted from ≈11.75 million theorem-like environments in arXiv LaTeX sources, with nodes representing theorems/lemmas/definitions, and ≈18.32 million directed dependencies extracted using deterministic (LaTeX refs and citations), heuristic (textual/discourse cues), and notation-tracking extractors. Edges are labeled by extractor provenance to allow precision–recall tradeoffs. Extractor precision is 68.1% overall and 98.8% for deterministic links.
  • Formal graph (LeanGraph): Extracted at the elaborator/kernel level of Lean 4 from 25 projects, yielding 388,105 declaration nodes and 11.34 million typed edges in six categories (extends, field, sig, proof, def, docref). Only user-facing constants are retained.
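Provenance labels make the precision–recall tradeoff a matter of choosing which edge classes to keep. The sketch below assumes a simple edge record with a `provenance` field; the field name, the label strings, and the statement identifiers are illustrative assumptions, not the dataset's actual schema.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    source: str
    target: str
    provenance: str  # assumed labels: "deterministic", "heuristic", "notation"

def filter_edges(edges, allowed):
    """Keep only edges whose extractor provenance is in `allowed`."""
    return [e for e in edges if e.provenance in allowed]

edges = [
    Edge("thm:main", "lem:key", "deterministic"),
    Edge("thm:main", "def:space", "heuristic"),
    Edge("lem:key", "def:space", "notation"),
]

# Favor precision: keep only links backed by explicit LaTeX refs or citations.
high_precision = filter_edges(edges, {"deterministic"})
# Favor recall: keep every extracted link.
high_recall = filter_edges(edges, {"deterministic", "heuristic", "notation"})
print(len(high_precision), len(high_recall))
```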

Cross-formality matching is realized by generating a "slogan" (a one-sentence natural-language summary) for each statement, embedding the slogans with an LLM embedding model (Qwen3-Embedding-8B) into $\mathbb{R}^{4096}$, and matching by cosine similarity. An LLM judge (GPT-5.4) verifies matches above a cosine threshold; matches judged exact or semantically close (inexact) account for 47,952 cross-formality links at cos ≥ 0.8, with acceptance rising to 96% for cos ≥ 0.95.
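The thresholded cosine-matching step can be sketched as follows. This is a toy illustration under stated assumptions: the 3-dimensional vectors stand in for the 4096-dimensional slogan embeddings, the identifiers are made up, and the brute-force pairwise loop replaces whatever nearest-neighbor index a system at this scale would need. The LLM-judge verification step is not modeled.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def candidate_matches(informal, formal, threshold=0.8):
    """Return (informal_id, formal_id, similarity) for pairs at or above threshold."""
    pairs = []
    for i_id, i_vec in informal.items():
        for f_id, f_vec in formal.items():
            sim = cosine(i_vec, f_vec)
            if sim >= threshold:
                pairs.append((i_id, f_id, sim))
    # Highest similarity first, so a judge can review the best candidates early.
    return sorted(pairs, key=lambda p: -p[2])

# Hypothetical toy embeddings.
informal = {"informal:thm1": [1.0, 0.0, 0.0], "informal:lem2": [0.0, 1.0, 0.0]}
formal = {"formal:decl_a": [0.9, 0.1, 0.0], "formal:decl_b": [0.0, 0.0, 1.0]}
matches = candidate_matches(informal, formal)
print(matches)
```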

This unified TheoremGraph enables semantic search, retrieval, and cross-pollination across the divide between informal mathematics and formal proof libraries.

3. TheoremGraphs in Logic and Automated Theorem Proving

For logic and higher-order theorem proving, TheoremGraph-style representations formalize formulas as directed, labeled graphs in which nodes represent subterm occurrences (types, variables, constants, applications, binders) and edges encode syntactic structure, argument order, variable binding, and subexpression sharing (Paliwal et al., 2019). The construction is as follows:

  • Nodes: Each subterm occurrence; labeled by type constructor, operator, or variable name.
  • Edges: Parent–child relationships derived from the abstract syntax tree (AST); edges labeled by child index; reverse edges introduced for bidirectionality; optional edges for variable binding or random connections.
  • Subexpression sharing: Nodes for identical subterms (same token and children) are merged, reducing the graph from a tree to a DAG.
  • Variable blinding: Optionally anonymizes variable names post-construction.
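Subexpression sharing amounts to hash-consing: two subterms with the same label and the same children map to one node. The sketch below assumes terms are given as nested tuples `(label, child, ...)` with string leaves; this encoding and the function name are illustrative assumptions, and reverse edges, binding edges, and variable blinding are omitted.

```python
def build_shared_graph(term):
    """Convert a nested-tuple term into a DAG with identical subterms merged.

    A term is either a string (leaf) or a tuple (label, child, child, ...).
    Returns (root_id, nodes, edges): nodes maps node id -> label, and edges is
    a list of (parent_id, child_id, child_index).
    """
    table = {}  # (label, child ids) -> node id
    nodes = {}
    edges = []

    def visit(t):
        if isinstance(t, str):
            label, children = t, ()
        else:
            label, children = t[0], t[1:]
        child_ids = tuple(visit(c) for c in children)
        key = (label, child_ids)
        if key not in table:
            node_id = len(table)
            table[key] = node_id
            nodes[node_id] = label
            # Edges are labeled by child index to preserve argument order.
            for index, child_id in enumerate(child_ids):
                edges.append((node_id, child_id, index))
        return table[key]

    root = visit(term)
    return root, nodes, edges

# f(g(x), g(x)) has five subterm occurrences as a tree but three nodes when shared.
root, nodes, edges = build_shared_graph(("f", ("g", "x"), ("g", "x")))
print(nodes, edges)
```

Both arguments of `f` point at the same `g(x)` node, distinguished only by the child-index label on the edge, which is what turns the syntax tree into a DAG.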

Such logical TheoremGraphs are embedded using $T$-step message-passing GNNs. Bidirectional aggregation (parents $\leftrightarrow$ children) and subexpression sharing are both crucial to representational power: in HOList experiments, a 12-hop subexpression-sharing GNN closes 49.95% of proofs, outperforming non-sharing and non-bidirectional variants (Paliwal et al., 2019).

4. TheoremGraphs in Geometry: Automated Proof Graphs

In geometric ATP, as formalized in GraATP, TheoremGraphs encode geometric objects (points, lines, circles), measures (length, angle, ratio), and boolean relations as nodes; labeled edges are derivations via geometric inference rules (e.g., Pythagoras, Similar Triangles, Parallelism). This enables construction and traversal of a proof graph:

  • Initialization: Parameter set $P$ of free objects and set $R$ of goal nodes.
  • Expansion: New nodes are generated by applying geometric rules to existing nodes, creating labeled edges for each inference.
  • Proof extraction: When all goal nodes in $R$ are reachable from the initial set $P$, a topological sort yields a proof order matching human strategies.
  • Complexity: Worst-case node count is exponential, but heuristic pruning (tracking only "dependency frontier" nodes) enables efficient proofs for many Olympiad problems (Mahmud et al., 2014).
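The initialize, expand, and extract loop can be sketched as simple forward chaining. This is a simplified illustration, not the GraATP algorithm: facts are plain strings, each rule is an already-instantiated triple `(rule_name, premises, conclusion)`, and no heuristic pruning is applied. All names and the example facts are hypothetical.

```python
def expand_proof_graph(initial, rules, goals):
    """Forward-chain until every goal is derived or nothing new can be added.

    rules: list of (rule_name, premises, conclusion).
    Returns derivation steps in the order they were made, or None if some
    goal is unreachable. Each step's premises were known before the step was
    taken, so this order is a valid topological order of the proof graph.
    """
    known = set(initial)
    steps = []
    changed = True
    while changed and not set(goals) <= known:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                steps.append((name, tuple(premises), conclusion))
                changed = True
    return steps if set(goals) <= known else None

initial = {"AB = 3", "BC = 4", "angle ABC = 90", "DE = 5"}
rules = [
    ("Transitivity", ["AC = 5", "DE = 5"], "AC = DE"),
    ("Pythagoras", ["AB = 3", "BC = 4", "angle ABC = 90"], "AC = 5"),
]
steps = expand_proof_graph(initial, rules, ["AC = DE"])
for name, premises, conclusion in steps:
    print(f"{name}: {', '.join(premises)} => {conclusion}")
```

Note that this sketch records every derived fact, including ones a goal never uses; trimming the steps to the ancestors of the goal nodes would be a natural refinement.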

This framework unifies algebraic and geometric reasoning, accommodates human-readable proof scripts, and can be extended to richer geometric settings through additional node and rule types.

5. Hierarchical and Citation-Based TheoremGraphs for Research Evaluation

Connected Theorems (Ju et al., 25 Aug 2025) generalizes TheoremGraphs into hierarchical graphs that span theorems, papers, and mathematical fields, to data-mine influence and impact:

| Node type | Description | Example |
|---|---|---|
| Theorem | Individual statement | Lemma 3.2 |
| Paper | Document containing theorems | xxx |
| Field | Mathematical subject area | PDE |

Edges are directed citations (theorem–theorem, paper–paper, field–field) or undirected membership (theorem-in-paper, paper-in-field). Edge weights are adjusted for citation type and shared authorship. PageRank-style propagation computes influence scores iteratively, with cross-level coupling. Influence propagates both via graph structure and up/down the hierarchy. Field rankings, theorem and paper centralities, and inter-field influence metrics quantify the evolution and interactions of mathematical subfields.
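The iterative influence computation can be illustrated with ordinary weighted PageRank on a single level. This is a sketch of the generic technique only: the cross-level coupling between theorems, papers, and fields, and the weight adjustments for citation type and shared authorship, are not modeled, and the damping factor, iteration count, and example edges are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    edges: list of (citing, cited, weight) with positive weights.
    Returns a dict node -> score; scores sum to 1.
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    out_weight = {n: 0.0 for n in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Mass held by nodes with no outgoing citations is spread uniformly.
        dangling = sum(score[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        for u, v, w in edges:
            new[v] += damping * score[u] * w / out_weight[u]
        score = new
    return score

edges = [
    ("Thm A", "Lemma B", 1.0),
    ("Thm C", "Lemma B", 1.0),
    ("Lemma B", "Def D", 0.5),
]
scores = pagerank(edges)
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```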

Temporal slicing enables dynamic field rankings and cross-field impact analysis. For example, PDE and Probability consistently rise in influence, clusters such as (Algebra, AlgGeom, DiffGeom, Topology) emerge, and directional impact is captured (e.g., the influence of Dynamical Systems → PDE is stronger than the reverse).

6. TheoremGraph as a Combinatorial Playground for Mathematical Theorems

In a distinctive approach, TheoremGraph is used as a purely combinatorial model for foundational theorems of topology and geometry within graph theory (Knill, 2012). In this setting, a finite simple graph becomes the carrier for discrete analogues of:

  • Gauss–Bonnet: Vertex curvature, defined by clique counts in unit spheres, sums to the graph Euler characteristic.
  • Poincaré–Hopf: The index of a function at each vertex (counting local exit-set topology) likewise sums to Euler characteristic.
  • Green–Stokes: The graph-theoretic boundary and exterior derivative satisfy an exact finite-sum form of Stokes' theorem.

This perspective illustrates that essential ideas of geometry, curvature, and index can be fully realized through TheoremGraphs: a single finite graph, its cliques and signatures, and finite sums of local invariants.
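The discrete Gauss–Bonnet statement can be checked numerically on a small graph. The sketch below uses the curvature formula $K(v) = \sum_{k \ge 0} (-1)^k V_{k-1}(v)/(k+1)$, where $V_{k-1}(v)$ counts the $k$-vertex cliques in the unit sphere of $v$ (its neighbors) and $V_{-1} = 1$, and the Euler characteristic as the alternating sum of clique counts. The function names are illustrative, and the brute-force clique enumeration is exponential, so it is only suitable for tiny graphs.

```python
from fractions import Fraction
from itertools import combinations

def count_cliques(vertices, adj):
    """Return {k: number of k-vertex cliques} in the subgraph induced on `vertices`."""
    counts = {}
    vertices = sorted(vertices)
    for k in range(1, len(vertices) + 1):
        n = sum(
            1
            for subset in combinations(vertices, k)
            if all(b in adj[a] for a, b in combinations(subset, 2))
        )
        if n == 0:
            break  # no k-cliques implies no larger cliques
        counts[k] = n
    return counts

def euler_characteristic(adj):
    """Alternating sum: vertices - edges + triangles - ..."""
    return sum((-1) ** (k - 1) * n for k, n in count_cliques(adj.keys(), adj).items())

def curvature(v, adj):
    """K(v) = 1 - V_0/2 + V_1/3 - ..., with V_{k-1} counted in the unit sphere of v."""
    sphere = count_cliques(adj[v], adj)
    return Fraction(1) + sum(Fraction((-1) ** k * n, k + 1) for k, n in sphere.items())

# Octahedron: six vertices, each adjacent to all others except its antipode v ^ 1.
adj = {v: {u for u in range(6) if u != v and u != v ^ 1} for v in range(6)}
total_curvature = sum(curvature(v, adj) for v in adj)
print(total_curvature, euler_characteristic(adj))
```

Using `Fraction` keeps the arithmetic exact, so the equality between total curvature and Euler characteristic can be tested without floating-point tolerance.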

7. Infrastructure, Interfaces, and Retrieval Evaluation

Modern TheoremGraph infrastructure supports search, attribution, and retrieval-augmented reasoning:

  • Datasets and API: Statement archives with edge provenance, slogans, embeddings; LeanGraph with declaration types, edges, and docstrings; HTTP API endpoints for retrieval, embedding, dependency calls, and cross-formality matching (Kurgan et al., 24 Jun 2026).
  • Editor/agent interfaces: Integration into VS Code/Lean4 via the Math Content Protocol; commands support direct search, dependency listing, and informal–formal matching within proof engineering workflows.
  • Retrieval metrics: Formal→informal blueprint recovery achieves Hit@1 = 43.5%, Hit@10 = 69.9% (MRR = 52.5%); concept retrieval (MathlibQR fair-810) yields Recall@10 = 0.775, close to LeanSearch v2's reranked Recall@10 = 0.780, without neural reranking.
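For readers unfamiliar with the metrics quoted above, Hit@k and MRR can be computed as in the following sketch. It assumes exactly one gold target per query; the query and document identifiers are toy data, not drawn from the benchmarks.

```python
def hit_at_k(ranked_lists, gold, k):
    """Fraction of queries whose gold item appears in the top k results."""
    hits = sum(1 for q, ranked in ranked_lists.items() if gold[q] in ranked[:k])
    return hits / len(ranked_lists)

def mean_reciprocal_rank(ranked_lists, gold):
    """Average of 1/rank of the gold item, counting 0 when it is not retrieved."""
    total = 0.0
    for q, ranked in ranked_lists.items():
        if gold[q] in ranked:
            total += 1.0 / (ranked.index(gold[q]) + 1)
    return total / len(ranked_lists)

ranked_lists = {
    "q1": ["a", "b", "c"],  # gold at rank 1
    "q2": ["b", "a", "c"],  # gold at rank 2
    "q3": ["c", "b", "x"],  # gold missing
    "q4": ["x", "y", "z"],  # gold missing
}
gold = {"q1": "a", "q2": "a", "q3": "a", "q4": "a"}

hit1 = hit_at_k(ranked_lists, gold, 1)
hit3 = hit_at_k(ranked_lists, gold, 3)
mrr = mean_reciprocal_rank(ranked_lists, gold)
print(hit1, hit3, mrr)
```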

A plausible implication is that TheoremGraph infrastructure, together with its retrieval accuracy, can ground both mathematical knowledge graph curation and retrieval-augmented LLM systems for proof synthesis, search, and mathematical agent applications.

