
Topological Divergence (TOHA): A Unified TDA Framework

Updated 26 February 2026
  • TOHA is a framework that converts multiscale topological features into divergence scores via barcode intervals from persistent homology.
  • It applies a cross-filtration strategy to capture both global and local structural differences in contexts like LLM attention and TSP optimization.
  • Its stability, localization, and scalability make TOHA effective as a robust diagnostic and a source of performance improvements across a variety of pipelines.

Topological Divergence (TOHA) quantifies dissimilarity in high-dimensional data structures or discrete objects by translating multiscale shape differences into algebraic invariants derived from persistent homology. The TOHA paradigm generalizes across combinatorial optimization, generative modeling evaluation, neural representation comparison, scalar field analysis, and graph-based attention analysis. Across these domains, the central object is a divergence score computable from barcode intervals—birth and death times of topological features—built from filtered graphs or complexes that capture the joint or relative structure of two inputs. The resulting divergence is sensitive to both global and local geometric defects, is stable with respect to perturbations, and can be efficiently integrated into learning, optimization, or diagnosis pipelines.

1. Mathematical Basis and General Formalism

TOHA and related divergences build upon persistent homology and barcodes, which summarize the evolution of homological features across a filtration—a monotonic sequence of subgraphs, simplicial complexes, or cubical complexes parametrized by a threshold applied to edge weights or scalar function levels. The comparison of two objects (graphs, point clouds, functions, matrices) proceeds by constructing a joint or cross-filtered complex that intertwines their individual structures, then extracting the persistent homology intervals (barcodes) unique to this joint complex.

The classical persistent homology setup for a weighted undirected graph $G=(V,E,w)$ uses the Vietoris–Rips filtration:

  • For threshold $\alpha$, consider $R_\alpha(G)$, the simplicial complex containing all subsets of vertices whose pairwise edges satisfy $w_{ij}\le\alpha$.
  • Homological features (clusters, loops, cavities) “appear” (are born) and “disappear” (die) as $\alpha$ increases.

TOHA-type metrics construct an auxiliary or "cross" filtration—e.g., by collapsing distances within one set, taking minima over two functions, or zeroing intra-subset edges—and form barcodes whose intervals $(b_j, d_j)$ correspond to topological features representing differences between the structures. The final divergence is typically a $p$-sum over bar lengths, $\sum_j |d_j-b_j|^p$; variants add directionality, localization, or edgewise decomposition, depending on the application.
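In degree zero the barcode of a Vietoris–Rips filtration on a weighted graph has a well-known closed form: every finite $H_0$ bar corresponds to a minimum spanning tree edge. A minimal sketch of the $p$-sum divergence under that identity (function names are illustrative, and strictly positive off-diagonal distances are assumed):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def h0_bar_lengths(dist):
    """Finite H0 bars of the Vietoris-Rips filtration on a complete
    weighted graph: every component is born at 0 and dies when a minimum
    spanning tree edge merges it, so the bar lengths are exactly the MST
    edge weights (the single infinite bar is dropped). Assumes strictly
    positive off-diagonal distances (scipy treats zeros as absent edges)."""
    mst = minimum_spanning_tree(np.asarray(dist, dtype=float))
    return np.sort(mst.data)

def h0_divergence(dist, p=1.0):
    """p-sum of bar lengths, sum_j |d_j - b_j|**p, with b_j = 0 in H0."""
    return float(np.sum(h0_bar_lengths(dist) ** p))
```

For three collinear points at 0, 1, 3, the MST edges have weights 1 and 2, so the $p=1$ divergence is 3.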

2. Algorithmic Instantiations

2.1. Attention Graph Divergence for LLMs

In the context of LLMs, TOHA measures divergence between prompt and response subgraphs of attention matrices:

  • Convert attention weights $W$ to pseudo-distances $d_{ij}=1-W_{ij}$.
  • Partition $V$ into prompt $P$ and response $R$.
  • Form a complete graph with weights zeroed for all intra-prompt edges.
  • Compute the $H_0$ cross-barcode via the Vietoris–Rips filtration on the modified graph.
  • The TOHA divergence score is $D_\mathrm{TOHA}(R,P) = \sum_{[b_j,d_j]} (d_j-b_j)$, equivalent to the total edge weight of the minimum spanning forest (MSF) connecting $R$ to $P$.
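The steps above can be sketched in a few lines. This is an illustrative implementation, not the authors' code: the pseudo-distance is symmetrized by assumption, and zeroing intra-prompt edges is realized by the equivalent shortcut of contracting the prompt into one supernode.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def toha_score(attn, prompt_idx, response_idx, eps=1e-12):
    """Sketch of the H0 TOHA divergence for a single attention head.

    Zeroing all intra-prompt edges lets the prompt merge at filtration
    value 0, which is equivalent to contracting the prompt into one
    supernode whose distance to a response token is that token's closest
    attachment to any prompt token. The score is then the weight of the
    minimum spanning tree of the contracted graph, i.e. the MSF
    attaching the response to the prompt."""
    d = 1.0 - np.asarray(attn, dtype=float)
    d = np.minimum(d, d.T)  # assumption: symmetrize the pseudo-distance
    r = len(response_idx)
    g = np.zeros((r + 1, r + 1))
    # node 0 = contracted prompt supernode
    g[0, 1:] = g[1:, 0] = d[np.ix_(response_idx, prompt_idx)].min(axis=1)
    g[1:, 1:] = d[np.ix_(response_idx, response_idx)]
    # scipy drops exact-zero entries, so shift every edge by eps and
    # subtract the shift afterwards (the MST has exactly r edges)
    g = g + eps
    np.fill_diagonal(g, 0.0)
    return float(minimum_spanning_tree(g).sum()) - r * eps
```

A grounded response with strong attention into the prompt gives short attachment edges and hence a small score.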

This divergence is used as an indicator of hallucination: grounded responses yield low TOHA (tight $R$–$P$ attachments via strong attention), while hallucinated responses require traversing weak prompt–response edges, yielding a high TOHA score. Empirical analysis reveals that a small number of attention heads selected by their TOHA discrimination power suffice for robust detection, with state-of-the-art inference speed and competitive accuracy to strong baselines (Bazarova et al., 14 Apr 2025).

2.2. Edge-wise Divergence in Combinatorial Optimization

In TSP optimization, TOHA measures the difference between a candidate tour and the minimum spanning tree (MST):

  • Let $G=(V,E)$ be the complete graph, $T_\text{tour}$ the candidate tour, and $T_\text{mst}$ the MST.
  • The classical length gap $L(T_\text{tour})-L(T_\text{mst})$ is refined by decomposing it into edge-level topological divergences $\Delta(e)$.
  • The canonical decomposition theorem establishes a bijection $\varphi: E(T_\text{mst}) \to E(T_\text{tour}) \setminus \{e_\text{max}\}$ so that $\Delta(e) = w(\varphi(e)) - w(e)$.
  • Each $\Delta(e)$ corresponds to the length of a bar in the RTD-Lite barcode, and the bars sum to the global tour–MST gap.

Algorithmically, the RTD-Lite barcode is computed by constructing the union graph with edge weights $\min(w_\text{tour}, w_\text{mst})$ and extracting the bar lengths via an MST. In topology-guided 2-opt and 3-opt heuristics, edges with the highest $\Delta(e)$ are prioritized for removal, leading to improved solution quality and faster convergence (Trofimov et al., 16 Dec 2025).
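The prioritized-removal idea can be sketched as a score-guided 2-opt loop. This is an illustrative stand-in rather than the paper's exact procedure: `edge_score` is a hypothetical matrix of per-edge priorities, which in the paper's setting would hold the divergences $\Delta(e)$.

```python
import numpy as np

def tour_length(tour, dist):
    return float(sum(dist[tour[i], tour[(i + 1) % len(tour)]]
                     for i in range(len(tour))))

def two_opt_guided(tour, dist, edge_score, sweeps=50):
    """Sketch of score-guided 2-opt (illustrative, not the exact
    RTD-Lite procedure): candidate edges are visited in decreasing
    edge_score order, and a standard 2-opt reversal is applied whenever
    it strictly shortens the tour."""
    tour = list(tour)
    n = len(tour)
    for _ in range(sweeps):
        improved = False
        # try removing high-score tour edges (tour[i-1], tour[i]) first
        order = sorted(range(1, n - 1),
                       key=lambda i: -edge_score[tour[i - 1], tour[i]])
        for i in order:
            for j in range(i + 1, n):
                a, b = tour[i - 1], tour[i]
                c, d = tour[j], tour[(j + 1) % n]
                gain = dist[a, b] + dist[c, d] - dist[a, c] - dist[b, d]
                if gain > 1e-12:
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
                    break
        if not improved:
            return tour
    return tour
```

Passing `edge_score=dist` prioritizes long tour edges, a naive stand-in for the $\Delta(e)$ ranking.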

2.3. Cross-Barcode Framework for Manifold and Representation Comparison

The Cross-Barcode mechanism underlies manifold and neural representation topology divergence:

  • Cross-Barcode$(P,Q)$: take $P,Q \subset \mathbb{R}^D$, set all $Q$–$Q$ distances to zero, and construct the Vietoris–Rips filtration on $P \cup Q$.
  • The resulting barcodes encode where $P$ and $Q$ fail to align in homology.
  • Manifold Topology Divergence (MTop-Div): average the sum of bar lengths of $H_1$ intervals in Cross-Barcode$_1(P,Q)$ over random subsamples.
  • Representation Topology Divergence (RTD): follows a similar construction but is adapted to compare neural network activations, with normalization and random subsampling for scalability and robustness to dimension (Barannikov et al., 2021, Barannikov et al., 2021).

These methods offer domain-agnostic, scalable, and stable alternatives to conventional alignment and quality metrics and reveal finer-grained discrepancies in deep generative models and learned representations.

2.4. Scalar Function Topology Divergence (SFTD)

SFTD extends TOHA to scalar fields on graphs or regular lattices:

  • Let $f,g: X\to\mathbb{R}$ be scalar functions on a domain $X$ (graph or lattice vertices).
  • Construct a doubled domain $\widetilde X$ with values encoding $f(i)$, $g(i)$, and $\min(f(i),g(i))$.
  • Compute the F-Cross-Barcode$_k(f,g)$ using a suitable persistent homology algorithm (simplicial or cubical).
  • SFTD is the total $p$-power sum of bar lengths in this barcode; intervals are localized to the part of $X$ responsible for the difference.
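The construction rests on a sublevel-set persistence routine. As a self-contained building-block illustration (not the full doubled-domain assembly, which the paper delegates to standard simplicial/cubical algorithms), here is a union-find computation of the 0-dimensional sublevel barcode of a scalar function on a path graph:

```python
import math

def sublevel_h0_bars(values):
    """0-dimensional sublevel-set persistence for a scalar function on a
    path graph (vertices 0..n-1, edges between neighbours). Vertices
    enter the filtration in increasing value order; when two components
    merge, the elder rule kills the one with the younger (higher) birth.
    Non-minimum vertices produce zero-length bars."""
    n = len(values)
    parent = [None] * n   # None: vertex has not entered the filtration
    birth = [0.0] * n     # birth value, stored at component roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    bars = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):          # path-graph neighbours
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                young, old = (ri, rj) if birth[ri] >= birth[rj] else (rj, ri)
                bars.append((birth[young], values[i]))  # elder rule
                parent[young] = old
    bars.append((min(values), math.inf))  # essential component
    return bars
```

For `values = [0, 2, 1, 3, 0.5]` the local minima at values 1 and 0.5 produce the finite bars (1, 2) and (0.5, 3), plus one essential bar born at 0.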

SFTD is stable (bounded by sup-norm perturbations), differentiable, and enables both global and spatially localized topology-aware losses or analyses in computer vision and segmentation (Trofimov et al., 2024).

3. Theoretical Properties and Guarantees

All principal instantiations of TOHA share key properties:

  • Stability: Divergences are bounded by input perturbations (e.g., changes in graph weights or function values), often via bottleneck or Wasserstein stability results on persistence diagrams.
  • Localization: Interval birth and death associates topological mismatch to explicit, localized structures (edges, nodes, spatial locations).
  • Non-negativity and Insensitivity to Trivial Changes: Divergence is zero if and only if the two inputs are topologically identical (structure, cluster, loop, etc.), and non-negative otherwise.
  • Scalability: Efficient barcode computation on up to $10^4$ points and/or high-dimensional data via core TDA libraries and block matrix constructions.

Symmetry may or may not be enforced—representation and manifold divergences allow for directed or averaged scores, capturing recall and precision analogues.
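The stability property can be checked numerically in degree zero, where the total-bar-length divergence equals the MST weight. The grid, the noise level, and the $(n-1)\varepsilon$ bound below are illustrative: each of the $n-1$ MST bars moves by at most $\varepsilon$ when every pairwise distance moves by at most $\varepsilon$.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# 5x5 integer grid: min pairwise distance is 1, MST weight is exactly 24
pts = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
n = len(pts)
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

rng = np.random.default_rng(0)
eps = 1e-3
noise = np.triu(rng.uniform(-eps, eps, size=d.shape), 1)
noise = noise + noise.T  # keep the perturbed matrix symmetric
d_pert = d + noise       # entries stay strictly positive off-diagonal

s = float(minimum_spanning_tree(d).sum())
s_pert = float(minimum_spanning_tree(d_pert).sum())
# sup-norm stability: the H0 divergence is (n-1)-Lipschitz
assert abs(s - s_pert) <= (n - 1) * eps
```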

4. Implementation and Algorithmic Complexity

Implementation follows a common pipeline:

  • Compute appropriate pairwise distance (or pseudo-distance, as in attention graphs).
  • Build a joint distance matrix incorporating input-specific zeroing, minima, or block structure.
  • Construct the filtered complex using standard TDA libraries (Ripser++, GUDHI, giotto-ph).
  • Extract barcodes; process bar lengths accordingly.
  • For subsampled or batch-averaged metrics (e.g., RTD, MTop-Div), repeat over random subsamples for statistical robustness.

Complexity breakdown:

  • Pairwise distance computation: $O(b^2 D)$ for batch size $b$ and dimension $D$.
  • Vietoris–Rips barcode calculation: polynomial in batch size (typically $O(b^3)$ in the worst case), but fast in practice with GPU acceleration and sparse boundary matrices.
  • Bookkeeping and heuristics (for optimization): $O(n^2)$ per sweep for edgewise scoring in TSP heuristics, consistent with baseline local search costs.

Memory consumption is dominated by the augmented distance matrices or block representations, which can limit scalability in very large domains (especially for SFTD).

5. Practical Applications and Empirical Findings

5.1. Hallucination Detection in LLMs

TOHA applied to attention graphs yields rapid, unsupervised hallucination detection:

  • Outperforms or matches token-entropy and SelfCheckGPT on several RAG QA, summarization, and CoQA/SQuAD benchmarks (e.g., ROC AUC 0.883 on SQuAD for LLaMA-3.1-8B).
  • Inference speed is orders of magnitude faster than sampling-based baselines (about 1.8 ms/sample versus 1,460 ms/sample for SelfCheckGPT).
  • Transferability: optimal heads and thresholds generalize across tasks with minimal supervision.
  • Only a few attention heads are required; the best heads show recurrence across models and tasks, and their selection is based on the separation in TOHA values across annotated grounded/hallucinated examples (Bazarova et al., 14 Apr 2025).

5.2. Topology-Guided TSP Heuristics

Edgewise TOHA decompositions drive guided heuristics:

  • 2-opt+RTDL yields 0.2–1.5% (random TSP) and 0.5–1% (TSPLIB) further length reductions over baseline 2-opt.
  • 2-opt+RTDL and 3-opt+RTDL avoid pathological convergence failures and speed up convergence by up to 5× on large-scale neural tour initializations (e.g., $n=10{,}000$).
  • DQN agents with reward shaped by $-\Delta(e)$ yield 25–35% shorter tours with reduced variance (Trofimov et al., 16 Dec 2025).

5.3. GAN Evaluation and Representation Comparison

  • MTop-Div detects mode dropping, manifold mismatch, and distribution shifts more reliably and monotonically than FID, MMD, JSD, or GScore; aligns well with discriminative metrics and ground truth in image and 3D data (Barannikov et al., 2021).
  • RTD correlates with model disagreement, captures learning dynamics, and tracks transfer and interpretability with high empirical reliability across several datasets and modalities (Barannikov et al., 2021).

5.4. Shape Analysis and Computer Vision

  • SFTD as an additional loss in 3D cell reconstruction reduces IoU, volume, and surface errors; suppresses spurious topological defects persisting under Wasserstein-based losses.
  • SFTD heatmaps localize segmentation discrepancies in medical imaging, outperforming Betti matching losses in sensitive spatial detection (Trofimov et al., 2024).

6. Extensions, Relations, and Open Questions

TOHA’s formalism has inspired several related divergence notions:

  • Scalar field, manifold, and representation divergence all derive from the cross-barcode construction, with differences in how the joint filtration is defined and localized.
  • The minimum spanning forest viewpoint on H0H_0 barcodes provides a direct connection to classic graph-theoretic constructs and efficient optimization.
  • Potentially promising directions include joint analysis of sublevel and superlevel sets (multi-persistence), integration with GNNs for end-to-end training, and scalable approximate computation for massive volumetric or graph data.

Open questions include the full theoretical landscape of metric properties for directed divergences, the behavior of localization in complex settings (e.g., non-Euclidean domains), and the robustness of aligned barcodes in the presence of non-trivial matching ambiguities.


In summary, Topological Divergence (TOHA) and its variants provide a unified, stable, and scalable framework for quantifying shape discrepancy, guiding optimization, diagnosing learned representations, and detecting failure modes in modern data-intensive pipelines, with empirical and theoretical guarantees across combinatorial optimization, deep learning, and topological data analysis (Trofimov et al., 16 Dec 2025, Bazarova et al., 14 Apr 2025, Trofimov et al., 2024, Barannikov et al., 2021, Barannikov et al., 2021).
