Topological Divergence (TOHA): A Unified TDA Framework
- TOHA is a framework that converts multiscale topological features into divergence scores via barcode intervals from persistent homology.
- It applies a cross-filtration strategy to capture both global and local structural differences in contexts like LLM attention and TSP optimization.
- Its stability, localization, and scalability make TOHA effective for robust diagnostics and performance improvements across diverse pipelines.
Topological Divergence (TOHA) quantifies dissimilarity in high-dimensional data structures or discrete objects by translating multiscale shape differences into algebraic invariants derived from persistent homology. The TOHA paradigm generalizes across combinatorial optimization, generative modeling evaluation, neural representation comparison, scalar field analysis, and graph-based attention analysis. Across these domains, the central object is a divergence score computable from barcode intervals—birth and death times of topological features—built from filtered graphs or complexes that capture the joint or relative structure of two inputs. The resulting divergence is sensitive to both global and local geometric defects, is stable with respect to perturbations, and can be efficiently integrated into learning, optimization, or diagnosis pipelines.
1. Mathematical Basis and General Formalism
TOHA and related divergences build upon persistent homology and barcodes, which summarize the evolution of homological features across a filtration—a monotonic sequence of subgraphs, simplicial, or cubical complexes parametrized by a threshold applied to edge weights or scalar function levels. The comparison of two objects (graphs, point clouds, functions, matrices) proceeds by constructing a joint or cross-filtered complex that intertwines their individual structures, then extracting the persistent homology intervals (barcodes) unique to this joint complex.
The classical persistent homology setup for a weighted undirected graph uses the Vietoris–Rips filtration:
- For a threshold $\tau \ge 0$, consider $\mathrm{VR}_\tau(G)$, the simplicial complex containing all subsets of vertices whose pairwise edges have weight $\le \tau$.
- Homological features (clusters, loops, cavities) “appear” (are born) and “disappear” (die) as $\tau$ increases.
TOHA-type metrics construct an auxiliary or "cross" filtration—e.g., by collapsing distances within one set, taking minima over two functions, or zeroing intra-subset edges—and form barcodes whose intervals correspond to topological features representing differences between the structures. The final divergence is typically a $p$-sum over bar lengths, $\sum_i (d_i - b_i)^p$, and variants include directionality, localization, or edgewise decomposition, depending on the application.
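As a concrete anchor for the $p$-sum above, the following minimal sketch (the function name and example intervals are illustrative, not from any specific library) aggregates barcode intervals into a scalar divergence:

```python
def barcode_divergence(bars, p=1.0):
    """p-sum of bar lengths: sum of (death - birth)^p over finite intervals."""
    return sum((d - b) ** p for b, d in bars if d != float("inf"))

# Example: two short-lived features and one long-lived one; the essential
# (infinite) interval is excluded from the sum.
bars = [(0.0, 0.2), (0.1, 0.3), (0.0, 1.0), (0.0, float("inf"))]
print(barcode_divergence(bars, p=1.0))  # ≈ 1.4
```

Larger $p$ emphasizes the most persistent mismatches; $p = 1$ recovers the plain total bar length used in several of the instantiations below.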
2. Algorithmic Instantiations
2.1. Attention Graph Divergence for LLMs
In the context of LLMs, TOHA measures divergence between prompt and response subgraphs of attention matrices:
- Convert attention weights $a_{ij}$ to pseudo-distances $d_{ij}$ (strong attention corresponds to small distance).
- Partition the token set into prompt tokens $P$ and response tokens $R$.
- Form a complete graph on $P \cup R$ with weights zeroed for all intra-prompt edges.
- Compute the cross-barcode via the Vietoris–Rips filtration on the modified graph.
- The TOHA divergence score is the sum of $H_0$ bar lengths, $\sum_i (d_i - b_i)$, equivalent to the total edge weight of the minimum spanning forest (MSF) connecting $R$ to $P$.
This divergence serves as an indicator of hallucination: grounded responses yield low TOHA (tight $R$–$P$ attachments via strong attention edges), while hallucinated responses must traverse weak prompt–response edges, yielding a high TOHA score. Empirical analysis reveals that a small number of attention heads, selected by their TOHA discrimination power, suffices for robust detection, with state-of-the-art inference speed and accuracy competitive with strong baselines (Bazarova et al., 14 Apr 2025).
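Because the $H_0$ cross-barcode total equals a minimum-spanning-forest weight, the score can be sketched with a plain Kruskal pass; the pseudo-distance matrix, index convention, and toy values below are illustrative and do not reproduce the paper's exact attention-to-distance transform:

```python
def toha_score(dist, prompt):
    """TOHA-style H0 score: MST weight of the complete graph after zeroing
    intra-prompt edges, i.e. the cost of attaching response tokens to the
    prompt (intra-prompt connections come for free)."""
    n = len(dist)
    prompt = set(prompt)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            w = 0.0 if (i in prompt and j in prompt) else dist[i][j]
            edges.append((w, i, j))
    edges.sort()
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += w
    return total

# Toy example: tokens 0, 1 are prompt; 2, 3 are response.
d = [[0.0, 0.9, 0.1, 0.8],
     [0.9, 0.0, 0.7, 0.2],
     [0.1, 0.7, 0.0, 0.6],
     [0.8, 0.2, 0.6, 0.0]]
print(toha_score(d, prompt=[0, 1]))  # ≈ 0.3: edges 0-2 (0.1) and 1-3 (0.2)
```

A grounded response corresponds to small attachment edges (low score); a hallucinated one forces the forest through weak, high-distance prompt–response edges.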
2.2. Edge-wise Divergence in Combinatorial Optimization
In TSP optimization, TOHA measures the difference between a candidate tour and the minimum spanning tree (MST):
- Let $G = (V, E)$ be the complete weighted graph. Let $T$ be the candidate tour and $M$ the MST.
- The classical length gap $\ell(T) - \ell(M)$ is refined by decomposing it into edge-level topological divergences $\delta_e$.
- The canonical decomposition theorem establishes a bijection between tour edges and barcode bars so that $\ell(T) - \ell(M) = \sum_{e \in T} \delta_e$.
- Each $\delta_e$ equals the length of a bar in the RTD-Lite barcode, and the bar lengths sum to the global tour–MST gap.
Algorithmically, the RTD-Lite barcode is computed by constructing the union graph of $T$ and $M$ with the underlying edge weights and extracting the bar lengths via an MST computation. In topology-guided 2-opt and 3-opt heuristics, edges with the highest $\delta_e$ are prioritized for removal, leading to improved solution quality and faster convergence (Trofimov et al., 16 Dec 2025).
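The edge-prioritization idea can be sketched with a simple per-edge score; note that the proxy below (tour-edge weight minus the heaviest edge on the MST path between its endpoints) is an illustrative stand-in in the spirit of $\delta_e$, not claimed to be the paper's exact RTD-Lite bar length:

```python
import math

def mst_edges(pts):
    """Kruskal MST of the complete Euclidean graph on pts."""
    n = len(pts)
    edges = sorted((math.dist(pts[i], pts[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    out = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            out.append((w, i, j))
    return out

def edge_scores(pts, tour):
    """Score each tour edge by its weight minus the heaviest edge on the MST
    path between its endpoints; MST edges score zero."""
    mst = mst_edges(pts)
    adj = {i: [] for i in range(len(pts))}
    for w, i, j in mst:
        adj[i].append((j, w))
        adj[j].append((i, w))

    def max_on_path(u, v):  # heaviest MST edge between u and v (tree DFS)
        stack, seen = [(u, 0.0)], {u}
        while stack:
            x, m = stack.pop()
            if x == v:
                return m
            for y, w in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, max(m, w)))
        return 0.0

    return {(a, b): math.dist(pts[a], pts[b]) - max_on_path(a, b)
            for a, b in zip(tour, tour[1:] + tour[:1])}

# A unit square toured along both diagonals: the crossing edges (0,2) and
# (1,3) receive the highest scores and would be removed first by guided 2-opt.
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
scores = edge_scores(pts, [0, 2, 1, 3])
```

High-scoring edges are exactly those that overshoot what the MST needs to keep their endpoints connected, which is where a 2-opt move is most likely to pay off.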
2.3. Cross-Barcode Framework for Manifold and Representation Comparison
The Cross-Barcode mechanism underlies manifold and neural representation topology divergence:
- Cross-Barcode: take the union $P \cup Q$ of two point clouds, set the pairwise distances within $Q$ to zero, and construct the Vietoris–Rips filtration on the resulting matrix.
- The resulting barcodes encode where $P$ and $Q$ fail to align in homology.
- Manifold Topology Divergence (MTop-Div): average the sum of bar lengths of intervals in the Cross-Barcode over random subsamples.
- Representation Topology Divergence (RTD): follows a similar construction but is adapted to compare neural network activations, with normalization and random subsampling for scalability and robustness to dimension (Barannikov et al., 2021, Barannikov et al., 2021).
These methods offer domain-agnostic, scalable, and stable alternatives to conventional alignment and quality metrics and reveal finer-grained discrepancies in deep generative models and learned representations.
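A runnable sketch of the subsampled averaging, restricted to $H_0$ where the Cross-Barcode total has a closed minimum-spanning-forest form (the published metrics also aggregate higher-dimensional intervals; the names, the Euclidean metric, and the parameters here are illustrative assumptions):

```python
import math
import random

def h0_cross_sum(P, Q):
    """Sum of finite H0 bar lengths of Cross-Barcode(P, Q): the MST weight of
    P ∪ Q after zeroing intra-Q distances (zeroed edges cost nothing)."""
    pts = P + Q
    n, n_p = len(pts), len(P)

    def dist(i, j):
        if i >= n_p and j >= n_p:   # both points lie in Q
            return 0.0
        return math.dist(pts[i], pts[j])

    edges = sorted((dist(i, j), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += w
    return total

def mtop_div_h0(P, Q, subsample=64, repeats=10, seed=0):
    """Average the H0 cross-barcode sum over random subsamples."""
    rng = random.Random(seed)
    k = min(subsample, len(P), len(Q))
    return sum(h0_cross_sum(rng.sample(P, k), rng.sample(Q, k))
               for _ in range(repeats)) / repeats
```

Matching clouds score near zero, since every $P$ point attaches cheaply to the free $Q$ cluster; a shifted cloud must pay at least the offset to attach, so the score grows with the mismatch.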
2.4. Scalar Function Topology Divergence (SFTD)
SFTD extends TOHA to scalar fields on graphs or regular lattices:
- Let $f, g \colon \Omega \to \mathbb{R}$ be scalar functions on a domain $\Omega$ (graph/lattice vertices).
- Construct a doubled domain whose values encode $f$, $g$, and $\min(f, g)$.
- Compute the F-Cross-Barcode using a suitable persistent homology algorithm (simplicial/cubical).
- SFTD is the total $p$-power sum of bar lengths in this barcode; intervals are localized to the part of $\Omega$ responsible for the difference.
SFTD is stable (bounded by sup-norm perturbations), differentiable, and enables both global and spatially localized topology-aware losses or analyses in computer vision and segmentation (Trofimov et al., 2024).
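As a building block only (the doubled-domain gluing itself is not reproduced here), the 0-dimensional sublevel-set persistence that SFTD-style barcodes rest on can be sketched with union-find and the elder rule; the vertex labels and values are illustrative:

```python
def h0_sublevel_bars(values, edges):
    """Finite H0 intervals of the sublevel-set filtration of a function on a
    graph. values: vertex -> scalar; edges: (u, v) pairs. Each edge enters at
    the max of its endpoint values; on a merge, the younger component dies
    (elder rule). The one essential component never produces a bar."""
    parent = {v: v for v in values}
    birth = dict(values)            # birth time of each component's root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        t = max(values[u], values[v])
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if birth[ru] > birth[rv]:   # keep the elder root as survivor
            ru, rv = rv, ru
        bars.append((birth[rv], t))
        parent[rv] = ru
    return bars

# Example: f = [0, 2, 1, 3] on a path graph. The local minimum at vertex 2
# (value 1) spawns a component that merges at the saddle value 2,
# giving one nonzero bar (1.0, 2.0).
bars = h0_sublevel_bars({0: 0.0, 1: 2.0, 2: 1.0, 3: 3.0},
                        [(0, 1), (1, 2), (2, 3)])
```

The birth and death values point directly at the vertices and saddle responsible for each feature, which is the mechanism behind SFTD's spatial localization of discrepancies.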
3. Theoretical Properties and Guarantees
All principal instantiations of TOHA share key properties:
- Stability: Divergences are bounded by input perturbations (e.g., changes in graph weights or function values), often via bottleneck or Wasserstein stability results on persistence diagrams.
- Localization: Interval birth and death associates topological mismatch to explicit, localized structures (edges, nodes, spatial locations).
- Non-negativity and Insensitivity to Trivial Changes: Divergence is non-negative, and zero precisely when the two inputs are topologically identical (same clusters, loops, cavities, etc.).
- Scalability: Efficient barcode computation on large point sets and/or high-dimensional data via core TDA libraries and block matrix constructions.
Symmetry may or may not be enforced—representation and manifold divergences allow for directed or averaged scores, capturing recall and precision analogues.
4. Implementation and Algorithmic Complexity
Implementation follows a common pipeline:
- Compute appropriate pairwise distance (or pseudo-distance, as in attention graphs).
- Build a joint distance matrix incorporating input-specific zeroing, minima, or block structure.
- Construct the filtered complex using standard TDA libraries (Ripser++, GUDHI, giotto-ph).
- Extract barcodes; process bar lengths accordingly.
- For subsampled or batch-averaged metrics (e.g., RTD, MTop-Div), repeat over random subsamples for statistical robustness.
Complexity breakdown:
- Pairwise distance computation: $O(n^2 d)$ for batches of size $n$ and dimensionality $d$.
- Vietoris–Rips barcode calculation: polynomial in batch size (cubic in the number of simplices in the worst case for boundary-matrix reduction), but fast in practice with GPU acceleration and sparse boundary matrices.
- Bookkeeping and heuristics (for optimization): $O(n^2)$ per sweep for edgewise scoring in TSP heuristics, consistent with baseline local search costs.
Memory consumption is dominated by the augmented distance matrices or block representations, which can limit scalability in very large domains (especially for SFTD).
5. Practical Applications and Empirical Findings
5.1. Hallucination Detection in LLMs
TOHA applied to attention graphs yields rapid, unsupervised hallucination detection:
- Outperforms or matches token-entropy and SelfCheckGPT on several RAG QA, summarization, and CoQA/SQuAD benchmarks (e.g., ROC AUC 0.883 on SQuAD for LLaMA-3.1-8B).
- Inference speed is orders of magnitude faster than sampling-based baselines (1.8 ms/sample versus 1,460 ms/sample for SelfCheckGPT).
- Transferability: optimal heads and thresholds generalize across tasks with minimal supervision.
- Only a few attention heads are required; the best heads show recurrence across models and tasks, and their selection is based on the separation in TOHA values across annotated grounded/hallucinated examples (Bazarova et al., 14 Apr 2025).
5.2. Topology-Guided TSP Heuristics
Edgewise TOHA decompositions drive guided heuristics:
- 2-opt+RTDL yields further length reductions over baseline 2-opt, starting around 0.2% on random TSP instances and 0.5% on TSPLIB instances.
- 2-opt+RTDL and 3-opt+RTDL avoid pathological convergence failures and markedly speed up convergence on large-scale neural tour initializations.
- DQN agents with reward shaped by the edgewise divergence $\delta_e$ yield 25–35% shorter tours with reduced variance (Trofimov et al., 16 Dec 2025).
5.3. GAN Evaluation and Representation Comparison
- MTop-Div detects mode dropping, manifold mismatch, and distribution shifts more reliably and monotonically than FID, MMD, JSD, or GScore; aligns well with discriminative metrics and ground truth in image and 3D data (Barannikov et al., 2021).
- RTD correlates with model disagreement, captures learning dynamics, and tracks transfer and interpretability with high empirical reliability across several datasets and modalities (Barannikov et al., 2021).
5.4. Shape Analysis and Computer Vision
- SFTD as an additional loss in 3D cell reconstruction reduces IoU, volume, and surface errors; suppresses spurious topological defects persisting under Wasserstein-based losses.
- SFTD heatmaps localize segmentation discrepancies in medical imaging, outperforming Betti matching losses in sensitive spatial detection (Trofimov et al., 2024).
6. Extensions, Relations, and Open Questions
TOHA’s formalism has inspired several related divergence notions:
- Scalar field, manifold, and representation divergence all derive from the cross-barcode construction, with differences in how the joint filtration is defined and localized.
- The minimum spanning forest viewpoint on barcodes provides a direct connection to classic graph-theoretic constructs and efficient optimization.
- Potentially promising directions include joint analysis of sublevel and superlevel sets (multi-persistence), integration with GNNs for end-to-end training, and scalable approximate computation for massive volumetric or graph data.
Open questions include the full theoretical landscape of metric properties for directed divergences, the behavior of localization in complex settings (e.g., non-Euclidean domains), and the robustness of aligned barcodes in the presence of non-trivial matching ambiguities.
In summary, Topological Divergence (TOHA) and its variants provide a unified, stable, and scalable framework for quantifying shape discrepancy, guiding optimization, diagnosing learned representations, and detecting failure modes in modern data-intensive pipelines, with empirical and theoretical guarantees across combinatorial optimization, deep learning, and topological data analysis (Trofimov et al., 16 Dec 2025, Bazarova et al., 14 Apr 2025, Trofimov et al., 2024, Barannikov et al., 2021, Barannikov et al., 2021).