Betti Curve Statistics in TDA

Updated 11 April 2026

Betti curve statistics are functional descriptors that record the number of topological features over a filtration, summarizing persistent homology.
They link interval barcode information with combinatorial methods, including Lie-theoretic partition functions and juggling sequence enumeration.
Applications span cosmology, random field theory, and data science, with techniques enhancing stability and enabling statistical inference.

Betti curve statistics constitute a central class of functional topological descriptors in applied and computational topology, particularly in topological data analysis (TDA), random field theory, and related fields. A Betti curve records as a function of threshold (or filtration parameter) the number of topological features—connected components, loops, voids—present in a filtered space, dataset, or structure. These statistics both summarize persistent homology and serve as interpretable, vector-valued, and machine-learning-compatible objects that encode the multiscale topology of data. Recent work has unified the study of Betti curves with Lie-theoretic combinatorics, statistical random field theory, and Bayesian cosmological pipelines, and addressed algorithmic, stability, and asymptotic properties. This entry systematically reviews the definition, computation, information-theoretic properties, statistical behavior, and applications of Betti curve statistics.

1. Formal Definitions and Algebraic Structure

Let $K$ be a filtered simplicial complex with filtration values $0 = \tau_0 < \tau_1 < \dots < \tau_N = \tau_{\max}$. In homological dimension $n$, the persistent homology decomposes as a barcode $B = \{[b_j, d_j)\}_{j=1}^M$, where $[b_j, d_j)$ records the birth and death of an $n$-cycle. The Betti curve or Betti sequence $\vec v(B) = (v_1, \dots, v_N) \in \mathbb{R}^N$ is defined by

$v_i = \#\{[b_j, d_j) : \tau_{i-1} < \tau < \tau_i \text{ for some } \tau \in [b_j, d_j)\}$

that is, $v_i$ counts the number of $n$-cycles alive in the interval $(\tau_{i-1}, \tau_i)$. In terms of homology, for $K_\tau$ the subcomplex at parameter $\tau$,

$\beta_n(\tau) = \dim H_n(K_\tau)$

so the Betti curve is the vector of values taken by $\beta_n$ on the successive intervals $(\tau_{i-1}, \tau_i)$, $i = 1, \dots, N$. In TDA, Betti curves as functions of the filtration parameter encode the topology at each scale for each dimension. In one-parameter persistent homology over a field $\mathbb{F}$, for $M$ a persistence module, the Betti curve is the dimension (Hilbert) function $t \mapsto \dim_{\mathbb{F}} M_t$, written as a function $\beta_M : \mathbb{R} \to \mathbb{Z}_{\ge 0}$ (Ashley et al., 9 Feb 2026).

Barcodes, viewed as multisets $B = \{[b_j, d_j)\}_j$ of intervals with multiplicities $m_j$, relate to Betti curves as

$\beta(t) = \sum_j m_j \, \mathbf{1}_{[b_j, d_j)}(t)$

where $\mathbf{1}_{[b_j, d_j)}(t) = 1$ if $b_j \le t < d_j$, else $0$. Thus, Betti curves are sums of interval indicator functions with integer weights.
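The two descriptions above (the binned vector $v_i$ and the pointwise sum of indicators) can be sketched in a few lines. This is a minimal illustration, assuming NumPy and bars given as `(birth, death)` pairs; the function names `betti_at` and `betti_vector` are hypothetical and not taken from any cited work.

```python
import numpy as np

def betti_at(bars, t):
    """Pointwise Betti number: how many bars [b, d) satisfy b <= t < d."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    return int(np.sum((bars[:, 0] <= t) & (t < bars[:, 1])))

def betti_vector(bars, taus):
    """v_i = number of bars [b, d) that meet the open cell (tau_{i-1}, tau_i)."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    taus = np.asarray(taus, dtype=float)
    lo, hi = taus[:-1], taus[1:]      # left and right ends of each cell
    b = bars[:, 0][:, None]           # births as a column, for broadcasting
    d = bars[:, 1][:, None]           # deaths as a column
    # A bar with b < d meets (lo, hi) exactly when b < hi and d > lo.
    meets = (b < hi[None, :]) & (d > lo[None, :])
    return meets.sum(axis=0)

bars = [(0.0, 2.0), (1.0, 3.0), (1.0, 2.0)]   # toy barcode (assumed data)
taus = [0.0, 1.0, 2.0, 3.0]                   # filtration grid
v = betti_vector(bars, taus)
```

When every birth and death lies on the grid, `betti_vector` agrees with `betti_at` evaluated at any point strictly inside each cell.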

In various structural settings (filtered graphs, random fields, or symmetric matrices), the definition of the Betti sequence is adapted to the setting, always encoding the instantaneous topological signature as a function of threshold (Curto et al., 2021, Pranav et al., 2018).

2. Algorithmic and Combinatorial Enumeration

A foundational inverse question asks: how much does the Betti curve compress the information of the persistence barcode? Given $\vec v = (v_1, \dots, v_N)$, how many distinct barcodes (multisets of intervals) induce the same Betti curve? The answer connects to the Kostant partition function in type $A$ root systems. For positive roots $\Phi^+$ and simple roots $\alpha_1, \dots, \alpha_N$,

$\wp(\vec v) = \#\{(c_\alpha)_{\alpha \in \Phi^+} \in \mathbb{Z}_{\ge 0}^{\Phi^+} : \sum_{\alpha \in \Phi^+} c_\alpha \, \alpha = \sum_{i=1}^N v_i \, \alpha_i\}$

where $\wp(\vec v)$ counts the number of multisets of positive roots summing to $\sum_i v_i \alpha_i$. Because each positive root $\alpha_i + \dots + \alpha_j$ corresponds to a bar alive exactly on cells $i$ through $j$, this is equivalently the number of barcodes corresponding to $\vec v$.
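As a small worked instance of this correspondence (the symbol $\wp$ for the Kostant partition function is a notational assumption here), take $N = 2$ and $\vec v = (1, 1)$. The positive roots of type $A_2$ are $\alpha_1$, $\alpha_2$, and $\alpha_1 + \alpha_2$, matching bars alive in the first cell only, in the second cell only, and in both cells. The target $\alpha_1 + \alpha_2$ has exactly two decompositions into positive roots:

```latex
\[
\alpha_1 + \alpha_2 \;=\; (\alpha_1) + (\alpha_2) \;=\; (\alpha_1 + \alpha_2)
\quad\Longrightarrow\quad
\wp(\alpha_1 + \alpha_2) = 2 .
\]
```

So exactly two barcodes share the Betti curve $(1, 1)$: two short consecutive bars, or one long bar spanning both cells.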

Furthermore, a bijection exists between barcodes realizing $\vec v$ and "magic" juggling sequences of length $N$ (chains of integer state vectors evolving by redistribution at each "catch and rethrow" step). Explicit enumeration and correspondence with juggling sequences yield both a conceptual and a computational tool for quantifying the "fiber size" (preimage) of the Betti curve map (Ashley et al., 9 Feb 2026).

Algorithmically, two approaches apply:

Recursive dynamic programming: $\wp(v_1, \dots, v_N) = \sum_{\lambda} \wp(v_2 - \lambda_2, \dots, v_N - \lambda_N)$, with $\lambda$ ranging over all non-increasing sequences (Young diagrams) $v_1 \ge \lambda_2 \ge \dots \ge \lambda_N \ge 0$ beneath $\vec v$, that is, with $\lambda_k \le v_k$ for every $k$;
Polyhedral/integer-point counting: interpreting the count $\wp(\vec v)$ of barcodes with Betti curve $\vec v$ as counting integer points in flow polytopes, enabling Barvinok-type polynomial-time algorithms for fixed $N$.
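The recursive approach can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the recursion used in the cited work: bars born in the first cell are peeled off, their coverage of later cells forms a non-increasing sequence bounded by the Betti vector, and the remainder is counted recursively. The names `staircases` and `fiber_size` are hypothetical, and the running time is exponential in general.

```python
from functools import lru_cache

def staircases(first, caps):
    """Yield non-increasing tuples (l_2, ..., l_N) with l_2 <= first
    and l_k <= caps[k] entrywise (Young diagrams beneath the vector)."""
    if not caps:
        yield ()
        return
    for head in range(min(first, caps[0]) + 1):
        for tail in staircases(head, caps[1:]):
            yield (head,) + tail

@lru_cache(maxsize=None)
def fiber_size(v):
    """Number of barcodes (multisets of index intervals) with Betti vector v."""
    if not v:
        return 1                      # the empty vector has one (empty) barcode
    rest = v[1:]
    # All v[0] bars alive in the first cell are born there; lam records how
    # many of them are still alive in each later cell.
    return sum(
        fiber_size(tuple(c - l for c, l in zip(rest, lam)))
        for lam in staircases(v[0], rest)
    )

sizes = {v: fiber_size(v) for v in [(1, 1), (1, 1, 1), (1, 2, 1)]}
```

For example, `fiber_size((1, 1, 1))` counts the four ways to cut a run of three cells into consecutive bars. Memoization via `lru_cache` is what makes this a dynamic program rather than a plain recursion.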

3. Information Loss, Statistical Summaries, and Entropy

Passing from a barcode $B$ to its Betti curve $\vec v(B)$ collapses all barcodes in the same fiber to one functional signature, so the fiber size $\wp(\vec v)$ (the number of barcodes with Betti curve $\vec v$) quantifies the information lost in this dimension reduction. The growth of $\wp(\vec v)$ as $N$ or the $v_i$ increase can be rapid, establishing that Betti curves coarsen topological summaries (Ashley et al., 9 Feb 2026).

A fiber $\{B : \vec v(B) = \vec v\}$ can be treated as a discrete "posterior" over barcodes given the summary $\vec v$. Assuming a uniform prior,

$H(\vec v) = \log \wp(\vec v)$

is the entropy of barcodes with fixed Betti curve, where $\wp(\vec v)$ is the number of barcodes in the fiber. Conditional expectations of barcode statistics (number of bars, bar-length moments) can be computed as averages over fibers, by mapping to the corresponding set of juggling sequences or root partitions.
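For small Betti vectors, the fiber can be enumerated directly, which makes the entropy and the fiber averages concrete. The sketch below is an illustration under assumptions: bars are encoded as inclusive index pairs `(i, j)` meaning "alive on cells i through j", the prior on the fiber is uniform, and the helper names (`staircases`, `fiber`, `betti_vector_of`) are hypothetical.

```python
import math

def staircases(first, caps):
    """Yield non-increasing tuples bounded by `first` and, entrywise, by `caps`."""
    if not caps:
        yield ()
        return
    for head in range(min(first, caps[0]) + 1):
        for tail in staircases(head, caps[1:]):
            yield (head,) + tail

def fiber(v, start=1):
    """Yield every barcode with Betti vector v as a tuple of index bars (i, j)."""
    if not v:
        yield ()
        return
    for lam in staircases(v[0], v[1:]):
        cover = (v[0],) + lam + (0,)
        # cover[k] - cover[k + 1] bars born at `start` end at cell start + k.
        born_here = tuple(
            (start, start + k)
            for k in range(len(v))
            for _ in range(cover[k] - cover[k + 1])
        )
        rest = tuple(c - l for c, l in zip(v[1:], lam))
        for tail in fiber(rest, start + 1):
            yield born_here + tail

def betti_vector_of(barcode, n_cells):
    """Recompute the Betti vector of an index barcode (sanity check)."""
    v = [0] * n_cells
    for i, j in barcode:
        for k in range(i, j + 1):
            v[k - 1] += 1
    return tuple(v)

target = (1, 2, 1)
barcodes = list(fiber(target))
entropy = math.log(len(barcodes))            # uniform-prior entropy of the fiber
mean_bars = sum(len(b) for b in barcodes) / len(barcodes)   # E[#bars | v]
```

Other conditional statistics, such as moments of bar length `j - i + 1`, can be averaged over `barcodes` in the same way.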

In random field theory, Betti curves as functions of the level-set threshold are modeled as sums of non-negative integer-valued random variables (often binomial, in excursion set decompositions), enabling exact computation or estimation of means, variances, and covariances at each threshold (Chingangbam, 7 Jul 2025).

4. Statistical Limit Theorems and Asymptotics

Betti curves admit rich probabilistic structure. For high-dimensional point clouds or excursion sets from random fields, under both Poisson and binomial models (with or without dependence),

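Law of large numbers: For $n$ samples, writing $\beta_k^{(n)}(\tau)$ for the $k$-th Betti number at filtration value $\tau$, the normalized count $n^{-1} \beta_k^{(n)}(\tau)$ converges to a limit $\bar\beta_k(\tau)$, with deterministic limit given by an integral against local density (Krebs, 2019).
Central limit theorem: $n^{-1/2} \big( \beta_k^{(n)}(\tau) - \mathbb{E}\, \beta_k^{(n)}(\tau) \big) \Rightarrow \mathcal{N}(0, \sigma_k^2(\tau))$ under stabilization and regularity assumptions (Krebs et al., 2019). This extends to joint convergence over finite (or functional) collections of filtration parameters.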
In the sparse regime of Gaussian excursions, explicit Poisson approximation and CLT are established near phase-transition thresholds; with scaling, Betti numbers vanish, become Poisson, or concentrate to Gaussian in different regimes as the level and size scale (Thoppe et al., 2018).
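The flavor of these limit statements can be explored numerically. The following is a Monte Carlo sketch only, under assumptions that are simpler than the hypotheses of the cited theorems: a binomial sample of `n` uniform points in the unit square, $\beta_0$ computed as the number of connected components of the geometric graph at radius `c / sqrt(n)`, and hypothetical names `betti0` and `normalized_betti0`.

```python
import numpy as np

def betti0(points, r):
    """Number of connected components of the graph joining points at distance <= r."""
    n = len(points)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    diff = points[:, None, :] - points[None, :, :]
    close = np.sqrt((diff ** 2).sum(axis=-1)) <= r
    components = n
    # Visit each unordered pair once (strict upper triangle).
    for i, j in zip(*np.nonzero(np.triu(close, k=1))):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components

rng = np.random.default_rng(0)

def normalized_betti0(n, c=1.0, trials=20):
    """Empirical mean and spread of beta_0 / n at radius c / sqrt(n)."""
    r = c / np.sqrt(n)
    vals = [betti0(rng.random((n, 2)), r) / n for _ in range(trials)]
    return float(np.mean(vals)), float(np.std(vals))

summary = {n: normalized_betti0(n) for n in (100, 200, 400)}
```

If the law of large numbers applies in this regime, the means in `summary` should settle toward a common value as `n` grows while the spreads shrink; the sketch makes no claim about the limiting constants.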

For smooth random fields, Betti numbers at each threshold are modeled as sums over combinatorial basis elements with binomial coefficients, yielding approximate Gaussianity and computable covariance structure in the high-resolution or large-volume limit (Chingangbam, 7 Jul 2025). The precise conditions for Gaussianity, breakdown near critical thresholds, and influence of manifold size are all characterized.

5. Instability, Stabilization, and Distance Metrics

The raw Betti curve $\vec v(B)$ provably lacks stability under the 1-Wasserstein (or bottleneck) metric on persistence diagrams: a bound of the form $\|\vec v(B) - \vec v(B')\| \le C \, W_1(B, B')$ fails for any constant $C$. Small perturbations to endpoints in $B$ can cause arbitrarily large jumps in $\vec v(B)$ relative to the size of the perturbation, especially at discretization thresholds.
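A toy computation illustrates the failure of any Lipschitz bound. The sketch assumes the binned definition of the Betti vector from Section 1, a grid `[0, 1, 2]`, and `m` identical bars whose death time sits just above the grid point 1; the names `betti_vector` and `instability_example` are hypothetical. Matching each bar to its perturbed copy moves one endpoint by `eps`, so `m * eps` is an upper bound on the 1-Wasserstein distance.

```python
import numpy as np

def betti_vector(bars, taus):
    """v_i = number of bars [b, d) that meet the open cell (tau_{i-1}, tau_i)."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    taus = np.asarray(taus, dtype=float)
    lo, hi = taus[:-1], taus[1:]
    b = bars[:, 0][:, None]
    d = bars[:, 1][:, None]
    return ((b < hi[None, :]) & (d > lo[None, :])).sum(axis=0)

def instability_example(eps, m=10, taus=(0.0, 1.0, 2.0)):
    """Return (L1 jump of the Betti vector, upper bound on W1) for a tiny shift."""
    original = [(0.0, 1.0 + eps)] * m      # bars dying just after tau_1 = 1
    perturbed = [(0.0, 1.0)] * m           # same bars, death moved down by eps
    jump = int(np.abs(betti_vector(original, taus)
                      - betti_vector(perturbed, taus)).sum())
    w1_upper = m * eps                     # cost of the bar-to-bar matching
    return jump, w1_upper

results = {eps: instability_example(eps) for eps in (1e-1, 1e-3, 1e-6)}
```

The jump stays equal to `m` while the distance bound shrinks with `eps`, so the ratio `jump / w1_upper` is at least `1 / eps` and no constant $C$ can work.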

A stabilized Betti sequence replaces sharp indicators with Gaussian kernels: for each index $i$, the 0/1 contribution of each bar to $v_i$ is replaced by a weight given by a kernel $g_i$, where the $g_i$ are isotropic Gaussians centered at bin $i$ in birth-death $(b, d)$ space. The resulting smoothed sequence is Lipschitz and stable with respect to the 1-Wasserstein metric, and empirically much less sensitive to small perturbations (Johnson et al., 2021).

Normalized cumulative Betti sequences (aggregated up to index $i$ and rescaled to maximum 1) provide globally stable signatures, emphasizing persistent global features.

Distances between Betti curves are commonly measured via $L^1$, $L^2$, or Sobolev-type norms, enabling statistical inference, clustering, or machine learning applications.
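Both the cumulative normalization and the norm-based distances are simple to compute on a discretized curve. The sketch below assumes Betti vectors sampled on a uniform grid with spacing `step` and uses discrete $L^p$ norms; the function names are hypothetical, and Sobolev-type norms are not covered.

```python
import numpy as np

def normalized_cumulative(v):
    """Running sum of a Betti vector, rescaled so that its maximum is 1."""
    c = np.cumsum(np.asarray(v, dtype=float))
    # Entries are non-negative, so the maximum is the last running sum.
    return c / c[-1] if c[-1] > 0 else c

def lp_distance(u, v, p=2, step=1.0):
    """Discrete L^p distance between two Betti vectors on a uniform grid."""
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    return float((step * (diff ** p).sum()) ** (1.0 / p))

u = [1, 3, 1]
w = [0, 1, 1]
cum_u = normalized_cumulative(u)     # [0.2, 0.8, 1.0]
d1 = lp_distance(u, w, p=1)          # |1| + |2| + |0| = 3
d2 = lp_distance(u, w, p=2)          # sqrt(1 + 4 + 0)
```

The resulting vectors and distances can be passed directly to standard clustering or classification routines.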

6. Applications and Interpretative Value

Betti curves serve as interpretable, low-dimensional summary statistics in diverse applied domains:

Cosmology and large-scale structure: Betti curves describe the topological content of galaxy or halo distributions, capturing clusters, filaments, and voids across scales. Persistent-homology-based Betti curves, computed via alpha complexes or Vietoris–Rips filtrations, form data vectors for Bayesian cosmological parameter inference. They provide sensitivity to features invisible to traditional two-point statistics and, when combined with the power spectrum, break degeneracies and significantly tighten constraints on the inferred cosmological parameters (Li et al., 8 Dec 2025).
Random field theory: In Gaussian random fields, Betti curves $\beta_k(\nu)$ as functions of normalized threshold $\nu$ partition the topology by regime: islands (β₀) at high $\nu$, tunnels (β₁) at mid $\nu$, voids (β₂) at low $\nu$. They provide more granular discrimination than classical genus statistics, and carry explicit dependence on the power spectrum. Their analytic or empirical study enables characterization and detection of non-Gaussianity and multiscale topology (Park et al., 2013, Pranav et al., 2018, Chingangbam, 7 Jul 2025).
Persistent homology in dependent and high-dimensional data: LLN and concentration for Betti curves, even under dependence, provide statistical robustness for time series and stochastic processes (Krebs, 2019).
Topological signatures in matrix data: For symmetric matrices, e.g. covariance or correlation matrices in neuroscience, the Betti curve is an order-invariant descriptor that detects low-rank or rank-1 structure and is robust to monotone nonlinearity. Betti signatures can distinguish biologically meaningful assemblies from controls in cases where singular-value or spectrum-based invariants may fail to do so (Curto et al., 2021).
Algebraic geometry: In the study of high-degree embeddings of algebraic curves, the asymptotic behavior of the Betti table is governed in the Boij–Söderberg sense by a pure diagram determined by genus, a phenomenon captured by the structure of corresponding Betti sequences (Erman, 2013).

7. Perspectives and Open Problems

Betti curve statistics have unified computational, combinatorial, probabilistic, and geometric perspectives in the quantitative study of topological structure in data. They bridge persistent homology, root-system combinatorics, random field topology, and algebraic geometry.

Ongoing challenges include:

Further analysis and algorithmic optimization of fiber enumeration, especially for large $N$ and large $v_i$.
Extending statistical limit theorems to more complex dependence, multi-parameter persistence, and non-Euclidean settings.
Enhancement of stability for machine learning and inference through new regularized or kernel-based Betti curve formulations.
Comprehensive utilization of Betti curves, alone or in combination with other invariants, in multiparametric statistical inference pipelines.