Betti Curve Statistics in TDA

Updated 11 April 2026
  • Betti curve statistics are functional descriptors that record the number of topological features over a filtration, summarizing persistent homology.
  • They link interval barcode information with combinatorial methods, including Lie-theoretic partition functions and juggling sequence enumeration.
  • Applications span cosmology, random field theory, and data science, with techniques enhancing stability and enabling statistical inference.

Betti curve statistics constitute a central class of functional topological descriptors in applied and computational topology, particularly in topological data analysis (TDA), random field theory, and related fields. A Betti curve records as a function of threshold (or filtration parameter) the number of topological features—connected components, loops, voids—present in a filtered space, dataset, or structure. These statistics both summarize persistent homology and serve as interpretable, vector-valued, and machine-learning-compatible objects that encode the multiscale topology of data. Recent work has unified the study of Betti curves with Lie-theoretic combinatorics, statistical random field theory, and Bayesian cosmological pipelines, and addressed algorithmic, stability, and asymptotic properties. This entry systematically reviews the definition, computation, information-theoretic properties, statistical behavior, and applications of Betti curve statistics.

1. Formal Definitions and Algebraic Structure

Let $K$ be a filtered simplicial complex with filtration values $0 = \tau_0 < \tau_1 < \dots < \tau_N = \tau_{\max}$. In homological dimension $n$, the persistent homology decomposes as a barcode $B = \{[b_j, d_j)\}_{j=1}^M$, where $[b_j, d_j)$ records the birth and death of an $n$-cycle. The Betti curve or Betti sequence $\vec v(B) = (v_1, \dots, v_N) \in \mathbb{R}^N$ is defined by

$$v_i = \#\{[b_j, d_j) : \tau_{i-1} < \tau < \tau_i \text{ for some } \tau \in [b_j, d_j)\}$$

that is, $v_i$ counts the number of $n$-cycles alive in $(\tau_{i-1}, \tau_i)$. In terms of homology, for $K_\tau$ the subcomplex at parameter $\tau$,

$$v_i = \dim H_n(K_\tau), \qquad \tau \in (\tau_{i-1}, \tau_i),$$

so the Betti curve is the vector $(v_1, \dots, v_N)$ of Betti numbers across the filtration. In TDA, Betti curves as functions of filtration parameter encode the topology at each scale for each dimension. In one-parameter persistent homology over a field $k$, for $M$ a persistence module, the Betti curve is the dimension (Hilbert) function of $M$, written as a function $t \mapsto \dim_k M_t$ (Ashley et al., 9 Feb 2026).
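The binned definition of $v_i$ can be checked on a toy barcode. The following is a minimal Python sketch, assuming bars are given as `(birth, death)` pairs and the filtration values as a sorted list; the function name `betti_sequence` and the sample data are illustrative and not taken from the cited work.

```python
def betti_sequence(bars, taus):
    """Return [v_1, ..., v_N] for half-open bars [b, d) and grid tau_0 < ... < tau_N.

    v_i counts the bars that contain some tau with tau_{i-1} < tau < tau_i.
    """
    sequence = []
    for lo, hi in zip(taus, taus[1:]):
        # A non-empty bar [b, d) meets the open bin (lo, hi) exactly when
        # it starts before the bin ends and ends after the bin starts.
        sequence.append(sum(1 for b, d in bars if b < d and b < hi and d > lo))
    return sequence


# Hypothetical toy barcode with endpoints on the grid.
bars = [(0.0, 2.0), (1.0, 3.0), (1.0, 2.0)]
taus = [0.0, 1.0, 2.0, 3.0]
print(betti_sequence(bars, taus))  # [1, 3, 1]
```

Only the first bar is alive in the first bin, all three are alive in the second, and only the second bar survives into the third.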

Barcodes, viewed as multisets $B = \{([b_j, d_j), m_j)\}$ of intervals with multiplicities $m_j$, relate to Betti curves as

$$\beta(t) = \sum_j m_j \, \mathbf{1}_{[b_j, d_j)}(t)$$

where $\mathbf{1}_{[b_j, d_j)}(t) = 1$ if $b_j \le t < d_j$, else $0$. Thus, Betti curves are sums of interval indicator functions with integer weights.
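As a small illustration of the indicator-sum description, the sketch below evaluates a Betti curve pointwise from a barcode with multiplicities. The data layout (a list of `((birth, death), multiplicity)` pairs) and the name `betti_curve` are assumptions made for this example.

```python
def betti_curve(barcode, t):
    """Evaluate the Betti curve at t as a weighted sum of interval indicators.

    `barcode` is a list of ((birth, death), multiplicity) pairs; each bar is
    the half-open interval [birth, death).
    """
    return sum(mult for (birth, death), mult in barcode if birth <= t < death)


# Hypothetical barcode: one bar [0, 2) and a doubled bar [1, 3).
barcode = [((0.0, 2.0), 1), ((1.0, 3.0), 2)]
values = [betti_curve(barcode, t) for t in (0.5, 1.0, 2.0, 3.0)]
print(values)  # [1, 3, 2, 0]
```

The half-open convention shows up at $t = 2$ and $t = 3$, where a bar no longer contributes at its death value.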

In various structural settings (filtered graphs, random fields, or symmetric matrices), adapted versions of the Betti sequence definition arise, always encoding the instantaneous topological signature as a function of threshold (Curto et al., 2021, Pranav et al., 2018).

2. Algorithmic and Combinatorial Enumeration

A foundational inverse question asks: how much does the Betti curve compress the information of the persistence barcode? Given $\vec v$, how many distinct barcodes (multisets of intervals) induce the same Betti curve? The answer connects to the Kostant partition function in type $A_N$ root systems. For positive roots $\Phi^+ = \{\alpha_i + \alpha_{i+1} + \dots + \alpha_j : 1 \le i \le j \le N\}$ and simple roots $\alpha_1, \dots, \alpha_N$,

$$K(\vec v) = K(v_1 \alpha_1 + v_2 \alpha_2 + \dots + v_N \alpha_N)$$

where $K(\mu)$ counts the number of multisets of positive roots summing to $\mu$. Equivalently, this is the number of barcodes corresponding to $\vec v$.

Furthermore, a bijection exists between barcodes realizing $\vec v$ and "magic" juggling sequences of length $N$ (chains of integer state vectors evolving by redistribution at each "catch and rethrow" step). Explicit enumeration and correspondence with juggling sequences yield both a conceptual and a computational tool for quantifying the "fiber size" (preimage) of the Betti curve map (Ashley et al., 9 Feb 2026).

Algorithmically, two approaches apply:

  • Recursive dynamic programming: $K(v_1, \dots, v_N) = \sum_{\lambda} K(v_2 - \lambda_2, \dots, v_N - \lambda_N)$, with $\lambda = (\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N \ge 0)$, $\lambda_1 = v_1$, ranging over all non-increasing sequences (Young diagrams) beneath $\vec v$;
  • Polyhedral/integer-point counting: interpreting $K(\vec v)$ as counting integer points in flow polytopes, enabling Barvinok-type polynomial-time algorithms for fixed $N$.
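The dynamic-programming approach can be sketched in Python. The sketch assumes every bar has its endpoints on the filtration grid, so that a bar is a run of consecutive bins and corresponds to a type $A$ positive root. It peels off the bars that start in the first bin: if $\lambda_i$ of them cover bin $i$, then $\lambda$ is non-increasing, starts at $v_1$, and stays beneath $\vec v$. The names `count_barcodes` and `_staircases` are illustrative, not taken from the cited work.

```python
from functools import lru_cache


def _staircases(v):
    """Yield non-increasing tuples (l_1, ..., l_N) with l_1 = v[0] and 0 <= l_i <= v[i]."""
    def rec(i, prev, acc):
        if i == len(v):
            yield acc
            return
        for l in range(min(prev, v[i]) + 1):
            yield from rec(i + 1, l, acc + (l,))
    yield from rec(1, v[0], (v[0],))


@lru_cache(maxsize=None)
def count_barcodes(v):
    """Count multisets of grid-aligned bars whose Betti sequence is the tuple v."""
    if not v:
        return 1  # the empty barcode is the only one on an empty grid
    total = 0
    for lam in _staircases(v):
        # lam[i] bars starting in the first bin cover bin i; remove them and recurse.
        rest = tuple(x - l for x, l in zip(v[1:], lam[1:]))
        total += count_barcodes(rest)
    return total


print(count_barcodes((1, 1)))     # 2: two short bars, or one bar spanning both bins
print(count_barcodes((1, 1, 1)))  # 4
print(count_barcodes((2, 2)))     # 3
```

Memoization via `lru_cache` keeps repeated subproblems from being recomputed, but the number of staircases still grows quickly with the entries of $\vec v$, so this sketch is only practical for small inputs.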

3. Information Loss, Statistical Summaries, and Entropy

Passing from a barcode $B$ to its Betti curve $\vec v = \vec v(B)$ collapses $K(\vec v)$ distinct barcodes (the size of the fiber over $\vec v$) to the same functional signature, which quantifies the information lost by this dimension reduction in TDA. The growth of $K(\vec v)$ as $N$ or the $v_i$ increase can be rapid, establishing that Betti curves coarsen topological summaries (Ashley et al., 9 Feb 2026).

A fiber $\{B : \vec v(B) = \vec v\}$ can be treated as a discrete "posterior" over barcodes given the summary $\vec v$. Assuming a uniform prior,

$$H(\vec v) = \log K(\vec v)$$

is the entropy of barcodes with fixed Betti curve. Conditional expectations of barcode statistics (number of bars, bar-length moments) can be computed as averages over fibers, by mapping to the corresponding set of juggling sequences or root partitions.
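For small inputs, fiber averages can be computed by brute-force enumeration. The sketch below lists every grid-aligned barcode with a given Betti sequence, then reports the entropy (in nats) and the expected number of bars under the uniform distribution on the fiber. It assumes bars have endpoints on the filtration grid and represents a bar as a `(first_bin, last_bin)` pair; it enumerates barcodes directly rather than going through juggling sequences, and all names are illustrative.

```python
import math


def staircases(v):
    """Yield non-increasing tuples (l_1, ..., l_N) with l_1 = v[0] and 0 <= l_i <= v[i]."""
    def rec(i, prev, acc):
        if i == len(v):
            yield acc
            return
        for l in range(min(prev, v[i]) + 1):
            yield from rec(i + 1, l, acc + (l,))
    yield from rec(1, v[0], (v[0],))


def fiber(v, start=1):
    """Yield each barcode (tuple of (first_bin, last_bin) bars) with Betti sequence v."""
    if not v:
        yield ()
        return
    for lam in staircases(v):
        heads = ()
        for j, l in enumerate(lam):
            nxt = lam[j + 1] if j + 1 < len(lam) else 0
            # l - nxt bars start in bin `start` and end in bin `start + j`.
            heads += ((start, start + j),) * (l - nxt)
        rest = tuple(x - l for x, l in zip(v[1:], lam[1:]))
        for tail in fiber(rest, start + 1):
            yield heads + tail


barcodes = list(fiber((1, 1, 1)))
entropy = math.log(len(barcodes))                              # uniform prior, in nats
mean_bars = sum(len(b) for b in barcodes) / len(barcodes)      # conditional expectation
print(len(barcodes), mean_bars)  # 4 2.0
```

Other barcode statistics, such as moments of bar length, can be averaged over `barcodes` in the same way.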

In random field theory, Betti curves as functions of threshold (e.g., the level $\nu$ defining excursion or level sets) are modeled as sums of non-negative integer-valued random variables (often binomial, in excursion set decompositions), enabling exact computation or estimation of means, variances, and covariances at each threshold (Chingangbam, 7 Jul 2025).

4. Statistical Limit Theorems and Asymptotics

Betti curves admit rich probabilistic structure. For high-dimensional point clouds or excursion sets from random fields, under both Poisson and binomial models (with or without dependence), the following results hold:

  • Law of large numbers: For $m$ samples, $m^{-1} \beta_n(\tau) \to \bar\beta_n(\tau)$, with deterministic limit given by an integral against local density (Krebs, 2019).
  • Central limit theorem: $m^{-1/2} \big( \beta_n(\tau) - \mathbb{E}\, \beta_n(\tau) \big) \xrightarrow{d} \mathcal{N}(0, \sigma^2(\tau))$ under stabilization and regularity assumptions (Krebs et al., 2019). This extends to joint convergence over finite (or functional) collections of filtration parameters.
  • In the sparse regime of Gaussian excursions, explicit Poisson approximation and CLT are established near phase-transition thresholds; with scaling, Betti numbers vanish, become Poisson, or concentrate to Gaussian in different regimes as the level and size scale (Thoppe et al., 2018).
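The law-of-large-numbers behaviour can be illustrated with a small Monte Carlo experiment. The sketch below is not the setting of the cited papers: it assumes points drawn uniformly in the unit square, counts connected components ($\beta_0$) of the graph joining points at distance at most $r$, and scales $r = c/\sqrt{n}$ with the sample size $n$ so that the expected number of neighbours stays roughly constant. The constant `c = 0.5`, the seed, and all function names are arbitrary choices for illustration.

```python
import math
import random


def beta0(points, r):
    """Count connected components of the graph joining points at distance <= r."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = len(points)
    for i, (xi, yi) in enumerate(points):
        for j in range(i):
            xj, yj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r * r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


def normalized_beta0(n_points, c, rng):
    """Sample n_points uniform points and return beta_0 / n_points at radius c / sqrt(n_points)."""
    pts = [(rng.random(), rng.random()) for _ in range(n_points)]
    return beta0(pts, c / math.sqrt(n_points)) / n_points


rng = random.Random(0)
ratios = {n: normalized_beta0(n, 0.5, rng) for n in (200, 800)}
print(ratios)
```

Under this scaling the normalized counts for different sample sizes should be close to one another, which is the qualitative content of the law of large numbers; a careful study would repeat the experiment over many seeds and larger samples.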

For smooth random fields, Betti numbers at each threshold are modeled as sums over combinatorial basis elements with binomial coefficients, yielding approximate Gaussianity and computable covariance structure in the high-resolution or large-volume limit (Chingangbam, 7 Jul 2025). The precise conditions for Gaussianity, breakdown near critical thresholds, and influence of manifold size are all characterized.

5. Instability, Stabilization, and Distance Metrics

The raw Betti curve $\vec v(B)$ provably lacks stability under the 1-Wasserstein (or bottleneck) metric on persistence diagrams: the bound $\|\vec v(B) - \vec v(B')\| \le C \, W_1(B, B')$ fails for any constant $C$. Small perturbations to endpoints in $B$ can cause arbitrarily large jumps in $\vec v(B)$ relative to the size of the perturbation, especially at discretization thresholds.
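A concrete instance of this instability can be sketched as follows. Take several identical bars whose birth sits exactly on a grid value, then move each birth slightly earlier by `eps`. Matching every bar with its shifted copy costs `eps` per bar, so the 1-Wasserstein distance is at most `n_bars * eps`, while the binned Betti sequence changes by `n_bars` in one entry. The grid, the bars, and the helper names are assumptions made for this example.

```python
def betti_sequence(bars, taus):
    """v_i = number of bars [b, d) meeting the open bin (tau_{i-1}, tau_i)."""
    return [sum(1 for b, d in bars if b < d and b < hi and d > lo)
            for lo, hi in zip(taus, taus[1:])]


def instability_ratio(n_bars, eps):
    """Lower bound on ||v(B) - v(B')||_1 / W_1(B, B') for a shifted toy barcode."""
    taus = [0.0, 1.0, 2.0]
    before = [(1.0, 2.0)] * n_bars        # births exactly on the grid value 1.0
    after = [(1.0 - eps, 2.0)] * n_bars   # births nudged just below it
    v0 = betti_sequence(before, taus)
    v1 = betti_sequence(after, taus)
    jump = sum(abs(a - b) for a, b in zip(v0, v1))
    w1_upper_bound = n_bars * eps         # cost of matching each bar to its shifted copy
    return jump / w1_upper_bound


for eps in (1e-1, 1e-2, 1e-3):
    print(eps, instability_ratio(5, eps))  # ratio grows like 1 / eps
```

Because the ratio grows without bound as `eps` shrinks, no single Lipschitz constant can work for this family of barcodes.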

A stabilized Betti sequence replaces sharp indicators with Gaussian kernels: for each index $i$, define

$$\tilde v_i(B) = \sum_{j=1}^{M} g_i(b_j, d_j)$$

where $g_i$ are isotropic Gaussians centered at bin $i$ in $(b, d)$ space, with bandwidth $\sigma > 0$. The resulting $\tilde v(B)$ is Lipschitz and stable with respect to $W_1$, and empirically much less sensitive to small perturbations (Johnson et al., 2021).

Normalized cumulative Betti sequences (aggregated up to index $i$ and rescaled to maximum 1) provide globally stable signatures, emphasizing persistent global features.
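One possible reading of the normalized cumulative sequence is sketched below: take running sums of the Betti sequence and divide by the final total, which is the maximum because the entries are non-negative. The handling of an all-zero sequence and the name `normalized_cumulative` are choices made for this example rather than part of the cited definition.

```python
from itertools import accumulate


def normalized_cumulative(v):
    """Running sums of a non-negative Betti sequence, rescaled so the last entry is 1."""
    cumulative = list(accumulate(v))
    total = cumulative[-1] if cumulative else 0
    if total == 0:
        # Assumed convention: an all-zero (or empty) sequence maps to zeros.
        return [0.0] * len(cumulative)
    return [c / total for c in cumulative]


print(normalized_cumulative([1, 3, 1]))  # [0.2, 0.8, 1.0]
```

The output is a non-decreasing sequence ending at 1, so it can be compared across barcodes with different total feature counts.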

Distances between Betti curves are commonly measured via $L^1$, $L^2$, or Sobolev-type norms, enabling statistical inference, clustering, or machine learning applications.
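As one illustration, $L^p$-type distances between two Betti curves can be approximated by a Riemann sum when both curves are sampled on the same uniform grid. The sketch assumes a shared grid with spacing `step`; the function name `lp_distance` and the sample curves are hypothetical.

```python
def lp_distance(u, v, step, p):
    """Riemann-sum approximation of the L^p distance between two sampled curves."""
    if len(u) != len(v):
        raise ValueError("curves must be sampled on the same grid")
    return (sum(abs(a - b) ** p for a, b in zip(u, v)) * step) ** (1.0 / p)


# Two hypothetical Betti curves sampled on a grid with spacing 0.5.
curve_a = [1, 3, 1]
curve_b = [0, 1, 1]
print(lp_distance(curve_a, curve_b, step=0.5, p=1))  # 1.5
print(lp_distance(curve_a, curve_b, step=0.5, p=2))  # sqrt(2.5)
```

The resulting numbers can be fed to any distance-based method, such as clustering or a permutation test.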

6. Applications and Interpretative Value

Betti curves serve as interpretable, low-dimensional summary statistics in diverse applied domains:

  • Cosmology and large-scale structure: Betti curves describe the topological content of galaxy or halo distributions, capturing clusters, filaments, and voids across scales. Persistent-homology-based Betti curves, computed via alpha complexes or Vietoris–Rips filtrations, form data vectors for Bayesian cosmological parameter inference. They provide sensitivity to features invisible to traditional two-point statistics and, when combined with the power spectrum, break degeneracies and significantly tighten constraints on four cosmological parameters (Li et al., 8 Dec 2025).
  • Random field theory: In Gaussian random fields, Betti curves $\beta_k(\nu)$ as functions of normalized threshold $\nu$ partition the topology by regime: islands (β₀) at high $\nu$, tunnels (β₁) at mid $\nu$, voids (β₂) at low $\nu$. They provide more granular discrimination than classical genus statistics, and carry explicit dependence on the power spectrum. Their analytic or empirical study enables characterization and detection of non-Gaussianity and multiscale topology (Park et al., 2013, Pranav et al., 2018, Chingangbam, 7 Jul 2025).
  • Persistent homology in dependent and high-dimensional data: LLN and concentration for Betti curves, even under dependence, provide statistical robustness for time series and stochastic processes (Krebs, 2019).
  • Topological signatures in matrix data: For symmetric matrices, e.g. covariance or correlation matrices in neuroscience, the Betti curve is an order-invariant descriptor that detects low-rank or rank-1 structure and is robust to monotone nonlinearity. Betti signatures can distinguish biologically meaningful assemblies from controls in cases where singular-value or spectrum-based invariants may fail to do so (Curto et al., 2021).
  • Algebraic geometry: In the study of high-degree embeddings of algebraic curves, the asymptotic behavior of the Betti table is governed in the Boij–Söderberg sense by a pure diagram determined by genus, a phenomenon captured by the structure of corresponding Betti sequences (Erman, 2013).

7. Perspectives and Open Problems

Betti curve statistics have unified computational, combinatorial, probabilistic, and geometric perspectives in the quantitative study of topological structure in data. They bridge persistent homology, root-system combinatorics, random field topology, and algebraic geometry.

Ongoing challenges include:

  • Further analysis and algorithmic optimization of fiber enumeration, especially for high $N$ and large $v_i$.
  • Extending statistical limit theorems to more complex dependence, multi-parameter persistence, and non-Euclidean settings.
  • Enhancement of stability for machine learning and inference through new regularized or kernel-based Betti curve formulations.
  • Comprehensive utilization of Betti curves, alone or in combination with other invariants, in multiparametric statistical inference pipelines.

Betti curves thus remain central both in the theoretical refinement of topological invariants and in the methodological toolkit of modern data science (Ashley et al., 9 Feb 2026, Li et al., 8 Dec 2025, Johnson et al., 2021, Park et al., 2013, Pranav et al., 2018, Chingangbam, 7 Jul 2025, Krebs et al., 2019, Krebs, 2019, Thoppe et al., 2018, Curto et al., 2021, Erman, 2013).
