Distance-to-a-Measure (DTM)
- Distance-to-a-Measure (DTM) is a robust statistical and geometric tool that generalizes the classic distance function by using a probability measure to capture structural features.
- It employs a mass parameter and k-nearest neighbor averaging to control the bias-variance trade-off and efficiently estimate geometric properties from data.
- DTM supports stable topological inference through Lipschitz continuity and Wasserstein stability, making it valuable for analyzing noisy point cloud data.
The Distance-to-a-Measure (DTM) is a statistical and geometric functional designed to robustly estimate geometric and topological properties of data distributions in Euclidean and general metric spaces. DTM generalizes the classical distance-to-set function by replacing the support set with a probability measure, providing stability under noise and outliers and enabling rigorous inference procedures for geometric and topological data analysis.
1. Formal Definition and Variants
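Let $\mu$ be a Borel probability measure on $\mathbb{R}^d$ (or on a Polish metric space $(\mathcal{X}, \rho)$), and let $m \in (0, 1)$ be a mass parameter (often called the "resolution" or "smoothing" parameter). For each $x$, define the minimal radius required to capture mass $m$ around $x$ by
$$\delta_{\mu,m}(x) = \inf\{ r \ge 0 : \mu(\bar{B}(x, r)) > m \}.$$
The DTM of order $p \ge 1$ at $x$ is then
$$d_{\mu,m,p}(x) = \left( \frac{1}{m} \int_0^m \delta_{\mu,u}(x)^p \, du \right)^{1/p}.$$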
The case $p = 2$ is common in applications, often written as $d_{\mu,m}$. When the underlying measure admits a density $f$, the DTM recovers scale-adapted density information as $m \to 0$ via $d_{\mu,m}(x) \asymp (m / f(x))^{1/d}$ under regularity assumptions (Taupin et al., 3 Apr 2025).
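A heuristic way to see this small-mass scaling is sketched below. It assumes that the density $f$ is continuous and positive at $x$, writes $\delta_{\mu,u}(x)$ for the smallest radius of a ball around $x$ with mass $u$ and $d_{\mu,m,p}$ for the DTM of order $p$, and uses $\omega_d$ for the volume of the unit ball in $\mathbb{R}^d$ (a symbol introduced here only for illustration). It is not the argument of the cited work, whose precise assumptions and constants may differ.

```latex
\begin{align*}
\mu(\bar{B}(x, r)) &\approx \omega_d \, f(x) \, r^d \qquad (r \to 0), \\
\delta_{\mu,u}(x) &\approx \left( \frac{u}{\omega_d \, f(x)} \right)^{1/d} \qquad (u \to 0), \\
d_{\mu,m,p}(x)^p
  &= \frac{1}{m} \int_0^m \delta_{\mu,u}(x)^p \, du
  \approx \frac{1}{m} \int_0^m \left( \frac{u}{\omega_d \, f(x)} \right)^{p/d} du
  = \frac{d}{d + p} \left( \frac{m}{\omega_d \, f(x)} \right)^{p/d}.
\end{align*}
```

Taking $p$-th roots gives $d_{\mu,m,p}(x) \approx (d/(d+p))^{1/p} \, (m / (\omega_d f(x)))^{1/d}$, so to leading order the DTM is large where the density is small and shrinks like $m^{1/d}$.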
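For empirical estimation, given $n$ i.i.d. points $X_1, \dots, X_n$, the empirical measure $\mu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ (with $\delta_{X_i}$ the Dirac mass at $X_i$) yields the empirical DTM (DTEM) $d_{\mu_n,m,p}$. When $m = k/n$ for an integer $k$, this admits the discrete representation
$$d_{\mu_n,k/n,p}(x)^p = \frac{1}{k} \sum_{i \in N_k(x)} \| x - X_i \|^p,$$
where $N_k(x)$ are the indices of the $k$ nearest neighbors of $x$ among $X_1, \dots, X_n$ (Chazal et al., 2015, Anai et al., 2018, Chazal et al., 2014).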
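The nearest-neighbor representation can be sketched in a few lines of Python. This is a minimal brute-force illustration, not a reference implementation: the function name `empirical_dtm` and the toy sample are assumptions made here, and the mass parameter is taken to be exactly $k/n$.

```python
import heapq
import math

def empirical_dtm(points, x, k, p=2):
    """Empirical DTM of order p at x, with mass m = k / len(points)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    # Distances from the query x to every sample point (brute force, O(n)).
    dists = (math.dist(x, q) for q in points)
    # Keep only the k smallest distances.
    nearest = heapq.nsmallest(k, dists)
    # p-th root of the mean of the p-th powers of those distances.
    return (sum(r ** p for r in nearest) / k) ** (1.0 / p)

# Toy sample (hypothetical data): three nearby points and one far point.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
dtm_origin = empirical_dtm(sample, (0.0, 0.0), k=3)
```

With `k=3` the three smallest distances from the origin are 0, 1 and 1, so `dtm_origin` equals $\sqrt{2/3}$; the far point at distance 5 does not enter the average.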
2. Stability, Lipschitz Properties, and Robustness
One of the core features of DTM is its regularity and stability under perturbations:
- Lipschitz continuity in $x$: $x \mapsto d_{\mu,m,p}(x)$ is $1$-Lipschitz for $p = 2$ and, more generally, for every $p \ge 1$ (with respect to the Euclidean metric), ensuring geometric smoothness and well-behaved sublevel sets (Taupin et al., 3 Apr 2025, Anai et al., 2018, Chazal et al., 2014, Proksch et al., 2022).
- Wasserstein stability to changes in measure:
$$\| d_{\mu,m,p} - d_{\nu,m,p} \|_\infty \le m^{-1/p} \, W_p(\mu, \nu),$$
so small $W_p$-perturbations of $\mu$ yield controlled perturbations of DTM. For $p = 2$, the stability constant is $m^{-1/2}$ (Taupin et al., 3 Apr 2025, Chazal et al., 2014, Anai et al., 2018, Brécheteau, 2017).
- Outlier robustness: DTM substantially suppresses the impact of outliers due to its averaging over neighborhoods of mass $m$. For small $m$, it interpolates between the raw distance-to-set and a $k$-nearest-neighbors average (Guibas et al., 2011, Chazal et al., 2014).
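- Monotonicity: for fixed $x$, $m \mapsto d_{\mu,m,p}(x)$ is nondecreasing, since it averages the nondecreasing quantile $u \mapsto \delta_{\mu,u}(x)$ over $[0, m]$. Larger $m$ therefore increases geometric bias while the stability constant $m^{-1/p}$ decreases, governing a bias-variance trade-off in practical inference (Taupin et al., 3 Apr 2025).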
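The robustness property can be illustrated on a toy example: a single outlier placed at the center of a circle moves the classical distance-to-set by the full radius, while the order-2 empirical DTM barely changes. The sketch below is illustrative only; the helper names `knn_dtm` and `dist_to_set` and the data are assumptions made here, and $k$ is held fixed (so $m = k/n$ differs very slightly between the two clouds).

```python
import math

def knn_dtm(points, x, k):
    """Order-2 empirical DTM: root mean squared distance to the k nearest points."""
    nearest = sorted(math.dist(x, q) for q in points)[:k]
    return math.sqrt(sum(r * r for r in nearest) / k)

def dist_to_set(points, x):
    """Classical distance from x to the point set."""
    return min(math.dist(x, q) for q in points)

# 100 points on the unit circle, then the same cloud plus one outlier at the center.
circle = [(math.cos(2 * math.pi * i / 100), math.sin(2 * math.pi * i / 100))
          for i in range(100)]
noisy = circle + [(0.0, 0.0)]
center = (0.0, 0.0)

# How much each function moves at the center when the outlier is added.
set_shift = abs(dist_to_set(circle, center) - dist_to_set(noisy, center))
dtm_shift = abs(knn_dtm(circle, center, 10) - knn_dtm(noisy, center, 10))
```

Here `set_shift` is 1 (the distance-to-set drops from 1 to 0), whereas `dtm_shift` is $1 - \sqrt{0.9} \approx 0.05$, because the outlier replaces only one of the ten averaged neighbors.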
3. Rates of Convergence and Statistical Properties
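The behavior of the empirical DTM (DTEM) and the associated inferential guarantees depend on the regularity of the quantile function of the distance distribution: $F_x^{-1}(u) = \inf\{ t \ge 0 : F_x(t) \ge u \}$, where $F_x(t) = \mu(\bar{B}(x, t))$ is the distribution function of the distance from $x$ to a $\mu$-distributed point.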
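- Deviation bounds: With $m = k/n$, the expected deviation between the empirical and population DTM at a point $x$ is bounded by a quantity depending on $n$, $k$, and $\omega_x$, where $\omega_x$ is a modulus of continuity for the quantile function. For $(a, b)$-standard measures, $\omega_x$ is controlled explicitly in terms of $a$ and $b$. This upper bound matches lower bounds up to constant factors for small $k/n$ (Chazal et al., 2015).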
- Convergence rates: For measures with $d$-dimensional behavior and when $m$ is fixed, the uniform error $\sup_{x \in K} | d_{\mu_n,m}(x) - d_{\mu,m}(x) |$ over a compact set $K$ converges to zero at a rate depending on $d$ (Proksch et al., 2022). For sets with low intrinsic dimension, one recovers the parametric $n^{-1/2}$ rate.
- Central limit theorems: For fixed $m$ and regular quantiles, $\sqrt{n} \, ( d_{\mu_n,m}^2(x) - d_{\mu,m}^2(x) )$ converges in law to a normal variable, and there is a functional CLT for uniform convergence over compact sets (Chazal et al., 2014).
- Bootstrap inference: Both functional and bottleneck-bootstrapping for the sublevel set persistence diagrams provide valid confidence bands that directly translate to significance levels for topological inference (Chazal et al., 2014).
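A simplified sketch of a sup-norm bootstrap band for the empirical DTM on a finite grid is given below. It is only meant to convey the idea of resampling the data and taking a quantile of the uniform deviation; the function `bootstrap_band`, its parameters, and the synthetic data are assumptions made here, and the procedure in the cited work (including its scaling and its bottleneck variant for persistence diagrams) may differ in detail.

```python
import math
import random

def knn_dtm(points, x, k):
    """Order-2 empirical DTM: root mean squared distance to the k nearest points."""
    nearest = sorted(math.dist(x, q) for q in points)[:k]
    return math.sqrt(sum(r * r for r in nearest) / k)

def bootstrap_band(points, grid, k, n_boot=100, alpha=0.1, seed=0):
    """Return (dtm_on_grid, half_width) for a sup-norm bootstrap band on the grid."""
    rng = random.Random(seed)
    n = len(points)
    base = [knn_dtm(points, x, k) for x in grid]
    sups = []
    for _ in range(n_boot):
        # Resample n points with replacement and recompute the DTM on the grid.
        resample = [points[rng.randrange(n)] for _ in range(n)]
        boot = [knn_dtm(resample, x, k) for x in grid]
        sups.append(max(abs(b - a) for a, b in zip(base, boot)))
    sups.sort()
    # Empirical (1 - alpha) quantile of the bootstrap sup-norm deviations.
    idx = min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)
    return base, sups[idx]

# Synthetic noisy circle (hypothetical data) and a coarse evaluation grid.
rng = random.Random(1)
cloud = [(math.cos(t) + rng.gauss(0, 0.05), math.sin(t) + rng.gauss(0, 0.05))
         for t in (rng.uniform(0, 2 * math.pi) for _ in range(80))]
grid = [(0.5 * i - 1.0, 0.5 * j - 1.0) for i in range(5) for j in range(5)]
values, half_width = bootstrap_band(cloud, grid, k=8)
```

The band at each grid point is then `values[i] ± half_width`; features of the DTM landscape smaller than `half_width` would not be distinguishable from sampling noise under this sketch.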
4. Algorithmic Computation and Approximations
Direct computation of the DTM at all points in a large dataset can be expensive. Several algorithmic approximations have been developed to maintain computational tractability:
- Nearest-neighbors averaging: The empirical DTM reduces to $k$-nearest neighbor averaging, supporting fast implementations (roughly $O(n \log n)$ for evaluating all sample points in low dimension) via k-d trees, ball trees, or approximate nearest-neighbor search (Anai et al., 2018).
- Power distance and barycentric formulations: The DTM can be written as a power distance over all barycenters of $k$-subsets of the data, but the combinatorial explosion (there are $\binom{n}{k}$ such subsets) restricts this to small $n$ unless an approximation is used (Guibas et al., 2011).
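- Witnessed k-distance: By restricting barycenters to those "witnessed" by each point and its nearest neighbors, one obtains an $O(n)$-size representation with controlled multiplicative error:
$$d_{\mu_n,m}(x) \le d^{\mathrm{w}}_{\mu_n,m}(x) \le C \, d_{\mu_n,m}(x),$$
where $d^{\mathrm{w}}_{\mu_n,m}$ denotes the witnessed k-distance and $C \ge 1$ is a constant independent of the data (Guibas et al., 2011).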
- k-PDTM: The k-power DTM (k-PDTM) trades data points for $k$ cluster centers (here $k$ counts centers, not neighbors), reducing sublevel set complexity from $n$ balls to $k$ balls. For intrinsic dimension $d'$, the approximation error between DTM and k-PDTM decays polynomially in $k$ at a rate governed by $d'$. Algorithmically, a Lloyd-type Voronoi iteration finds $k$ local means and variances, supporting topological computations that scale sublinearly with $n$ (Brécheteau et al., 2018).
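The barycentric and witnessed formulations can be sketched as follows for the order-2 empirical DTM. The sketch relies on the identity $\frac{1}{k} \sum_{q \in S} \|x - q\|^2 = \|x - b_S\|^2 + w_S$, where $b_S$ is the barycenter of a $k$-subset $S$ and $w_S$ its mean squared distance to $S$; the function names are hypothetical, the nearest-neighbor search is brute force, and this is an illustration rather than the implementation of the cited work.

```python
import math

def k_nearest(points, x, k):
    """The k sample points closest to x (brute force)."""
    return sorted(points, key=lambda q: math.dist(x, q))[:k]

def barycenter_and_weight(subset):
    """Barycenter b of a subset and w = mean squared distance from b to the subset."""
    dim = len(subset[0])
    b = tuple(sum(q[j] for q in subset) / len(subset) for j in range(dim))
    w = sum(math.dist(b, q) ** 2 for q in subset) / len(subset)
    return b, w

def k_distance(points, x, k):
    """Exact order-2 empirical DTM at x, via the barycenter of x's k nearest neighbors."""
    b, w = barycenter_and_weight(k_nearest(points, x, k))
    return math.sqrt(math.dist(x, b) ** 2 + w)

def witnessed_sites(points, k):
    """One weighted barycenter per sample point: n power-distance sites in total."""
    return [barycenter_and_weight(k_nearest(points, p, k)) for p in points]

def witnessed_k_distance(sites, x):
    """Power distance from x to the witnessed barycenters."""
    return math.sqrt(min(math.dist(x, b) ** 2 + w for b, w in sites))

# Nine points on the unit circle (hypothetical data) and one query point.
pts = [(math.cos(t), math.sin(t)) for t in (0.0, 0.7, 1.4, 2.1, 2.8, 3.5, 4.2, 4.9, 5.6)]
sites = witnessed_sites(pts, 3)
query = (0.3, 0.2)
exact, approx = k_distance(pts, query, 3), witnessed_k_distance(sites, query)
```

Because the witnessed sites are a subset of all $k$-subset barycenters, `approx` is never smaller than `exact`, and the two coincide at the sample points themselves. The sites are computed once and reused for every query, which is what keeps the representation linear in $n$.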
5. Applications in Topological and Geometric Data Analysis
DTM is widely used in topological data analysis for robust inference of geometrical and topological features from noisy point clouds:
- DTM-based filtrations: Weighted Čech or Rips filtrations based on DTM values yield persistent homology diagrams that are stable to Wasserstein perturbations and outliers (Anai et al., 2018). In contrast to traditional distance-to-set filtrations, DTM guarantees reduced outlier sensitivity and has explicit quantitative stability bounds.
- Support and homology recovery: Sublevel sets $\{ x : d_{\mu_n,m}(x) \le t \}$ consistently estimate the support of $\mu$ even in high dimensions when $\mu$ is restricted to a low-dimensional manifold (Taupin et al., 3 Apr 2025, Chazal et al., 2014).
- Statistical inference: Confidence sets for topological features, such as persistence diagram banding or max-persistence rules for choosing $m$, are derived via bootstrap and probability inequalities for the DTEM (Chazal et al., 2014, Chazal et al., 2015).
- DTM signatures: The DTM-signature $d_{\mu,m\#}\mu$, defined as the pushforward measure of $\mu$ under $d_{\mu,m}$, provides a one-dimensional summary for metric-measure spaces, supports Gromov-Wasserstein-based lower bounds, and enables asymptotic two-sample testing with proven error guarantees (Brécheteau, 2017).
- Kernel density and classification: Density estimation on DTM-transformed data yields robust geometric features enabling high-accuracy clustering and classification in applications such as single molecule microscopy (Proksch et al., 2022).
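As an illustration of the DTM-signature idea, the sketch below computes an empirical signature (the sorted DTM values at the sample points) and compares two signatures with the 1-Wasserstein distance on the real line. It covers only equal-size samples and is not the two-sample test of the cited work, which involves further steps; the function names and the circle data are assumptions made here.

```python
import math

def knn_dtm(points, x, k):
    """Order-2 empirical DTM: root mean squared distance to the k nearest points."""
    nearest = sorted(math.dist(x, q) for q in points)[:k]
    return math.sqrt(sum(r * r for r in nearest) / k)

def dtm_signature(points, k):
    """Sorted DTM values at the sample points: an empirical DTM-signature."""
    return sorted(knn_dtm(points, p, k) for p in points)

def wasserstein1_equal_size(sig_a, sig_b):
    """W1 between two empirical measures on the line with equally many atoms."""
    if len(sig_a) != len(sig_b):
        raise ValueError("this sketch only handles signatures of equal size")
    # For sorted samples of equal size, W1 is the mean absolute difference.
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)

def circle(radius, n, offset=0.0):
    """n equally spaced points on a circle (hypothetical test data)."""
    return [(radius * math.cos(offset + 2 * math.pi * i / n),
             radius * math.sin(offset + 2 * math.pi * i / n)) for i in range(n)]

sig_unit = dtm_signature(circle(1.0, 60), k=6)
sig_rotated = dtm_signature(circle(1.0, 60, offset=0.3), k=6)
sig_large = dtm_signature(circle(2.0, 60), k=6)

w_same_shape = wasserstein1_equal_size(sig_unit, sig_rotated)
w_other_shape = wasserstein1_equal_size(sig_unit, sig_large)
```

The signature depends only on pairwise distances, so the rotated circle yields a distance of essentially zero, while the circle of radius 2 is separated from the unit circle by a strictly positive amount.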
6. Extensions, Metrics, and Related Constructions
- Fermat Distance-to-Measure: A generalization of Fermat/density-driven metrics using DTM instead of density, yielding a conformal metric defined for any probability measure (no absolute continuity required), with provable stability and explicit convergence rates (Taupin et al., 3 Apr 2025).
- Bias-variance trade-off: The parameter $m$ controls the trade-off between statistical stability and geometric bias. Data-driven methods such as maximizing the sum or count of significant persistence lifetimes under bootstrap critical values optimize this trade-off in practice (Chazal et al., 2014).
- Sample complexity and minimax-optimal rates: For intrinsic dimension $d$, the rate of uniform convergence attained by the empirical DTM is minimax-optimal, bridging statistical and geometric complexities (Proksch et al., 2022, Chazal et al., 2014).
- Multiscale and density-adaptation: DTM acts as a multiscale smoothing functional, interpolating between density-based transforms and raw support-based inference, the latter being recovered as $m \to 0$ (Taupin et al., 3 Apr 2025, Chazal et al., 2014).
7. Numerical Experiments and Empirical Behavior
- Numerical experiments confirm the theoretical rates and the tightness of the deviation bounds for DTEM, including in the presence of Gaussian or clutter noise. The empirical bias matches the shape predicted by the theory, and the bounds show no visible slack even in challenging regimes (Chazal et al., 2015).
- Empirical DTM densities enable perfect or near-perfect separation of point clouds with subtle geometric differences, demonstrating the discriminative power and robustness of the DTM transformation (Proksch et al., 2022).
The DTM thus provides a rigorous, robust, and computationally effective framework for geometric and topological inference from point cloud data, with deep connections to empirical process theory, Wasserstein geometry, and statistical learning (Chazal et al., 2015, Taupin et al., 3 Apr 2025, Guibas et al., 2011, Chazal et al., 2014, Brécheteau et al., 2018, Anai et al., 2018, Brécheteau, 2017, Proksch et al., 2022).