
Distance-to-a-Measure (DTM)

Updated 27 May 2026
  • Distance-to-a-Measure (DTM) is a robust statistical and geometric tool that generalizes the classic distance function by using a probability measure to capture structural features.
  • It employs a mass parameter and k-nearest neighbor averaging to control the bias-variance trade-off and efficiently estimate geometric properties from data.
  • DTM supports stable topological inference through Lipschitz continuity and Wasserstein stability, making it valuable for analyzing noisy point cloud data.

The Distance-to-a-Measure (DTM) is a statistical and geometric functional designed to robustly estimate geometric and topological properties of data distributions in Euclidean and general metric spaces. DTM generalizes the classical distance-to-set function by replacing the support set with a probability measure, providing stability under noise and outliers and enabling rigorous inference procedures for geometric and topological data analysis.

1. Formal Definition and Variants

Let $P$ be a Borel probability measure on $\mathbb{R}^d$ (or on a Polish metric space $(\mathcal{X},\delta)$), and let $m\in (0,1]$ be a mass parameter (often called the "resolution" or "smoothing" parameter). For each $x\in \mathbb{R}^d$, define the minimal radius required to capture mass $u \leq m$ around $x$ by

$$\delta_{P,u}(x) = \inf\{ t > 0 : P(\overline{B}(x, t)) \geq u \}.$$

The DTM of order $r\geq 1$ at $x$ is then

$$d_{P,m,r}(x) = \left( \frac{1}{m} \int_0^m \delta_{P,u}(x)^r \, du \right)^{1/r}.$$

The case $r = 2$ is common in applications, often written as $d_{P,m}$. When the underlying measure admits a density $f$, the DTM recovers scale-adapted density information as $m \to 0$ via $d_{P,m}(x) \asymp (m/f(x))^{1/d}$ under regularity assumptions (Taupin et al., 3 Apr 2025).
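A heuristic calculation makes the small-mass behavior concrete. It is only a sketch and assumes more than the text states: that $P$ has a density $f$ that is continuous and positive at $x$, with $\omega_d$ (a symbol introduced here) denoting the volume of the unit ball in $\mathbb{R}^d$. Under these assumptions, small balls have mass roughly proportional to their volume, which can be inverted and integrated:

```latex
\[
P(\overline{B}(x,t)) \approx \omega_d\, f(x)\, t^d
\quad\Longrightarrow\quad
\delta_{P,u}(x) \approx \left(\frac{u}{\omega_d\, f(x)}\right)^{1/d},
\]
\[
d_{P,m,r}(x)
  = \left(\frac{1}{m}\int_0^m \delta_{P,u}(x)^r \, du\right)^{1/r}
  \approx \left(\frac{d}{d+r}\right)^{1/r}
          \left(\frac{m}{\omega_d\, f(x)}\right)^{1/d}
  \qquad (m \to 0).
\]
```

The DTM is therefore small where the density is high and large where it is low, with the exponent $1/d$ set by the ambient dimension.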

For empirical estimation, given $n$ i.i.d. points $X_1,\dots,X_n$, the empirical measure $P_n$ (which places mass $1/n$ on each $X_i$) yields the empirical DTM (DTEM), $d_{P_n,m,r}(x)$. When $m = k/n$, this admits the discrete representation

$$d_{P_n,k/n,r}^{\,r}(x) = \frac{1}{k} \sum_{i \in N_k(x)} \|x - X_i\|^r,$$

where $N_k(x)$ are the indices of the $k$ nearest neighbors of $x$ among $X_1,\dots,X_n$ (Chazal et al., 2015, Anai et al., 2018, Chazal et al., 2014).
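The discrete representation above can be sketched in a few lines of Python. This is a brute-force illustration using only the standard library, not an implementation from the cited papers; the function name `empirical_dtm` and the toy sample are assumptions made for the example.

```python
import math

def empirical_dtm(points, x, k, r=2):
    """Empirical DTM of order r at x, with mass m = k / len(points).

    points: list of coordinate tuples; x: query tuple; k: number of neighbors.
    (Hypothetical helper name, chosen for this illustration.)
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Distances from x to every sample point, smallest first.
    dists = sorted(math.dist(x, p) for p in points)
    # Average the r-th powers of the k smallest distances, then take the r-th root.
    return (sum(d ** r for d in dists[:k]) / k) ** (1.0 / r)

# Toy one-dimensional sample with n = 4 points.
sample = [(0.0,), (1.0,), (2.0,), (10.0,)]

dtm_k1 = empirical_dtm(sample, (0.0,), k=1)        # distance to the nearest point
dtm_k2 = empirical_dtm(sample, (0.0,), k=2)        # m = 2/4, r = 2
dtm_k2_r1 = empirical_dtm(sample, (0.0,), k=2, r=1)  # m = 2/4, r = 1
```

With `k=1` the function returns the ordinary distance to the nearest sample point; larger `k` averages over more neighbors, which is the smoothing controlled by the mass parameter.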

2. Stability, Lipschitz Properties, and Robustness

One of the core features of DTM is its regularity and stability under perturbations:

  • Lipschitz continuity in $x$: $d_{P,m,r}$ is $1$-Lipschitz for $r = 2$ and, more generally, for every $r \geq 1$ (with respect to the Euclidean metric), ensuring geometric smoothness and well-behaved sublevel sets (Taupin et al., 3 Apr 2025, Anai et al., 2018, Chazal et al., 2014, Proksch et al., 2022).
  • Wasserstein stability to changes in measure:

$$\|d_{P,m,r} - d_{Q,m,r}\|_\infty \leq m^{-1/r}\, W_r(P,Q),$$

so small $W_r$-perturbations of $P$ yield controlled perturbations of the DTM. For $r = 2$, the stability constant is $m^{-1/2}$ (Taupin et al., 3 Apr 2025, Chazal et al., 2014, Anai et al., 2018, Brécheteau, 2017).

  • Outlier robustness: DTM substantially suppresses the impact of outliers due to its averaging over neighborhoods of mass $m$. For small $m$, it interpolates between the raw distance-to-set and a $k$-nearest-neighbors average (Guibas et al., 2011, Chazal et al., 2014).
  • Monotonicity: $m \mapsto d_{P,m,r}(x)$ is nondecreasing, so the choice of $m$ governs a bias-variance trade-off in practical inference (Taupin et al., 3 Apr 2025).
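The properties listed above can be checked numerically on a small example. The sketch below is illustrative only: the point cloud, the grid, and the helper name `dtm` are assumptions made here. It compares the distance-to-set with the DTM at an outlier, then moves that outlier and compares the resulting change in DTM with the Wasserstein bound. Moving one of $n$ points by a distance $D$ changes the empirical measure by at most $D\, n^{-1/r}$ in $W_r$, so with $m = k/n$ the bound $m^{-1/r} W_r$ becomes $D\, k^{-1/r}$.

```python
import math

def dtm(points, x, k, r=2):
    """Empirical DTM of order r at x with mass m = k / len(points)."""
    dists = sorted(math.dist(x, p) for p in points)
    return (sum(d ** r for d in dists[:k]) / k) ** (1.0 / r)

n_circle, k, r = 20, 5, 2
circle = [(math.cos(2 * math.pi * i / n_circle), math.sin(2 * math.pi * i / n_circle))
          for i in range(n_circle)]
outlier = (5.0, 5.0)
with_outlier = circle + [outlier]

# (a) Outlier robustness: the distance-to-set vanishes at the outlier,
# while the DTM still reports that little mass is nearby.
dist_to_set_at_outlier = min(math.dist(outlier, p) for p in with_outlier)
dtm_at_outlier = dtm(with_outlier, outlier, k, r)

# (b) Wasserstein stability: move the outlier onto the circle and compare
# the sup-norm change of the DTM (over a grid) with the bound D / k**(1/r).
moved_to = (1.0, 0.0)
moved = circle + [moved_to]
shift = math.dist(outlier, moved_to)
stability_bound = shift / k ** (1.0 / r)

grid = [(-6 + 0.5 * i, -6 + 0.5 * j) for i in range(25) for j in range(25)]
max_change = max(abs(dtm(with_outlier, x, k, r) - dtm(moved, x, k, r)) for x in grid)

# (c) Lipschitz continuity: difference quotients between consecutive grid
# points should not exceed 1.
max_slope = max(
    abs(dtm(with_outlier, a, k, r) - dtm(with_outlier, b, k, r)) / math.dist(a, b)
    for a, b in zip(grid, grid[1:])
)
```

The grid check only samples the sup-norm, so it can confirm but not prove the bound; it is meant as a sanity check of the stated inequalities.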

3. Rates of Convergence and Statistical Properties

The behavior of the empirical DTM (DTEM) and the available inferential guarantees depend on the regularity of the quantile function of the distance distribution: $F_x^{-1}(u) = \delta_{P,u}(x)$, the generalized inverse of $F_x(t) = P(\overline{B}(x,t))$.

  • Deviation bounds: With $m = k/n$, the deviation of the empirical DTM from the population DTM at a point $x$ is bounded in terms of $n$, $k$, and $\omega$, where $\omega$ is a modulus of continuity for the quantile function. For $(a,b)$-standard measures, $\omega(h) \lesssim h^{1/b}$. This upper bound matches lower bounds up to constant factors for small $k/n$ (Chazal et al., 2015).

  • Convergence rates: For measures with $d$-dimensional behavior and when $m$ is fixed, the empirical DTM converges to the population DTM at an explicit rate in $n$, with uniform control over compact sets (Proksch et al., 2022). For sets with low intrinsic dimension, one recovers the parametric $n^{-1/2}$ rate.

  • Central limit theorems: For fixed $m$ and regular quantiles, $\sqrt{n}\,\bigl(d_{P_n,m}^2(x) - d_{P,m}^2(x)\bigr)$ converges in law to a normal variable, and there is a functional CLT for uniform convergence over compact sets (Chazal et al., 2014).
  • Bootstrap inference: Both functional and bottleneck-bootstrapping for the sublevel set persistence diagrams provide valid confidence bands that directly translate to significance levels for topological inference (Chazal et al., 2014).
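The bootstrap idea in the last item can be illustrated with a sup-norm band for the squared DTM on a finite grid. This is a minimal sketch under stated assumptions, not the exact procedure of the cited papers: the statistic $\sqrt{n}\,\max_x |d^2_{\text{boot}}(x) - d^2_{\text{emp}}(x)|$, the grid, the noisy-circle sample, and the names `dtm_sq` and `bootstrap_band` are all choices made here for illustration.

```python
import math
import random

def dtm_sq(points, x, k):
    """Squared empirical DTM (r = 2) at x with mass k / len(points)."""
    d2 = sorted(math.dist(x, p) ** 2 for p in points)
    return sum(d2[:k]) / k

def bootstrap_band(points, grid, k, n_boot=200, alpha=0.1, seed=0):
    """Return the squared DTM on the grid and a sup-norm band half-width."""
    rng = random.Random(seed)
    n = len(points)
    base = [dtm_sq(points, x, k) for x in grid]
    sup_stats = []
    for _ in range(n_boot):
        # Resample n points with replacement and recompute the squared DTM.
        resample = [points[rng.randrange(n)] for _ in range(n)]
        boot = [dtm_sq(resample, x, k) for x in grid]
        sup_stats.append(math.sqrt(n) * max(abs(b - a) for a, b in zip(base, boot)))
    sup_stats.sort()
    # Empirical (1 - alpha) quantile of the bootstrapped sup-norm statistic.
    idx = min(n_boot - 1, max(0, math.ceil((1 - alpha) * n_boot) - 1))
    half_width = sup_stats[idx] / math.sqrt(n)
    return base, half_width

# Noisy sample from the unit circle (synthetic data for the example).
data_rng = random.Random(1)
points = []
for _ in range(80):
    theta = data_rng.uniform(0, 2 * math.pi)
    points.append((math.cos(theta) + data_rng.gauss(0, 0.05),
                   math.sin(theta) + data_rng.gauss(0, 0.05)))

grid = [(-1.5 + 0.75 * i, -1.5 + 0.75 * j) for i in range(5) for j in range(5)]
base, half_width = bootstrap_band(points, grid, k=8, n_boot=100, alpha=0.1)
lower = [max(0.0, v - half_width) for v in base]
upper = [v + half_width for v in base]
```

The resulting band `[lower, upper]` has constant width over the grid. Translating such a band into significance statements for persistence diagrams requires the additional stability arguments described in the cited work.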

4. Algorithmic Computation and Approximations

Direct computation of the DTM at all points in a large dataset can be expensive. Several algorithmic approximations have been developed to maintain computational tractability:

  • Nearest-neighbors averaging: The empirical DTM reduces to $k$-nearest neighbor averaging, supporting fast implementations via k-d trees, ball trees, or approximate nearest-neighbor search (Anai et al., 2018).
  • Power distance and barycentric formulations: The DTM can be written as a power distance over all barycenters of $k$-subsets of the data, but the combinatorial explosion in the number of subsets restricts this to small instances unless an approximation is used (Guibas et al., 2011).
  • Witnessed k-distance: By restricting barycenters to those “witnessed” by each point and its nearest neighbors, one obtains an $O(n)$-size representation with controlled multiplicative error relative to the exact distance (Guibas et al., 2011).

  • k-PDTM: The k-power DTM (k-PDTM) replaces the data points with $k$ cluster centers, so that a sublevel set is described by $k$ balls rather than by a number of balls that grows with the sample size. For intrinsic dimension $d'$, the error between DTM and k-PDTM is controlled in terms of $k$ and $d'$. Algorithmically, a Lloyd-type Voronoi iteration finds $k$ local means and variances, supporting topological computations that scale sublinearly with $n$ (Brécheteau et al., 2018).
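The barycentric formulation can be verified by brute force on a tiny point set. The sketch below relies on the identity $\frac{1}{k}\sum_{p \in S}\|x - p\|^2 = \|x - c_S\|^2 + \frac{1}{k}\sum_{p \in S}\|p - c_S\|^2$, where $c_S$ is the barycenter of a $k$-subset $S$; minimizing over all $S$ gives the squared empirical DTM for $r = 2$. The witnessed variant shown here keeps one barycenter per data point (that point's $k$ nearest neighbors, itself included), which is this example's reading of the construction and may differ in detail from the cited paper. All function names and the toy data are assumptions.

```python
import math
from itertools import combinations

def power_cell(pts):
    """Return (center, weight): the barycenter of pts and the mean squared distance to it."""
    dim = len(pts[0])
    center = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
    weight = sum(math.dist(center, p) ** 2 for p in pts) / len(pts)
    return center, weight

def power_distance_sq(cells, x):
    """Squared power distance: minimum over cells of |x - c|^2 + w."""
    return min(math.dist(x, c) ** 2 + w for c, w in cells)

def knn_dtm_sq(points, x, k):
    """Squared empirical DTM (r = 2) via k-nearest-neighbor averaging."""
    d2 = sorted(math.dist(x, p) ** 2 for p in points)
    return sum(d2[:k]) / k

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 0.5), (0.5, 3.0)]
k = 3
x = (2.0, 2.0)

# Exact formulation: one cell per k-subset (combinatorial in n and k).
all_cells = [power_cell(s) for s in combinations(points, k)]

# Witnessed formulation: one cell per data point, built from its k nearest neighbors.
witnessed_cells = [
    power_cell(sorted(points, key=lambda q, p=p: math.dist(p, q))[:k])
    for p in points
]

exact_sq = power_distance_sq(all_cells, x)
knn_sq = knn_dtm_sq(points, x, k)
witnessed_sq = power_distance_sq(witnessed_cells, x)
```

Here `exact_sq` agrees with `knn_sq` up to rounding, while `witnessed_sq` can only be larger because it minimizes over a subset of the cells; the number of cells drops from $\binom{n}{k}$ to $n$.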

5. Applications in Topological and Geometric Data Analysis

DTM is widely used in topological data analysis for robust inference of geometrical and topological features from noisy point clouds:

  • DTM-based filtrations: Weighted Čech or Rips filtrations based on DTM values yield persistent homology diagrams that are stable to Wasserstein perturbations and outliers (Anai et al., 2018). In contrast to traditional distance-to-set filtrations, DTM guarantees reduced outlier sensitivity and has explicit quantitative stability bounds.
  • Support and homology recovery: Sublevel sets $\{x : d_{P,m}(x) \leq t\}$ consistently estimate the support of $P$ even in high dimensions when $P$ is restricted to a low-dimensional manifold (Taupin et al., 3 Apr 2025, Chazal et al., 2014).
  • Statistical inference: Confidence sets for topological features, such as persistence diagram banding or max-persistence rules for choosing $m$, are derived via bootstrap and probability inequalities for the DTEM (Chazal et al., 2014, Chazal et al., 2015).
  • DTM signatures: The DTM-signature $d_{P,m}(P)$, defined as the pushforward measure of $P$ under $d_{P,m}$, provides a one-dimensional summary for metric-measure spaces, supports Gromov-Wasserstein-based lower bounds, and enables asymptotic two-sample testing with proven error guarantees (Brécheteau, 2017).
  • Kernel density and classification: Density estimation on DTM-transformed data yields robust geometric features enabling high-accuracy clustering and classification in applications such as single molecule microscopy (Proksch et al., 2022).
  • Fermat Distance-to-Measure: A generalization of Fermat/density-driven metrics using DTM instead of density, yielding a conformal metric defined for any probability measure (no absolute continuity required), with provable stability and explicit convergence rates (Taupin et al., 3 Apr 2025).
  • Bias-variance trade-off: The parameter $m$ controls the trade-off between statistical stability and geometric bias. Data-driven methods, such as maximizing the sum or count of significant persistence lifetimes under bootstrap critical values, tune this trade-off in practice (Chazal et al., 2014).
  • Sample complexity and minimax-optimal rates: For intrinsic dimension $d$, the rate attained by the empirical DTM is minimax-optimal for uniform convergence, bridging statistical and geometric complexities (Proksch et al., 2022, Chazal et al., 2014).
  • Multiscale and density-adaptation: DTM acts as a multiscale smoothing functional, interpolating between classical density-based transforms and raw support-based inference as $m \to 0$ (Taupin et al., 3 Apr 2025, Chazal et al., 2014).
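The DTM-signature item above lends itself to a short illustration. For a sample, the empirical signature is the collection of DTM values at the sample points, and two signatures of equal size can be compared with the one-dimensional 1-Wasserstein distance (the mean absolute difference of sorted values). The sketch below is an assumption-laden toy, not the testing procedure of the cited paper: the helper names, the choice of $W_1$, and the point sets are all chosen here for illustration.

```python
import math

def dtm(points, x, k, r=2):
    """Empirical DTM of order r at x with mass m = k / len(points)."""
    dists = sorted(math.dist(x, p) for p in points)
    return (sum(d ** r for d in dists[:k]) / k) ** (1.0 / r)

def dtm_signature(points, k):
    """Sorted DTM values at the sample points (empirical pushforward of P_n)."""
    return sorted(dtm(points, p, k) for p in points)

def w1_equal_size(sig_a, sig_b):
    """1-Wasserstein distance between two equal-size empirical measures on the line."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(abs(a - b) for a, b in zip(sorted(sig_a), sorted(sig_b))) / len(sig_a)

n, k = 12, 3
circle = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]
shifted = [(px + 10.0, py - 4.0) for px, py in circle]   # isometric copy
ellipse = [(3.0 * px, py) for px, py in circle]          # different shape

gap_isometric = w1_equal_size(dtm_signature(circle, k), dtm_signature(shifted, k))
gap_different = w1_equal_size(dtm_signature(circle, k), dtm_signature(ellipse, k))
```

Because the DTM depends only on pairwise distances, the translated copy yields the same signature up to rounding, whereas stretching the circle into an ellipse changes it.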

6. Numerical Experiments and Empirical Behavior

  • Numerical experiments confirm the theoretical rates and the tightness of the deviation bounds for DTEM, including in the presence of Gaussian or clutter noise. The empirical deviations follow the shape predicted by the theoretical bounds, with no visible slack even in challenging regimes (Chazal et al., 2015).
  • Empirical DTM densities enable perfect or near-perfect separation of point clouds with subtle geometric differences, demonstrating the practical discriminativity and robustness of the DTM transformation (Proksch et al., 2022).

The DTM thus provides a rigorous, robust, and computationally effective framework for geometric and topological inference from point cloud data, with deep connections to empirical process theory, Wasserstein geometry, and statistical learning (Chazal et al., 2015, Taupin et al., 3 Apr 2025, Guibas et al., 2011, Chazal et al., 2014, Brécheteau et al., 2018, Anai et al., 2018, Brécheteau, 2017, Proksch et al., 2022).
