Riemannian Metric Matching for Scalable Geometric Modeling of Distributions

Published 12 Jun 2026 in cs.LG and math.DG | (2606.14334v1)

Abstract: High-dimensional datasets often concentrate near low-dimensional structures, but estimating their geometry from samples typically relies on graphs and kernels that scale poorly with dataset size and dimension. We propose Riemannian metric matching: a denoising probabilistic framework for learning the Riemannian geometry of data using neural networks. Specifically, we learn the carré du champ operator, which, using diffusion geometry, gives us access to the Riemannian geometry toolkit for downstream machine learning and statistical tasks. Our key observation is that the carré du champ operator can be formulated as a conditional expectation over random perturbations of the data, which can be exploited for sample-wise training and constant cost, amortized inference without explicit kernel construction. Empirically, metric matching rivals or improves the accuracy of $k$-NN-based diffusion geometry estimators, while enabling amortized inference that is up to $400\times$ faster, and supports graph-free geometric analysis on high-dimensional images where nearest neighbors break down.

Abstract PDF Upgrade to Chat

Authors (5)

Summary

The paper presents a neural, denoising-based method that reformulates local Riemannian metric estimation as a regression problem via the carré du champ operator.
It demonstrates up to 400× faster inference and notable memory savings compared to traditional k-NN estimators on high-dimensional datasets.
The method yields stable latent space geometry and smooth on-manifold interpolation, offering practical benefits for generative and representation learning.

Riemannian Metric Matching for Scalable Geometric Modeling of Distributions

Introduction and Motivation

The challenge of extracting intrinsic geometric information from high-dimensional data is central to modern representation learning, manifold learning, and generative modeling. This paper addresses the problem of scalable and accurate estimation of Riemannian geometry—specifically the local metric structure—directly from data, circumventing the computational and statistical limitations associated with traditional graph-based and kernel-based estimators. The core proposal is Riemannian metric matching, a neural, denoising-based approach that directly learns the carré du champ (CDC) operator, enabling access to the full toolkit of Riemannian geometry in a data-driven, scalable fashion.

Method: Conditional Metric Matching via Carré du Champ Learning

Traditional approaches to diffusion geometry and manifold analysis construct neighborhood graphs (e.g., $k$ -NN), estimate pairwise distances, and compute kernel-weighted operators, but these scale poorly—and can become unreliable—in high dimensions. The method introduced formulates the estimation of the data’s local Riemannian metric as a regression problem: a neural network is trained to approximate the CDC operator as a conditional expectation under local perturbations, inspired by denoising objectives in diffusion models.

Given data point $x$ , the carré du champ of the coordinate functions forms a local covariance structure encoding the ambient projection onto the manifold’s tangent space. The key insight is that the CDC operator can be efficiently estimated as a conditional moment under Gaussian noise, making the learning problem decomposable and highly parallelizable, and obviating the need for explicit pairwise or graph computations.

The loss function $\mathcal{L}_{cond}^{CDC}$ and its matrix-valued version for Riemannian metric matching are constructed to be unbiased surrogates for the classical CDC, with theoretical guarantees that, in the limit, recovery matches manifold geometry as the neighborhood scale shrinks.

Theoretical Guarantees and Parameterization

The authors prove that, under mild locality and smoothness conditions, the learned CDC via metric matching converges to the true Riemannian metric as the Gaussian perturbation bandwidth tends to zero, even when the intrinsic dimension varies or the data lacks a global manifold structure. Moreover, for computational efficiency and robustness, the CDC is parameterized via a low-rank factorization, with rank bounds informed by geometric topology (minimal $r \geq 2d-1$ for $d$ -dimensional manifolds).

Empirical Results: Scalability, Accuracy, and Applications

Throughput and Performance

Comparative experiments on synthetic high-dimensional spheres demonstrate the method’s scalability: the neural metric-matching surrogate achieves up to 400× faster inference than $k$ -NN-based CDC estimators for large datasets, with significant memory savings. Additionally, the low-rank decomposition enables efficient eigendecomposition, critical for tangent space estimation.

Figure 1: Comparison of throughput and accuracy between neural metric matching and k-NN CDC estimators on varying dataset sizes.

Eigenstructure Analysis

The metric-matching network accurately reconstructs the local geometry, as evidenced by eigenanalysis of the learned CDC matrices. On real datasets such as MNIST and FFHQ, the principal eigenvectors correspond to semantically meaningful directions—such as translations or deformations—aligning with the intrinsic data structure.

Figure 2: Visualization of the learned CDC eigenvectors at different positions, revealing interpretable tangent directions.

Figure 3: CDC eigenvectors learned on FFHQ showing the multi-scale, interpretable tangent structure in high-dimensional data.

Stability of Features under Tangent Perturbations

Quality of the learned Riemannian metric is quantified by measuring feature stability under CDC-induced random perturbations in inception space. Metric matching yields significantly lower variation compared to isotropic (Euclidean) or rank-1 (Stein) metrics, especially in high dimensions, matching or outperforming the Jacobian-based (“score”) metrics but with vastly reduced computational cost.

Figure 4: Inception feature stability under tangent perturbations—lower is better, showing metric matching’s advantage across datasets.

Interpolation and Manifold Optimization

The learned CDC enables construction of interpolating paths constrained to the data manifold via Riemannian gradient flows. Compared to linear interpolants, paths generated using metric matching smoothly traverse the high-density regions of the data distribution, remaining perceptually valid and within-class.

Figure 5: Visual comparison of on-manifold interpolation (metric matching) versus linear interpolation in the MNIST dataset.

Figure 6: MNIST interpolation paths under different metrics; metric matching yields natural morphing within the data class, contrasting with ghosting artifacts in LERP.

Latent Space Geometry

When applied in the latent space of generative models (e.g., VAEs), metric matching constructs a Riemannian structure that reveals curvature and nonlinearity missed by naive Euclidean geometry. Comparison with pullback metrics further corroborates the interpretability and flexibility of the approach.

Figure 7: Visualization of metric matching versus linear and pullback metrics in VAE latent spaces, illustrating improved path regularity.

Implications and Future Directions

Practically, the method extends the reach of manifold and diffusion geometry tools to massive, high-dimensional datasets—such as natural image collections—where graph-based approaches are infeasible. Theoretical implications include enabling geometric learning and optimization (e.g., geodesics, intrinsic gradients) directly from data, advancing data-driven Riemannian statistics.

The decoupling from pairwise computations opens avenues for integrating metric learning seamlessly into deep learning pipelines, both for analysis (dimension estimation, tangent recovery) and synthesis (diffusion/generative models). Future work will likely explore generalization phenomena of neural CDC surrogates, synergy with geometric structure in architectures, and the extension to distributions lacking manifold structure.

Conclusion

Riemannian metric matching provides a scalable, theoretically-grounded framework for geometric modeling of high-dimensional distributions using neural networks. It surpasses existing graph-based and Jacobian-based approaches in computational efficiency, matches or improves geometric fidelity, and situates Riemannian geometry as a tractable, expressive tool for large-scale machine learning, with broad implications for representation learning, generative modeling, and geometric data analysis.

Markdown Report Issue