Score-Based Riemannian Geometry
- Score-based Riemannian geometry is the synthesis of deep generative models with geometric structures, enabling analysis of data concentrated on low-dimensional manifolds.
- It constructs Riemannian metrics from score functions, facilitating efficient computation of geodesics, projections, and optimization through local curvature and tangent space estimation.
- Applications include generative sampling, image interpolation, and dimensionality recovery, with theoretical guarantees ensuring robust manifold recovery and scalable algorithm performance.
Score-based Riemannian geometry refers to the synthesis of Riemannian geometric structures with score-based models—primarily denoising diffusion and score-based generative models—for the analysis, sampling, optimization, and representation of high-dimensional data presumed to be concentrated near, or supported on, low-dimensional manifolds embedded in ambient spaces. This field bridges advances in deep generative modeling (notably diffusion models) with foundational geometric techniques, enabling the extraction, manipulation, and computation of manifold geometry directly from learned score functions, and extending these methodologies to arbitrary Riemannian and sub-Riemannian settings.
1. Theoretical Foundations: Manifold Hypothesis and Score-Geometry Interaction
Under the manifold hypothesis, a high-dimensional data distribution $p$ on $\mathbb{R}^D$ is supported on or near a compact, embedded submanifold $\mathcal{M} \subset \mathbb{R}^D$ of intrinsic dimension $d \ll D$. In generative frameworks, the data distribution is accessed only via samples or densities, not through explicit manifold parameterizations. Gaussian smoothing at noise level $\sigma$, $p_\sigma = p * \mathcal{N}(0, \sigma^2 I)$, leads to densities concentrated in tubular neighborhoods of $\mathcal{M}$, enabling the definition of a Stein score $s_\sigma(x) = \nabla_x \log p_\sigma(x)$.
A critical insight from recent work is the separation of information scales: the score contains $\Theta(\sigma^{-2})$-scale information about the normal direction (dictating the geometry of the manifold $\mathcal{M}$), while carrying only $\Theta(1)$-scale information about the tangential, on-manifold density (Li et al., 29 Sep 2025). This scale separation underpins the feasibility of manifold recovery, manifold-aware sampling, and optimization via learned score functions, even when density estimation remains statistically or computationally harder.
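The scale separation can be checked in closed form on a toy case. The sketch below assumes a setup that is not taken from the cited work: data supported on the x-axis of the plane with an N(0, tau^2) density along the axis, so that Gaussian smoothing with standard deviation sigma gives a Gaussian with covariance diag(tau^2 + sigma^2, sigma^2). The names `smoothed_score`, `tau`, and `sigma` are illustrative.

```python
def smoothed_score(x, y, sigma, tau=1.0):
    """Exact score of N(0, diag(tau^2 + sigma^2, sigma^2)) at the point (x, y).

    Toy assumption: the data manifold is the x-axis, with an N(0, tau^2)
    density along it, smoothed by isotropic Gaussian noise of std sigma.
    """
    tangential = -x / (tau**2 + sigma**2)  # component along the manifold
    normal = -y / sigma**2                 # component across the manifold
    return tangential, normal


for sigma in (0.1, 0.01):
    # Evaluate at unit tangential offset and unit normal offset.
    s_tan, s_nor = smoothed_score(1.0, 1.0, sigma)
    print(f"sigma={sigma}: tangential={s_tan:.4f}, normal={s_nor:.1f}, "
          f"sigma^2 * normal={sigma**2 * s_nor:.4f}")
```

In this toy model the normal component grows like sigma^-2 as the noise shrinks, while the tangential component stays bounded and converges to the score of the on-manifold density.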
2. Metric Construction: Score-Induced Riemannian Structures
Multiple constructions of Riemannian metrics from score functions have emerged:
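- Ambient Stein-score metric: Given the score $s(x) = \nabla_x \log p(x)$, define at $x \in \mathbb{R}^D$,
$$g_x(u, v) = u^\top \left( I + \lambda\, s(x)\, s(x)^\top \right) v,$$
where $\lambda > 0$ penalizes components in the (typically normal) score direction. This structure stretches ambient space in directions normal to the data manifold, preserving tangential distances (Azeglio et al., 16 May 2025).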
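- Pullback score metric: For a learned diffeomorphism $\varphi: \mathbb{R}^D \to \mathbb{R}^D$ and convex $\psi$ where $p(x) \propto \exp(-\psi(\varphi(x)))$,
$$g_x(u, v) = \left\langle D_x(\nabla\psi \circ \varphi)[u],\; D_x(\nabla\psi \circ \varphi)[v] \right\rangle$$
provides a Riemannian metric that admits closed-form geodesics in terms of $\varphi$ and $\psi$, allowing analytic computation of exponential/log maps and distances (Diepeveen et al., 2024).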
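- Score Hessian/Jacobian-based metric: Using $J_t(x) = \nabla_x s_\theta(x, t)$ (i.e., the Hessian of the log-density at noise level $t$),
$$G_t(x) = J_t(x)^\top J_t(x),$$
and defining the Riemannian metric via the inner product $\langle u, v \rangle_x = u^\top G_t(x)\, v$ (Saito et al., 28 Apr 2025).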
Each construction leverages the fact that, for well-trained generative models under the manifold hypothesis, the score $s(x)$ is (approximately) normal to the manifold, and its Jacobian encodes local curvature and tangential structure. These constructions are formulated intrinsically (on the manifold $\mathcal{M}$) or extrinsically (in ambient coordinates).
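To make the effect of such metrics concrete, the following minimal sketch evaluates two of them on a toy model. It assumes the rank-one form I + lam * s s^T for the ambient score metric and the form J^T J for the Jacobian-based metric, together with an illustrative setup (data on the x-axis with an N(0, tau^2) density, smoothed at level sigma). All function names and parameter values are hypothetical.

```python
def score(p, sigma=0.1, tau=1.0):
    """Exact smoothed score for the toy model (manifold = x-axis)."""
    x, y = p
    return (-x / (tau**2 + sigma**2), -y / sigma**2)


def score_jacobian(sigma=0.1, tau=1.0):
    """Jacobian of the toy score; constant because the score is linear."""
    return ((-1.0 / (tau**2 + sigma**2), 0.0), (0.0, -1.0 / sigma**2))


def stein_metric_sqnorm(p, v, lam=1.0):
    """v^T (I + lam * s s^T) v = |v|^2 + lam * (s . v)^2."""
    s = score(p)
    s_dot_v = s[0] * v[0] + s[1] * v[1]
    return v[0] ** 2 + v[1] ** 2 + lam * s_dot_v ** 2


def jacobian_metric_sqnorm(v):
    """v^T J^T J v = |J v|^2."""
    J = score_jacobian()
    Jv = (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])
    return Jv[0] ** 2 + Jv[1] ** 2


point = (0.0, 0.05)      # slightly off the manifold
tangent = (1.0, 0.0)     # direction along the x-axis
normal = (0.0, 1.0)      # direction across the x-axis
print(stein_metric_sqnorm(point, tangent), stein_metric_sqnorm(point, normal))
print(jacobian_metric_sqnorm(tangent), jacobian_metric_sqnorm(normal))
```

Under both metrics the normal direction is far more expensive than the tangential one, which is the mechanism that keeps shortest paths close to the data manifold.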
3. Algorithms: Geodesics, Optimization, and Sampling Schemes
Fundamental geometric operations—projection, retraction, geodesics, Riemannian gradients—can be approximated or computed solely from the score and its derivatives:
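- Projection and tangent space approximation: The link function
$$\ell_\sigma(x) = \tfrac{1}{2}\|x\|^2 + \sigma^2 \log p_\sigma(x)$$
satisfies $\nabla \ell_\sigma(x) = x + \sigma^2 s_\sigma(x) \approx \pi(x)$, the ambient closest-point projection onto the manifold $\mathcal{M}$; the Hessian $\nabla^2 \ell_\sigma(x)$ approximates the tangent projector $P_{T_{\pi(x)}\mathcal{M}}$ (Kharitenko et al., 27 Sep 2025).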
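- Geodesic computation: Minimizing the variational energy
$$E[\gamma] = \frac{1}{2} \int_0^1 \dot\gamma(t)^\top G(\gamma(t))\, \dot\gamma(t)\, dt,$$
where $G$ is the score-induced metric tensor, yields score-geodesics, where movement off the learned manifold is penalized (Azeglio et al., 16 May 2025, Saito et al., 28 Apr 2025). Discrete optimization is achieved via gradient descent or Riemannian-Adam.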
- Optimization via DLF and DRGD: Denoising Landing Flow (DLF) and Denoising Riemannian Gradient Descent (DRGD) perform Riemannian optimization over the implicit manifold using only the score and its Jacobian, yielding convergence and feasibility guarantees under mild regularity and error bounds (Kharitenko et al., 27 Sep 2025).
- Sampling/regression: Modifications to Langevin or predictor-corrector steps leveraging approximate scores, or tampered-score sampling, enable manifold-concentrated or uniform-on-manifold sampling (Li et al., 29 Sep 2025). Algorithms extend to estimation of Riemannian $k$-means/means or regression by exploiting analytic score-based surrogates for geodesic distances (Rygaard et al., 18 Feb 2025).
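The projection and tangent-space bullet can be illustrated on a toy model where everything is available in closed form. The sketch assumes the link function has the Tweedie-type form 0.5 * |x|^2 + sigma^2 * log p_sigma(x), so that its gradient is x + sigma^2 * score(x) and its Hessian is I + sigma^2 * Jacobian(score). The update in `denoised_gradient_step` is a generic projected-gradient step with a denoiser retraction, written for illustration only; it is not the DLF or DRGD update rule of the cited work, and all names are hypothetical.

```python
def score(p, sigma, tau=1.0):
    """Exact smoothed score for the toy model: data on the x-axis with an
    N(0, tau^2) density along it, smoothed by N(0, sigma^2 I)."""
    return (-p[0] / (tau**2 + sigma**2), -p[1] / sigma**2)


def link_gradient(p, sigma, tau=1.0):
    """Gradient of the assumed link function: p + sigma^2 * score(p)."""
    s = score(p, sigma, tau)
    return (p[0] + sigma**2 * s[0], p[1] + sigma**2 * s[1])


def link_hessian(sigma, tau=1.0):
    """I + sigma^2 * Jacobian(score); the normal entry cancels to zero."""
    return ((1.0 - sigma**2 / (tau**2 + sigma**2), 0.0), (0.0, 0.0))


def denoised_gradient_step(p, euclid_grad, sigma, lr):
    """Project the Euclidean gradient with the link Hessian, take a step,
    then pull the result back toward the manifold with the link gradient."""
    H = link_hessian(sigma)
    g = euclid_grad(p)
    g_tan = (H[0][0] * g[0] + H[0][1] * g[1], H[1][0] * g[0] + H[1][1] * g[1])
    q = (p[0] - lr * g_tan[0], p[1] - lr * g_tan[1])
    return link_gradient(q, sigma)


def grad_f(p):
    """Euclidean gradient of f(x, y) = (x - 0.5)^2 + (y - 1)^2."""
    return (2.0 * (p[0] - 0.5), 2.0 * (p[1] - 1.0))


sigma = 0.05
projected = link_gradient((0.8, 0.05), sigma)  # close to (0.8, 0.0)

p = (0.8, 0.05)
for _ in range(100):
    p = denoised_gradient_step(p, grad_f, sigma, lr=0.1)
print(projected, p)
```

On the x-axis the constrained minimizer of `f` is x = 0.5. The iterates settle near that point, with a small bias of order sigma^2 because the denoiser also shrinks points toward the mode of the on-manifold density.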
4. Applications: Generative Modeling, Data Interpolation, and Representation
- Generative sampling: Riemannian diffusion models generalize score-based generative modeling to arbitrary manifolds, with theory and implementation covering SDE reversals, likelihood estimation with Riemannian divergences, and sampling via geodesic random walks (Bortoli et al., 2022, Huang et al., 2022, Lou et al., 2023). Score-based Schrödinger bridges extend to interpolation between arbitrary marginal distributions on compact manifolds (Thornton et al., 2022).
- Manifold-aware image and data interpolation: Score-based metrics enable geodesic interpolation and extrapolation directly on data manifolds learned by diffusion models, yielding smoother, more structure-preserving transitions than Euclidean or spherical methods (Azeglio et al., 16 May 2025, Saito et al., 28 Apr 2025). This is effective in both latent and pixel/ambient spaces, as demonstrated on MNIST, natural images, and high-dimensional synthetic spheres.
- Dimension recovery and representation: Through score-based pullback geometry, Riemannian autoencoders can recover intrinsic manifold dimension and provide near-isometric charts for downstream tasks. Error bounds guarantee dimension recovery and near-isometric reconstruction (Diepeveen et al., 2024).
- Sub-Riemannian extensions: Score-based methods have been generalized to hypoelliptic, non-integrable geometries (e.g., the Heisenberg group) where the horizontal gradient replaces the full gradient and denoising losses must accommodate non-holonomic frames and the lack of closed-form heat kernels (Grong et al., 2024).
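Geodesic interpolation under a score-based metric can be sketched with a discretized path energy. The example below makes several assumptions for illustration: the data manifold is the unit circle, the score is replaced by a hand-written surrogate pointing toward the circle, the metric has the rank-one form I + LAM * s s^T, and the path is refined by finite-difference gradient descent with backtracking. A practical implementation would use a learned score and automatic differentiation. `SIGMA`, `LAM`, and all function names are hypothetical.

```python
import math

SIGMA = 0.2  # assumed noise level of the surrogate score
LAM = 1.0    # assumed weight of the score term in the metric


def score(p):
    """Surrogate score pointing toward the unit circle (toy assumption)."""
    r = math.hypot(p[0], p[1])
    scale = -(r - 1.0) / (SIGMA**2 * r)
    return (scale * p[0], scale * p[1])


def energy(path):
    """Discrete energy (n / 2) * sum_k d_k^T (I + LAM s s^T) d_k,
    with the score evaluated at segment midpoints."""
    n = len(path) - 1
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        d = (b[0] - a[0], b[1] - a[1])
        s = score(((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0))
        s_dot_d = s[0] * d[0] + s[1] * d[1]
        total += d[0] ** 2 + d[1] ** 2 + LAM * s_dot_d ** 2
    return 0.5 * n * total


def numerical_gradient(path, eps=1e-6):
    """Central finite differences over interior points; endpoints stay fixed."""
    grad = [[0.0, 0.0] for _ in path]
    for i in range(1, len(path) - 1):
        for j in range(2):
            plus = [list(q) for q in path]
            minus = [list(q) for q in path]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (energy(plus) - energy(minus)) / (2.0 * eps)
    return grad


def refine(path, iters=100, step=0.01):
    """Gradient descent that only accepts steps lowering the energy."""
    path = [list(q) for q in path]
    e = energy(path)
    for _ in range(iters):
        g = numerical_gradient(path)
        t = step
        while t > 1e-8:
            trial = [[q[0] - t * gq[0], q[1] - t * gq[1]]
                     for q, gq in zip(path, g)]
            e_trial = energy(trial)
            if e_trial < e:
                path, e = trial, e_trial
                break
            t *= 0.5
    return path, e


N = 16
chord = [[1.0 - k / N, k / N] for k in range(N + 1)]  # straight line
e_chord = energy(chord)
geo, e_geo = refine(chord)
print(e_chord, e_geo)
```

The straight chord between the two endpoints cuts through the interior of the circle, where the surrogate score is large, so its energy exceeds the Euclidean value. The refined path trades extra Euclidean length for a lower score penalty.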
5. Theoretical Guarantees and Rate-Limiting Results
- Manifold recovery rates: The gradient and Hessian of the link function uniformly approximate the closest-point projection and the tangent projector, respectively, in a tubular neighborhood of the manifold $\mathcal{M}$ (Kharitenko et al., 27 Sep 2025).
- Rate separations: Manifold support is learned at $o(\sigma^{-2})$ score error, while recovering on-manifold density or functional statistics requires $o(1)$ score error. This scale separation is central to manifold-geometric algorithm design (Li et al., 29 Sep 2025).
- Convergence of optimization and sampling: DLF and DRGD converge to points with small manifold distance and small Riemannian gradient norm. Langevin/tampered-score samplers with $o(\sigma^{-2})$ score errors yield measures concentrating uniformly on the manifold (Kharitenko et al., 27 Sep 2025, Li et al., 29 Sep 2025).
- Scalability: High-dimensional Riemannian symmetric spaces (e.g., compact Lie groups such as $\mathrm{SU}(n)$ and high-dimensional spheres) admit efficient computational ansätze for heat kernel and score evaluation, with proven generative and likelihood performance (Lou et al., 2023).
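The concentration of Langevin samples near the manifold can be observed on a toy model. The sketch below runs plain unadjusted Langevin dynamics with the exact smoothed score of an assumed setup (data on the x-axis with an N(0, tau^2) density, smoothed at level sigma). It is not the tampered-score sampler of the cited work, and the step size and seed are illustrative choices.

```python
import math
import random


def score(x, y, sigma, tau=1.0):
    """Exact smoothed score for the toy model (manifold = x-axis)."""
    return (-x / (tau**2 + sigma**2), -y / sigma**2)


def langevin(n_steps, sigma=0.1, step=0.002, seed=0):
    """Unadjusted Langevin dynamics; returns the normal coordinate per step.

    The step must stay well below 2 * sigma^2 for the stiff normal
    direction to remain stable.
    """
    rng = random.Random(seed)
    x, y = 0.0, 1.0  # start well off the manifold
    noise = math.sqrt(2.0 * step)
    normal_coords = []
    for _ in range(n_steps):
        sx, sy = score(x, y, sigma)
        x += step * sx + noise * rng.gauss(0.0, 1.0)
        y += step * sy + noise * rng.gauss(0.0, 1.0)
        normal_coords.append(y)
    return normal_coords


ys = langevin(5000)
tail = ys[1000:]  # discard burn-in
mean_sq = sum(v * v for v in tail) / len(tail)
print(f"mean squared distance to the manifold: {mean_sq:.4f}")
```

After burn-in the mean squared normal coordinate is on the order of sigma^2, so the samples sit in a thin tube around the x-axis even though the chain started at distance one.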
6. Empirical and Algorithmic Validation
Experimental studies demonstrate:
- Substantial improvements in interpolation fidelity (LPIPS, FID, KID, SSIM) for score-geodesic paths over baselines—especially for image morphing and semantic transitions in synthetic and natural image data (Azeglio et al., 16 May 2025, Saito et al., 28 Apr 2025).
- Efficient computation of Fréchet and diffusion means, $k$-means clustering, and maximum-likelihood regression using score-matched estimates, avoiding the cubic scaling of traditional geodesic-based algorithms (Rygaard et al., 18 Feb 2025).
- Superior density estimation and sample quality for manifold-constrained data, outperforming traditional flows and variational methods on spheres, tori, and Lie groups (Huang et al., 2022, Lou et al., 2023).
- Robustness of manifold-concentrated sampling to score approximation errors, and practical applicability for large-scale models (e.g., Stable Diffusion 1.5) (Li et al., 29 Sep 2025).
7. Extensions and Open Directions
- Generalization to sub-Riemannian/hypoelliptic geometry: Horizontal score learning, divergence, and denoising losses adapted to non-holonomic distributions, with practical success on Heisenberg group bridges (Grong et al., 2024).
- Riemannian Schrödinger bridges and path-space interpolation: Variational and iterative proportional fitting techniques for interpolating distributions on the manifold $\mathcal{M}$ have been developed and shown convergent both theoretically and in practical generative settings (Thornton et al., 2022).
- Automated dimension and chart discovery: Score-based Riemannian autoencoders guarantee dimension recovery, nearly isometric reconstruction, and closed-form chart computation, validated via synthetic and real datasets (Diepeveen et al., 2024).
- Computational efficiency and scale: Efficient divergence estimation (QR/Hutchinson trace), adaptive heat-kernel computation, and manifold-optimized neural architectures enable these methods to scale to large data and model sizes (Huang et al., 2022, Lou et al., 2023).
- Future research: Open challenges include boundary and singular manifold cases, further scaling in sub-Riemannian and non-symmetric settings, and deeper integration of geometry-aware statistics in deep generative models.
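The Hutchinson trace estimator mentioned under computational efficiency can be sketched in a few lines. It estimates the divergence of a vector field as the average of z^T J z over random sign vectors z, where J is the Jacobian. In this illustration the Jacobian-vector product is approximated by a central finite difference to keep the example free of dependencies; an automatic-differentiation JVP would normally take its place. The linear field and all names are hypothetical.

```python
import random


def vector_field(p):
    """Toy linear field f(p) = A p with A = [[-1, 0.3], [0.3, -100]]."""
    x, y = p
    return (-1.0 * x + 0.3 * y, 0.3 * x - 100.0 * y)


def hutchinson_divergence(f, p, n_probes=2000, eps=1e-4, seed=0):
    """Estimate div f(p) = trace(J) as the mean of z^T J z over
    Rademacher probes z, with J z from a central finite difference."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in p]
        plus = f([pi + eps * zi for pi, zi in zip(p, z)])
        minus = f([pi - eps * zi for pi, zi in zip(p, z)])
        jz = [(a - b) / (2.0 * eps) for a, b in zip(plus, minus)]
        total += sum(zi * ji for zi, ji in zip(z, jz))
    return total / n_probes


estimate = hutchinson_divergence(vector_field, [0.5, 0.2])
print(estimate)  # the exact divergence of this field is -101
```

Each probe costs one Jacobian-vector product regardless of dimension, which is why this estimator is attractive when the exact trace would require one product per coordinate.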
Score-based Riemannian geometry stands at the intersection of generative modeling, stochastic analysis, and computational geometry, enabling the recovery and exploitation of manifold structure directly from learned or sampled score fields. The field’s rapid development is driven both by methodological advances—precise geometric surrogates, scalable algorithms, provable guarantees—and by the demands of practical, high-dimensional generative and control tasks in modern machine learning.