Score-Based Pullback Formulations
- Score-based pullback formulations define a Riemannian metric on data manifolds by pulling back the Euclidean metric through the score map, capturing intrinsic geometric properties.
- For suitably factorized densities, they yield closed-form geodesics and Riemannian distances, enabling manifold interpolation and applications such as Riemannian autoencoders.
- Combined with anisotropic normalizing flows and isometry regularization, the approach supports scalability, accurate intrinsic dimension estimation, and robust manifold learning.
Score-based pullback formulations constitute a scalable, data-driven approach to extracting and using the Riemannian geometry of data manifolds. They combine pullback Riemannian geometry with generative modeling: the differential structure induced by the probability density of the data defines a metric through the pullback of the Euclidean metric under the score map. The construction yields closed-form geodesics that follow the data distribution, interpretable global charts, Riemannian autoencoders with intrinsic dimension estimation, and a natural integration with anisotropic normalizing flows. Because key manifold operations admit closed-form solutions, both geometry extraction and learning remain tractable at scale, with practical error control (Diepeveen et al., 2024).
1. Score-Based Pullback Metric
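Let $p$ be a smooth probability density on $\mathbb{R}^d$, with corresponding score $s(x) = \nabla \log p(x)$. The score-based pullback metric endows $\mathbb{R}^d$ with a Riemannian metric $g$, defined as the pullback of the standard Euclidean metric under the score map $s\colon \mathbb{R}^d \to \mathbb{R}^d$. For tangent vectors $v, w \in T_x\mathbb{R}^d \cong \mathbb{R}^d$, the local inner product becomes
$$g_x(v, w) = \langle J_s(x)\,v,\; J_s(x)\,w \rangle,$$
where $J_s(x) = \nabla s(x) = \nabla^2 \log p(x)$ is the Jacobian of the score. In matrix notation, the metric tensor is
$$G(x) = J_s(x)^\top J_s(x),$$
which is positive definite wherever $J_s(x)$ is nonsingular.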
This data-driven construction ensures that the local geometry reflects properties of the data distribution through second-order score structure.
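A minimal numerical sketch of the metric tensor, assuming only that the score is available as a callable. The helper names (`score_jacobian`, `pullback_metric`, `inner`) are hypothetical, and a central finite difference stands in for automatic differentiation. For a Gaussian density the score is affine, so $J_s = -\Sigma^{-1}$ and $G(x) = \Sigma^{-2}$ at every point, which gives an easy check.

```python
import numpy as np


def score_jacobian(score, x, eps=1e-5):
    """Central finite-difference Jacobian J_s(x) of a score map s: R^d -> R^d."""
    d = x.shape[0]
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (score(x + e) - score(x - e)) / (2 * eps)
    return J


def pullback_metric(score, x):
    """Metric tensor G(x) = J_s(x)^T J_s(x) of the score-based pullback metric."""
    J = score_jacobian(score, x)
    return J.T @ J


def inner(score, x, v, w):
    """Local inner product g_x(v, w) = <J_s(x) v, J_s(x) w>."""
    J = score_jacobian(score, x)
    return float((J @ v) @ (J @ w))


# Hypothetical test density: Gaussian N(mu, Sigma), score s(x) = -Sigma^{-1}(x - mu).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
gaussian_score = lambda x: -Sigma_inv @ (x - mu)

x = np.array([0.3, 0.7])
v, w = np.array([1.0, 0.0]), np.array([0.5, -1.0])
G = pullback_metric(gaussian_score, x)
print(np.allclose(G, Sigma_inv @ Sigma_inv, atol=1e-6))  # True: G = Sigma^{-2}
print(inner(gaussian_score, x, v, w), v @ G @ w)          # same value twice
```

For a Gaussian the metric is constant, so the induced geometry is flat; any non-Gaussian density makes $G(x)$ vary with $x$.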
2. Geodesics and Riemannian Distance
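For general densities, the geodesic equation on $(\mathbb{R}^d, g)$ takes the form $\nabla_{\dot\gamma}\dot\gamma = 0$, with $\nabla$ the Levi-Civita connection of $g$, leading to the second-order ODE $\ddot\gamma^k + \Gamma^k_{ij}(\gamma)\,\dot\gamma^i\dot\gamma^j = 0$ involving the Christoffel symbols $\Gamma^k_{ij}$ of $g$. Crucially, if the density admits the factorization $p(x) \propto \exp(-\psi(\varphi(x)))$, with $\psi\colon \mathbb{R}^d \to \mathbb{R}$ strongly convex and $\varphi\colon \mathbb{R}^d \to \mathbb{R}^d$ a diffeomorphism, geodesics and their Riemannian distances admit closed-form solutions. In this setting the score is $s(x) = -\mathrm{D}\varphi(x)^\top \nabla\psi(\varphi(x))$, and the metric is taken as the pullback of the Euclidean metric under $\nabla\psi \circ \varphi$, which differs from $s$ only by the factor $-\mathrm{D}\varphi(x)^\top$. Specifically, the geodesic between $x$ and $y$ is
$$\gamma_{x,y}(t) = \varphi^{-1}\Big(\nabla\psi^*\big((1-t)\,\nabla\psi(\varphi(x)) + t\,\nabla\psi(\varphi(y))\big)\Big), \qquad t \in [0,1],$$
with Riemannian distance
$$d(x,y) = \big\|\nabla\psi(\varphi(x)) - \nabla\psi(\varphi(y))\big\|_2,$$
where $\psi^*$ is the Fenchel conjugate of $\psi$, so that $\nabla\psi^* = (\nabla\psi)^{-1}$. In the important quadratic case $\psi(z) = \tfrac{1}{2} z^\top A^{-1} z$, with $A$ symmetric positive definite, these simplify:
$$\gamma_{x,y}(t) = \varphi^{-1}\big((1-t)\,\varphi(x) + t\,\varphi(y)\big), \qquad d(x,y) = \big\|A^{-1}(\varphi(x) - \varphi(y))\big\|_2.$$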
The linearization induced by $\varphi$ maps the density to a Gaussian-like structure, so straight-line interpolation in feature space is mapped back to curves in $x$-space that follow regions of high data density.
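The closed-form geodesic and distance can be sketched as follows, assuming the metric is the pullback of the Euclidean metric under $\nabla\psi\circ\varphi$ for a factorized density. Everything concrete is a hypothetical stand-in: $\varphi$ is elementwise `sinh` (inverse `arcsinh`) rather than a learned flow, and $\psi$ is quadratic with diagonal $A$. The helper `curve_length` numerically integrates the length of the closed-form curve under $G(x) = \mathrm{D}\varphi(x)^\top A^{-2}\,\mathrm{D}\varphi(x)$, which should reproduce the closed-form distance.

```python
import numpy as np

# Hypothetical stand-ins: phi = elementwise sinh (not a trained flow) and
# psi(z) = 0.5 z^T A^{-1} z with diagonal A = diag(a).
a = np.array([4.0, 0.25])
phi, phi_inv = np.sinh, np.arcsinh
grad_psi = lambda z: z / a          # nabla psi(z) = A^{-1} z
grad_psi_star = lambda u: a * u     # nabla psi^*(u) = A u = (nabla psi)^{-1}(u)


def geodesic(x, y, t):
    """Closed-form geodesic of the pullback metric under grad_psi o phi."""
    u = (1 - t) * grad_psi(phi(x)) + t * grad_psi(phi(y))
    return phi_inv(grad_psi_star(u))


def distance(x, y):
    """Closed-form Riemannian distance ||grad_psi(phi(x)) - grad_psi(phi(y))||_2."""
    return float(np.linalg.norm(grad_psi(phi(x)) - grad_psi(phi(y))))


def curve_length(x, y, n=2000):
    """Midpoint-rule length of the geodesic under G = Dphi^T A^{-2} Dphi."""
    pts = np.array([geodesic(x, y, t) for t in np.linspace(0.0, 1.0, n + 1)])
    length = 0.0
    for p, q in zip(pts[:-1], pts[1:]):
        mid, step = 0.5 * (p + q), q - p
        Dphi = np.diag(np.cosh(mid))              # Jacobian of elementwise sinh
        G = Dphi.T @ np.diag(1.0 / a**2) @ Dphi   # pullback metric tensor
        length += np.sqrt(step @ G @ step)
    return float(length)


x, y = np.array([-1.0, 0.5]), np.array([1.5, -0.3])
print(distance(x, y), curve_length(x, y))  # agree up to discretization error
```

No ODE is solved: each point on the geodesic costs one evaluation each of $\varphi$, $\nabla\psi$, $\nabla\psi^*$, and $\varphi^{-1}$.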
3. Riemannian Autoencoder Construction
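The framework supports global charting of the data manifold via a Riemannian autoencoder (RAE), which exploits the quadratic-pullback scenario for explicit encoding and decoding maps. Given a base point $x_0 \in \mathbb{R}^d$ and a selection of $d_\varepsilon$ principal variance directions, indexed by $i_1, \dots, i_{d_\varepsilon}$ (for diagonal $A$, the latent coordinates with the largest variances), the encoder $E\colon \mathbb{R}^d \to \mathbb{R}^{d_\varepsilon}$ is defined via the Riemannian log map as
$$E(x)_k = \big\langle \mathrm{D}\varphi(x_0)[\log_{x_0} x],\; e_{i_k} \big\rangle, \qquad k = 1, \dots, d_\varepsilon.$$
In the quadratic case $\mathrm{D}\varphi(x_0)[\log_{x_0} x] = \varphi(x) - \varphi(x_0)$, so this reduces to coordinate selection in feature space, $E(x)_k = \varphi(x)_{i_k} - \varphi(x_0)_{i_k}$. The decoder $D\colon \mathbb{R}^{d_\varepsilon} \to \mathbb{R}^d$ is
$$D(\xi) = \exp_{x_0}\Big(\sum_{k=1}^{d_\varepsilon} \xi_k\, \mathrm{D}\varphi(x_0)^{-1} e_{i_k}\Big) = \varphi^{-1}\Big(\varphi(x_0) + \sum_{k=1}^{d_\varepsilon} \xi_k\, e_{i_k}\Big).$$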
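This formulation admits provable reconstruction error bounds: if the neglected variance directions sum to at most $\varepsilon$, then
$$\mathbb{E}_{x \sim p}\big[\|D(E(x)) - x\|_2^2\big] \le C\,\varepsilon,$$
with $C$ dependent on Jacobian norms and determinants of $\varphi$ and $\varphi^{-1}$.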
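A sketch of the quadratic-case RAE under toy assumptions: `phi` is elementwise `sinh`, the variances in `a` are made up, and the base point satisfies $\varphi(x_0) = 0$. The hypothetical `select_directions` keeps the fewest largest-variance latent coordinates whose discarded variances sum to at most $\varepsilon$; the number kept serves as the intrinsic dimension estimate $d_\varepsilon$. Because `arcsinh` is 1-Lipschitz, each sample's squared reconstruction error here is at most the squared norm of its discarded latent coordinates, which mirrors the shape of the bound with $C = 1$ for this particular toy map (not the constant from the paper).

```python
import numpy as np

# Toy quadratic-case setup (assumptions): phi = elementwise sinh stands in for a
# trained flow, A = diag(a) holds hypothetical variances, and phi(x0) = 0.
phi, phi_inv = np.sinh, np.arcsinh
a = np.array([5.0, 2.0, 0.05, 0.01])
x0 = phi_inv(np.zeros_like(a))


def select_directions(a, eps):
    """Fewest largest-variance latent indices whose neglected variance sums to <= eps."""
    order = np.argsort(a)[::-1]              # indices by decreasing variance
    for k in range(len(a) + 1):
        if a[order[k:]].sum() <= eps:
            return order[:k]
    return order                             # reached only if eps < 0


def encode(x, idx):
    """E(x): coordinate selection of phi(x) - phi(x0) in feature space."""
    return (phi(x) - phi(x0))[idx]


def decode(xi, idx):
    """D(xi) = phi^{-1}(phi(x0) + sum_k xi_k e_{i_k})."""
    z = phi(x0).copy()
    z[idx] += xi
    return phi_inv(z)


idx = select_directions(a, eps=0.1)
print(len(idx))  # 2, the estimated intrinsic dimension d_eps

# Toy samples x = phi^{-1}(z) with z ~ N(0, A) (an assumption for illustration).
rng = np.random.default_rng(0)
Z = rng.normal(size=(5000, len(a))) * np.sqrt(a)
X = phi_inv(Z)
errs = np.array([np.sum((decode(encode(x, idx), idx) - x) ** 2) for x in X])
neglected = np.sum(np.delete(Z, idx, axis=1) ** 2, axis=1)
print(errs.mean(), neglected.mean())  # mean error vs. mean discarded latent energy
```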
4. Integration with Anisotropic Normalizing Flows
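Score-based pullback geometry integrates naturally with anisotropic normalizing flows (NFs), allowing learned diffeomorphisms $\varphi_\theta$ to parameterize the transformation to a latent Gaussian structure $\mathcal{N}(0, A)$. The NF is trained using an objective function that includes isometry regularization:
$$\mathcal{L}(\theta, A) = \mathbb{E}_{x \sim p_{\text{data}}}\big[-\log p_{\theta,A}(x)\big] + \lambda_{\text{iso}}\,\mathcal{R}_{\text{iso}}(\theta), \qquad p_{\theta,A}(x) = \mathcal{N}\big(\varphi_\theta(x); 0, A\big)\,\big|\det \mathrm{D}\varphi_\theta(x)\big|.$$
Here, the isometry regularizer
$$\mathcal{R}_{\text{iso}}(\theta) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\|\mathrm{D}\varphi_\theta(x)^\top \mathrm{D}\varphi_\theta(x) - I_d\|_F^2\big]$$
enforces approximate local orthonormality of $\mathrm{D}\varphi_\theta(x)$, ensuring that $\varphi_\theta$ approaches a local $\ell^2$-isometry and aligning the learned pullback metric with the score-based metric.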
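A sketch of such a training objective, assuming it is the flow negative log-likelihood plus $\lambda_{\text{iso}}$ times the isometry regularizer $\mathbb{E}\|\mathrm{D}\varphi^\top\mathrm{D}\varphi - I\|_F^2$. The function names are hypothetical, finite differences replace autodiff, and toy linear maps replace a real flow; in practice the objective would be minimized over $\theta$ and $A$ with a gradient-based optimizer. A rotation is an exact isometry, so its regularizer vanishes, while scaling a rotation by 2 gives $\|4I - I\|_F^2 = 9d$.

```python
import numpy as np


def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f: R^d -> R^d at x (stand-in for autodiff)."""
    d = x.shape[0]
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J


def isometry_regularizer(phi, xs):
    """Batch estimate of E ||Dphi(x)^T Dphi(x) - I||_F^2."""
    d = xs.shape[1]
    vals = []
    for x in xs:
        J = jacobian(phi, x)
        vals.append(np.sum((J.T @ J - np.eye(d)) ** 2))
    return float(np.mean(vals))


def flow_nll(phi, xs, a):
    """Mean of -log p(x) with p(x) = N(phi(x); 0, diag(a)) |det Dphi(x)|."""
    d = xs.shape[1]
    nll = []
    for x in xs:
        z = phi(x)
        _, logabsdet = np.linalg.slogdet(jacobian(phi, x))
        log_gauss = -0.5 * (np.sum(z**2 / a) + np.sum(np.log(a)) + d * np.log(2 * np.pi))
        nll.append(-(log_gauss + logabsdet))
    return float(np.mean(nll))


def objective(phi, xs, a, lam_iso):
    """Hypothetical objective: flow NLL + lam_iso * isometry regularizer."""
    return flow_nll(phi, xs, a) + lam_iso * isometry_regularizer(phi, xs)


theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
xs = np.random.default_rng(0).normal(size=(8, 2))
rotate = lambda x: R @ x
scale_rotate = lambda x: 2.0 * (R @ x)
print(isometry_regularizer(rotate, xs))        # approximately 0
print(isometry_regularizer(scale_rotate, xs))  # approximately 18 = 9 * d for d = 2
print(objective(rotate, xs, np.array([1.0, 0.5]), lam_iso=0.1))
```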
5. Scalability and Computational Properties
The methodology is architected for efficiency in both training and downstream geometric operations. The workflow is as follows:
- Anisotropic NF $\varphi_\theta$ and covariance $A$ are trained via the loss $\mathcal{L}$.
- The learned $\varphi_\theta$ and $A$ are fixed, and the pullback metric is constructed.
- Geodesics and Riemannian distances are computed in closed form, bypassing ODE integration.
- The Riemannian autoencoder is constructed using principal variance directions and closed-form encoding/decoding maps.
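Once $\varphi_\theta$ and $A$ are fixed, all pairwise Riemannian distances follow from a single pass of $\varphi_\theta$ over the data, since in the quadratic case $d(x, y) = \|A^{-1}(\varphi(x) - \varphi(y))\|_2$. The sketch below assumes diagonal $A$ and uses elementwise `sinh` as a hypothetical stand-in for the trained flow; `riemannian_distance_matrix` is an illustrative name.

```python
import numpy as np


def riemannian_distance_matrix(phi, a, X):
    """All pairwise d(x_i, x_j) = ||A^{-1}(phi(x_i) - phi(x_j))||_2 with A = diag(a).

    phi is evaluated once per point; no geodesic ODE is integrated.
    """
    F = np.stack([phi(x) for x in X]) / a     # row i holds A^{-1} phi(x_i)
    diff = F[:, None, :] - F[None, :, :]     # shape (n, n, d)
    return np.sqrt(np.sum(diff**2, axis=-1))


# Hypothetical fixed components: elementwise sinh as the "trained" flow, diag(a) as A.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
a = np.array([3.0, 1.0, 0.1])
Dist = riemannian_distance_matrix(np.sinh, a, X)
print(Dist.round(3))
```

The broadcasted difference array has $n^2 d$ entries, so large datasets would be processed in chunks.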
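Computational costs are dominated by Jacobian computations:
- Metric evaluation: requires the $d \times d$ Jacobian $\mathrm{D}\varphi_\theta(x)$ once per data point.
- Geodesic interpolation: one evaluation of $\varphi_\theta^{-1}$ per interpolation time, with precomputed $\varphi_\theta(x)$, $\varphi_\theta(y)$.
- Isometry loss: expensive when the full Jacobian is formed naively, with the per-layer cost reduced by structured parameterizations.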
According to Diepeveen et al. (2024), this framework is the first scalable approach for extracting the complete geometry of the data manifold: it produces geodesics that traverse the data support, estimates intrinsic dimension, and enables interpretable manifold learning.
6. Empirical Performance and Applications
Empirical results on diverse datasets, including image data, demonstrate that score-based pullback formulations yield high-quality geodesics restricted to the data manifold, accurate estimation of intrinsic manifold dimension, and coherent global charts. The use of isometry regularization in conjunction with anisotropic flows ensures that the learned geometry is faithful to the underlying data distribution, facilitating effective representation learning, manifold interpolation, and downstream inference tasks. The construction also provides non-asymptotic error guarantees for autoencoder reconstruction, giving rigorous performance control for practical applications (Diepeveen et al., 2024).