Score-Based Pullback Formulations

Updated 5 March 2026
  • Score-based pullback formulations define a Riemannian metric on data manifolds by pulling back the Euclidean metric through the score map, capturing intrinsic geometric properties.
  • They enable closed-form geodesic computations and Riemannian distances, facilitating effective manifold interpolation and applications like Riemannian autoencoders.
  • The integration with anisotropic normalizing flows and isometry regularization ensures scalability, accurate dimension estimation, and robust manifold learning.

Score-based pullback formulations are a scalable, data-driven approach to extracting and using the Riemannian geometry of data manifolds. They combine pullback Riemannian geometry with generative modeling, specifically the differential structure induced by the probability density of the data, and define a metric as the pullback of the Euclidean metric under the score map. The construction yields closed-form geodesics aligned with the data distribution, interpretable global charts, autoencoders with built-in dimension estimation, and efficient integration with anisotropic normalizing flows. Key manifold operations have closed-form solutions, and the autoencoder comes with reconstruction error bounds (Diepeveen et al., 2024).

1. Score-Based Pullback Metric

Let $p(x)$ be a smooth probability density on $\mathbb{R}^n$, with corresponding score $s(x) = \nabla_x \log p(x)$. The score-based pullback metric endows $\mathbb{R}^n$ with a Riemannian metric $g_x$, defined as the pullback of the standard Euclidean metric under the score map $\Phi(x) = s(x)$. For tangent vectors $v, w \in T_x\mathbb{R}^n \cong \mathbb{R}^n$, the local inner product becomes

$$g_x(v, w) = \langle D_x s [v], D_x s [w] \rangle_{\ell^2} = (D_x s(x) v)^\top (D_x s(x) w),$$

where $D_x s(x)$ is the Jacobian of the score. In matrix notation, the metric tensor is

$$G(x) = (D_x s(x))^\top D_x s(x) \in \mathbb{R}^{n \times n}.$$

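This data-driven construction ensures that the local geometry reflects the data distribution through the Jacobian of the score, that is, the Hessian of $\log p$. Note that $G(x)$ is positive definite, and hence a genuine Riemannian metric, only where $D_x s(x)$ is invertible.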
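As a rough illustration of the formula $G(x) = (D_x s(x))^\top D_x s(x)$, the sketch below evaluates the metric for a toy density. Everything specific here is an assumption made for the example and not part of the source: the density is a zero-mean Gaussian with diagonal covariance `SIGMA`, the Jacobian is approximated by central finite differences, and the helper names `score`, `jacobian_fd`, and `pullback_metric` are hypothetical. For this Gaussian the score is linear, so $G(x) = \Sigma^{-2}$ at every point.

```python
# Assumed toy density: zero-mean Gaussian with diagonal covariance diag(SIGMA).
SIGMA = [4.0, 0.25]


def score(x):
    """Score of N(0, diag(SIGMA)): s(x) = -Sigma^{-1} x."""
    return [-xi / si for xi, si in zip(x, SIGMA)]


def jacobian_fd(f, x, h=1e-5):
    """Central finite-difference Jacobian, J[i][j] = d f_i / d x_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # cols[j][i] holds d f_i / d x_j; transpose into row-major J[i][j].
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def pullback_metric(f, x):
    """Metric tensor G(x) = J^T J, with J the Jacobian of f at x."""
    J = jacobian_fd(f, x)
    n = len(x)
    return [[sum(J[k][i] * J[k][j] for k in range(len(J))) for j in range(n)]
            for i in range(n)]


G = pullback_metric(score, [1.0, -2.0])
print(G)  # approximately diag(1/16, 16) for the assumed SIGMA
```

Directions with small variance receive a large metric weight, so moving off the bulk of the distribution is expensive under $g$. In practice the score would come from a learned model and the Jacobian from automatic differentiation instead of finite differences.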

2. Geodesics and Riemannian Distance

For general densities, the geodesic equation on $(\mathbb{R}^n, g)$ takes the form $\nabla_{\dot{\gamma}} \dot{\gamma} = 0$, with $\nabla$ the Levi-Civita connection of $g$, leading to a second-order ODE involving the Christoffel symbols of $G(x)$. Crucially, if the density admits the factorization $p(x) \propto e^{-\psi(\varphi(x))}$, with strongly convex $\psi$ and diffeomorphism $\varphi$, geodesics and their Riemannian distances admit closed-form solutions. Specifically, the geodesic $\gamma_{x,y}(t)$ between $x$ and $y$ is

$$\gamma_{x, y}(t) = \left( \varphi^{-1} \circ \nabla \psi^* \right) \left( (1-t) \nabla \psi \circ \varphi(x) + t \nabla \psi \circ \varphi(y) \right), \quad t \in [0, 1],$$

where $\psi^*$ is the Fenchel conjugate of $\psi$. In the important quadratic case $\psi(v) = \frac{1}{2} v^\top \Sigma^{-1} v$, with $\Sigma$ symmetric positive definite, these simplify:

$$\gamma_{x, y}(t) = \varphi^{-1}\big((1-t)\varphi(x) + t\varphi(y)\big),$$

$$d_g(x, y) = \|\Sigma^{-1}(\varphi(x)-\varphi(y))\|_2.$$
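The quadratic simplification can be checked by hand. The derivation below is a sketch that uses only the standard Fenchel conjugate of a positive definite quadratic. It also assumes that the Riemannian distance equals the Euclidean distance between images under $\nabla\psi \circ \varphi$, which is consistent with the metric being a pullback of the Euclidean metric under that map.

```latex
\begin{align*}
\nabla\psi(v) &= \Sigma^{-1} v, \\
\psi^*(w) &= \sup_{v}\,\big( w^\top v - \psi(v) \big) = \tfrac{1}{2}\, w^\top \Sigma\, w,
  \qquad \nabla\psi^*(w) = \Sigma w, \\
\gamma_{x,y}(t) &= \varphi^{-1}\Big( \Sigma \big[ (1-t)\,\Sigma^{-1}\varphi(x) + t\,\Sigma^{-1}\varphi(y) \big] \Big)
  = \varphi^{-1}\big( (1-t)\,\varphi(x) + t\,\varphi(y) \big), \\
d_g(x,y) &= \big\| \nabla\psi(\varphi(x)) - \nabla\psi(\varphi(y)) \big\|_2
  = \big\| \Sigma^{-1}\big(\varphi(x) - \varphi(y)\big) \big\|_2 .
\end{align*}
```

The factors $\Sigma$ and $\Sigma^{-1}$ cancel in the geodesic but not in the distance, which is why $\Sigma$ affects lengths while leaving the shape of the curve unchanged.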

The linearization induced by $\Phi$ maps the density to a Gaussian-like structure, so straight-line interpolation in feature space is mapped back to curves in $x$-space that follow regions of high data density.
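To make the quadratic-case formulas concrete, here is a minimal sketch under stated assumptions. The diffeomorphism `phi` is a hypothetical hand-written map that flattens the parabola $x_2 = x_1^2$ (in the framework it would be a learned flow), and `SIGMA` is an assumed diagonal covariance in feature space. The functions `geodesic` and `distance` implement $\gamma_{x,y}(t) = \varphi^{-1}((1-t)\varphi(x) + t\varphi(y))$ and $d_g(x,y) = \|\Sigma^{-1}(\varphi(x)-\varphi(y))\|_2$ directly.

```python
import math

# Assumed diagonal covariance in feature space (illustrative values).
SIGMA = [1.0, 0.01]


def phi(x):
    """Hypothetical diffeomorphism that flattens the parabola x2 = x1^2."""
    return [x[0], x[1] - x[0] ** 2]


def phi_inv(u):
    """Exact inverse of phi."""
    return [u[0], u[1] + u[0] ** 2]


def geodesic(x, y, t):
    """Closed-form geodesic: straight line in feature space, mapped back."""
    ux, uy = phi(x), phi(y)
    return phi_inv([(1 - t) * a + t * b for a, b in zip(ux, uy)])


def distance(x, y):
    """Closed-form distance ||Sigma^{-1} (phi(x) - phi(y))||_2, diagonal Sigma."""
    ux, uy = phi(x), phi(y)
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(ux, uy, SIGMA)))


x, y = [-1.0, 1.0], [1.0, 1.0]   # both points lie on the parabola
mid = geodesic(x, y, 0.5)
print(mid)             # [0.0, 0.0]: the midpoint stays on the parabola
print(distance(x, y))  # 2.0
```

The Euclidean midpoint of the two endpoints is $(0, 1)$, which lies off the parabola, whereas the geodesic midpoint $(0, 0)$ lies on it. No ODE solver is involved; each geodesic point costs one evaluation of `phi` per endpoint and one of `phi_inv`.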

3. Riemannian Autoencoder Construction

The framework supports global charting of the data manifold via a Riemannian autoencoder (RAE), which exploits the quadratic-pullback scenario for explicit encoding and decoding maps. Given base point $x_0 = \varphi^{-1}(0)$ and a selection of $d_\varepsilon$ principal variance directions, the encoder $E_\varepsilon: \mathbb{R}^n \rightarrow \mathbb{R}^{d_\varepsilon}$ is defined, via the Riemannian log map, as

$$E_\varepsilon(x)_i = \langle \log^\varphi_{x_0}(x), v_i \rangle_g, \qquad v_i = D_0 \varphi^{-1}[e_i].$$

In the quadratic case, this reduces to coordinate selection in feature space. The decoder $D_\varepsilon: \mathbb{R}^{d_\varepsilon} \rightarrow \mathbb{R}^n$ is

$$D_\varepsilon(z) = \varphi^{-1}\left( \sum_{i=1}^{d_\varepsilon} z_i e_i \right).$$

This formulation admits provable reconstruction error bounds: if the neglected variance directions sum to at most $\varepsilon \sum_{i=1}^n \Sigma_{ii}$, then

$$\mathbb{E}_{x \sim p}\left[ \| D_\varepsilon(E_\varepsilon(x)) - x \|_2^2 \right] \leq C\varepsilon \sum_{i=1}^n \Sigma_{ii} + o(\varepsilon),$$

with $C$ dependent on Jacobian norms and determinants of $\varphi$ and $\varphi^{-1}$.
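The encoder and decoder can be sketched in a few lines for the quadratic case, where encoding reduces to selecting feature-space coordinates. This is an illustration under assumptions, not the reference implementation: `phi` is a hypothetical hand-written diffeomorphism, the covariance is assumed diagonal with entries `SIGMA`, and `choose_dims` is a hypothetical helper that keeps the fewest leading variance directions such that the dropped variance is at most $\varepsilon \sum_i \Sigma_{ii}$. The number of kept directions plays the role of $d_\varepsilon$.

```python
# Assumed diagonal covariance in feature space (illustrative values).
SIGMA = [1.0, 0.01]


def phi(x):
    """Hypothetical diffeomorphism that flattens the parabola x2 = x1^2."""
    return [x[0], x[1] - x[0] ** 2]


def phi_inv(u):
    """Exact inverse of phi."""
    return [u[0], u[1] + u[0] ** 2]


def choose_dims(variances, eps):
    """Keep leading variance directions until dropped variance <= eps * total."""
    total = sum(variances)
    order = sorted(range(len(variances)), key=lambda i: -variances[i])
    kept, dropped = [], total
    for i in order:
        if dropped <= eps * total:
            break
        kept.append(i)
        dropped -= variances[i]
    return kept


def encode(x, kept):
    """Quadratic case: select the kept coordinates of phi(x)."""
    u = phi(x)
    return [u[i] for i in kept]


def decode(z, kept, n):
    """Place latent coordinates on the kept axes, zeros elsewhere, then invert."""
    u = [0.0] * n
    for zi, i in zip(z, kept):
        u[i] = zi
    return phi_inv(u)


kept = choose_dims(SIGMA, eps=0.05)   # indices of retained directions
x = [0.5, 0.3]
z = encode(x, kept)
x_rec = decode(z, kept, n=len(x))
print(len(kept), z, x_rec)  # one latent dimension; x_rec lies on the parabola
```

With these assumed values a single direction is retained, so the decoder projects the input onto the curve $x_2 = x_1^2$, and the reconstruction error comes entirely from the discarded low-variance coordinate.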

4. Integration with Anisotropic Normalizing Flows

Score-based pullback geometry integrates naturally with anisotropic normalizing flows (NFs), allowing learned diffeomorphisms $\varphi_\theta$ to parameterize the transformation to latent Gaussian structure. The NF is trained using an objective function that includes isometry regularization:

$$\mathcal{L}(\phi, \theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[-\log p_{\phi, \theta}(x)] + \lambda_{\mathrm{vol}} \mathbb{E}[ (\log|\det D_x \varphi_\theta|)^2 ] + \lambda_{\mathrm{iso}} \mathbb{E}[ \| D_x \varphi_\theta^\top D_x \varphi_\theta - I \|_F^2 ].$$

Here, the isometry regularizer

$$R_{\mathrm{iso}}(\theta) = \mathbb{E}\left\| D_x \varphi_\theta^\top D_x \varphi_\theta - I \right\|_F^2$$

enforces approximate local orthonormality of $D_x \varphi_\theta$, ensuring that $\varphi_\theta$ approaches a local $\ell^2$-isometry and aligning the learned pullback metric with the score-based metric.
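The per-sample term inside the isometry regularizer, $\| J^\top J - I \|_F^2$ with $J = D_x \varphi_\theta$, is easy to evaluate for a given Jacobian. The sketch below uses a hypothetical helper `isometry_penalty` and two hand-picked 2x2 Jacobians chosen for illustration. The expectation over data and the flow itself are omitted, so this shows only the quantity being penalized, not a training loop.

```python
import math


def isometry_penalty(J):
    """Squared Frobenius norm of J^T J - I for a square Jacobian (nested lists)."""
    n = len(J)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gij = sum(J[k][i] * J[k][j] for k in range(n))  # (J^T J)[i][j]
            target = 1.0 if i == j else 0.0
            total += (gij - target) ** 2
    return total


theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]   # orthogonal, so an isometry
scaling = [[2.0, 0.0],
           [0.0, 0.5]]                            # volume-preserving, not an isometry

print(isometry_penalty(rotation))  # numerically zero
print(isometry_penalty(scaling))   # 9.5625 = (4 - 1)^2 + (0.25 - 1)^2
```

The scaling example has determinant one, so a volume penalty on $\log|\det D_x \varphi_\theta|$ alone would not detect it, while the isometry term does. Forming $J^\top J$ explicitly, as done here, is the naive cubic-cost route.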

5. Scalability and Computational Properties

The methodology is architected for efficiency in both training and downstream geometric operations. The workflow is as follows:

  1. Anisotropic NF $\varphi_\theta$ and covariance $\Sigma_\phi$ are trained via the loss $\mathcal{L}(\phi, \theta)$.
  2. The learned $\varphi$ and $\Sigma$ are fixed, and the pullback metric $g_x = (D_x(\Sigma^{-1}\varphi(x)))^\top D_x(\Sigma^{-1}\varphi(x))$ is constructed.
  3. Geodesics and Riemannian distances are computed in closed form, bypassing ODE integration.
  4. The Riemannian autoencoder is constructed using principal variance directions and closed-form encoding/decoding maps.
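Steps 2 and 3 can be cross-checked numerically: for a short displacement $v$, the local length $\sqrt{v^\top G(x) v}$ given by the pullback metric should agree with the closed-form distance between $x$ and $x + v$. The sketch below assumes a hypothetical hand-written `phi` with an analytic Jacobian and an assumed diagonal `SIGMA`, standing in for the learned flow and covariance of step 1.

```python
import math

# Assumed diagonal covariance in feature space (illustrative values).
SIGMA = [1.0, 0.01]


def phi(x):
    """Hypothetical diffeomorphism that flattens the parabola x2 = x1^2."""
    return [x[0], x[1] - x[0] ** 2]


def jac_scaled_phi(x):
    """Analytic Jacobian of x -> Sigma^{-1} phi(x) for the toy phi above."""
    return [[1.0 / SIGMA[0], 0.0],
            [-2.0 * x[0] / SIGMA[1], 1.0 / SIGMA[1]]]


def metric(x):
    """Step 2: pullback metric G(x) = J^T J with J = D_x(Sigma^{-1} phi(x))."""
    J = jac_scaled_phi(x)
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def distance(x, y):
    """Step 3: closed-form distance ||Sigma^{-1} (phi(x) - phi(y))||_2."""
    ux, uy = phi(x), phi(y)
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(ux, uy, SIGMA)))


x = [0.5, 0.2]
v = [1e-4, 2e-4]   # small displacement
G = metric(x)
local_len = math.sqrt(sum(v[i] * G[i][j] * v[j]
                          for i in range(2) for j in range(2)))
closed = distance(x, [x[0] + v[0], x[1] + v[1]])
print(local_len, closed)  # the two values agree to first order in |v|
```

The agreement holds only to first order in the displacement, since the metric is a local quantity while the closed-form distance is global.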

Computational costs are dominated by the Jacobian computation:

  • Metric evaluation: $O(n^2)$ per data point.
  • Geodesic interpolation: $O(n)$ with precomputed $\varphi$, $\psi$.
  • Isometry loss: $O(n^3)$ naively, reduced to between $O(n)$ and $O(n^2)$ per layer with structured parameterizations.

Diepeveen et al. (2024) present this framework as the first scalable approach for extracting the complete geometry of the data manifold: it produces geodesics that traverse the data support, estimates intrinsic dimension, and enables interpretable manifold learning.

6. Empirical Performance and Applications

Empirical results on diverse datasets, including image data, demonstrate that score-based pullback formulations yield high-quality geodesics restricted to the data manifold, accurate estimation of intrinsic manifold dimension, and coherent global charts. The use of isometry regularization in conjunction with anisotropic flows ensures that the learned geometry is faithful to the underlying data distribution, facilitating effective representation learning, manifold interpolation, and downstream inference tasks. The construction also provides non-asymptotic error guarantees for autoencoder reconstruction, giving rigorous performance control for practical applications (Diepeveen et al., 2024).
