Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds

Published 15 May 2026 in cs.LG and stat.ML | (2605.15822v1)

Abstract: Score-based generative models are trained in high-dimensional ambient spaces, yet many data distributions are supported on low-dimensional nonlinear structures. We prove that, for compact $d$-dimensional smooth manifolds $\mathcal{M} \subset [0,1]^D$ with $d > 2$ and $\beta$-Hölder densities strictly positive on $\mathcal{M}$, a variance-preserving SGM estimator attains the intrinsic Wasserstein-1 sample exponent $\tilde{\mathcal{O}}(D^{\mathcal{O}_\beta(d)} n^{-(\beta+1)/(d+2\beta)})$, up to logarithmic factors and explicit geometry and density factors. The full nonasymptotic bound explicitly isolates the finite-order geometry envelope, Hölder radius, density lower bound, ambient dependence, and finite-order correction terms. The analysis separates score approximation into a large-noise tangent-cell regime and a small-noise projection-centered, de-Gaussianized Laplace regime. The key technical ingredient is a ReLU implementation of nearest-projection coordinates via finite intrinsic anchors and Gauss-Newton iterations, rather than approximating the manifold projection as a black-box high-dimensional smooth map. Consequently, for families with polynomially controlled geometry and density lower bounds, the constructed score-network parameters have polynomial ambient dependence.


Authors (4)

Summary

The paper demonstrates that SGMs achieve intrinsic minimax Wasserstein-1 rates on smooth manifolds by decoupling ambient and intrinsic dimensions.
It introduces a two-regime strategy, employing large-noise tangent-cell and small-noise projection-centered Laplace approximations to manage manifold singularity.
It provides dimension-explicit neural network constructions with polynomial dependence on the ambient dimension, enabling practical and efficient SGM implementations.


Motivation and Problem Setting

This paper develops the statistical theory of score-based generative models (SGMs) in regimes where data distributions are supported on low-dimensional smooth manifolds $\mathcal{M} \subset [0,1]^D$ with intrinsic dimension $d \ll D$. It analyzes the non-asymptotic sample complexity of variance-preserving (VP) SGMs in Wasserstein-1 distance, elucidating how the intrinsic geometry and regularity of the data distribution fundamentally govern learning rates.

The need for such an analysis arises from the manifold hypothesis in high-dimensional generative modeling, which posits that data encountered in practice lies near nonlinear, compact manifolds of small intrinsic dimension. Existing SGM bounds in the ambient space are vacuous or suboptimal for singular distributions, as they ignore this intrinsic structure. The present work provides sharp, geometry-aware rates that decouple ambient and intrinsic dimensions, isolating the effects of manifold curvature, smoothness, and density lower bounds.

Main Results and Technical Contributions

The central result establishes that, for a $\beta$-Hölder density strictly positive on a compact $d$-dimensional manifold $\mathcal{M}$, the SGM estimator achieves

$\mathcal{W}_1(P_0, \widehat{P}_{t_0}) \le \tilde{O}\left( \Gamma_{\mathcal{M},q_{\rm geom}}^{C} B_0^C p_{\min}^{-C} D^{O_\beta(d)} n^{-(\beta+1)/(d+2\beta)} \right)$

up to logarithmic and explicit finite-order geometric/density factors, where $n$ is the number of samples, $B_0$ is the Hölder radius, $p_{\min}$ is the density infimum, and $\Gamma_{\mathcal{M},q_{\rm geom}}$ captures manifold reach and high-order chart regularity. For $d > 2$ and suitable choices of the finite geometric order $q_{\rm geom}$, the sample exponent matches the intrinsic minimax rate for Wasserstein-1 on smooth $d$-manifolds, with polynomial rather than exponential dependence on the ambient dimension $D$ under controlled geometric conditions.
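To get a feel for the exponent $(\beta+1)/(d+2\beta)$, the short sketch below evaluates it for hypothetical values of $\beta$, $d$, and $D$ (none of these numbers come from the paper). Substituting $D$ for $d$ in the same formula is only an illustrative comparison of what an ambient-dimension exponent of the same shape would look like, not a bound stated in the paper. All constants, geometric prefactors, and logarithmic factors are ignored.

```python
def w1_rate_exponent(beta: float, dim: int) -> float:
    """Exponent a in n^{-a}, with a = (beta + 1) / (dim + 2 * beta)."""
    if dim <= 2:
        raise ValueError("the stated rate assumes dimension > 2")
    return (beta + 1.0) / (dim + 2.0 * beta)


def samples_for_target(eps: float, beta: float, dim: int) -> float:
    """n solving n^{-a} = eps, ignoring all constants and log factors."""
    return eps ** (-1.0 / w1_rate_exponent(beta, dim))


# Hypothetical smoothness, intrinsic dimension, and ambient dimension.
beta, d, D = 2.0, 8, 784

a_intrinsic = w1_rate_exponent(beta, d)  # (2 + 1) / (8 + 4) = 0.25
a_ambient = w1_rate_exponent(beta, D)    # (2 + 1) / (784 + 4), comparison only

print("intrinsic exponent:", a_intrinsic)
print("same formula with D in place of d:", a_ambient)
print("n for rate factor 0.1 (intrinsic):", samples_for_target(0.1, beta, d))
```

With these illustrative values the intrinsic exponent is 0.25, so the rate factor $n^{-a}$ reaches 0.1 at roughly $10^4$ samples, whereas the same formula evaluated at $D$ gives an exponent below 0.004.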

Two-Regime Score Approximation Strategy

The analysis is structured into two noise regimes, capturing the distinct approximation challenges induced by manifold singularity under different noise scales:

Large-Noise/Tangent-Cell Regime: For moderate to high noise, the perturbed measure is sufficiently regular, and score approximation reduces to intrinsic tangent-cell discretization. The requisite grid is over the $d$-dimensional manifold, not the ambient space, yielding an intrinsic covering number governed by $d$ rather than $D$ and explicit dependence on geometric chart constants.
Small-Noise/Projection-Centered Laplace Regime: At low noise, the singular normal Gaussian factor dominates, and naive spatial meshing becomes intractable. The key is a local expansion about the nearest-point projection of the noisy input onto $\mathcal{M}$. The score is approximated by centering at this projected point, factorizing normal and tangential behavior, and implementing the necessary coordinate projections using efficient ReLU networks constructed via finite-anchor Gauss-Newton optimization in chart coordinates.

This dichotomous treatment enables the derivation of both nonasymptotic sample complexity rates and dimension-explicit neural network constructions.
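The small-noise picture can be illustrated numerically on a toy case. The sketch below is not the paper's construction: it assumes the common VP convention $X_t = m_t X_0 + \sigma_t Z$ with $m_t = e^{-t}$ and $\sigma_t^2 = 1 - e^{-2t}$, takes the unit circle in the first two of $D = 5$ coordinates as a stand-in manifold (so $d = 1$, outside the paper's $d > 2$ setting), and approximates the uniform measure on it by a dense grid. It compares the smoothed score with the leading projection-centered term $(m_t \pi(x) - x)/\sigma_t^2$, where $\pi$ is the nearest-point projection. All function names are hypothetical.

```python
import numpy as np


def vp_coeffs(t):
    """Assumed VP schedule: X_t = m * X_0 + sigma * Z."""
    return np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))


def smoothed_score(x, points, m, sigma):
    """Score of the Gaussian-smoothed measure supported on `points`.

    Uses grad log p_t(x) = E[m * X_0 - x | X_t = x] / sigma^2, with
    posterior weights computed through a numerically stable softmax.
    """
    diffs = m * points - x                          # shape (N, D)
    logw = -np.sum(diffs**2, axis=1) / (2.0 * sigma**2)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / sigma**2


def projection_centered_score(x, m, sigma):
    """Leading small-noise term for the unit circle in the first two coords."""
    proj = np.zeros_like(x)
    proj[:2] = x[:2] / np.linalg.norm(x[:2])        # nearest point on the circle
    return (m * proj - x) / sigma**2


# Dense grid on the circle, embedded in D = 5 ambient coordinates.
angles = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
points = np.zeros((angles.size, 5))
points[:, 0], points[:, 1] = np.cos(angles), np.sin(angles)

m, sigma = vp_coeffs(0.00125)                       # sigma is about 0.05
x = np.array([1.02, 0.0, 0.01, -0.01, 0.0])         # a point just off the circle

exact = smoothed_score(x, points, m, sigma)
approx = projection_centered_score(x, m, sigma)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print("relative gap of the leading term:", rel_err)
```

In this toy case the two vectors agree exactly in the purely normal coordinates, and the remaining gap is an order-one tangential curvature contribution that the leading term omits. This is the kind of correction a Laplace-type expansion about the projection is meant to capture.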

Dimension-Explicit and Constructive Aspects

A major technical innovation is the explicit, constructive realization of the projection-network step:

Unlike prior work that treats the nearest-point projection as a generic ambient smooth function, inducing an exponential-in-$D$ cost in high-order approximation, the developed method obtains nearest-manifold coordinates via Gauss-Newton iterations anchored at intrinsic chart-grid points. This leverages the smooth, band-limited nature of the manifold and avoids ambient-combinatorial explosion.
As a result, all network width and sparsity bounds for the projection, chart extraction, and Laplace coefficients are polynomial in $D$ under controlled geometry (e.g., affine spheres, tori, or polynomially regular embeddings), provided the required high-order derivatives are polynomially bounded.
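The anchored Gauss-Newton idea can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's ReLU implementation: the chart, its Jacobian, the anchor grid, and the iteration count are all hypothetical, and the manifold is a unit circle in the first two of $D = 4$ coordinates, chosen because the true projection angle is known in closed form. The routine starts from the anchor whose image is closest to the query point and then iterates Gauss-Newton steps on the residual $r(u) = \mathrm{chart}(u) - x$.

```python
import numpy as np


def chart(u):
    """Hypothetical chart: unit circle in the first two of D = 4 coordinates."""
    theta = u[0]
    return np.array([np.cos(theta), np.sin(theta), 0.0, 0.0])


def chart_jacobian(u):
    """Jacobian of `chart`, shape (D, d) = (4, 1)."""
    theta = u[0]
    return np.array([[-np.sin(theta)], [np.cos(theta)], [0.0], [0.0]])


def project_gauss_newton(x, anchors, n_iter=10):
    """Approximate chart coordinates of the nearest manifold point to x."""
    # Step 1: initialize at the anchor whose image is closest to x.
    images = np.stack([chart(a) for a in anchors])
    u = anchors[np.argmin(np.linalg.norm(images - x, axis=1))].copy()
    # Step 2: Gauss-Newton iterations on r(u) = chart(u) - x.
    for _ in range(n_iter):
        r = chart(u) - x
        J = chart_jacobian(u)
        u = u - np.linalg.solve(J.T @ J, J.T @ r)
    return u


# Sixteen anchors on the intrinsic (angle) grid; the count is illustrative.
anchors = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False).reshape(-1, 1)

# Query point: angle 0.7, radius 1.05, plus a small purely normal offset.
x = np.array([1.05 * np.cos(0.7), 1.05 * np.sin(0.7), 0.02, -0.01])

u_hat = project_gauss_newton(x, anchors)
tangent_residual = chart_jacobian(u_hat).T @ (chart(u_hat) - x)
print("recovered angle:", u_hat[0])
print("tangential residual:", tangent_residual)
```

The work per query scales with the number of intrinsic anchors and the size of the $d \times d$ normal equations, while $D$ enters only through matrix-vector products with the Jacobian. Because the residual at the solution is nonzero (the query is off the manifold), convergence here is linear rather than quadratic, which is why several iterations are used.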

Nonasymptotic Oracle and Minimax Rates

The analysis yields fully nonasymptotic risk bounds, explicitly isolating polynomial geometric, density, ambient, and finite-order terms. For $d > 2$ and canonical geometric order, both the approximation and stochastic estimation errors scale as $n^{-(\beta+1)/(d+2\beta)}$, with lower-order correction terms polynomial in $D$ and logarithmic in $n$. This recovers and refines the intrinsic minimax rate for Wasserstein-1 distance with smooth densities on manifolds, validating the manifold hypothesis for SGM sample complexity in smooth settings.

The paper situates its results within a detailed taxonomy of contemporary theoretical literature:

Previous Manifold-Rate Theories: Prior analyses either focused on linear manifolds, incurring no curvature effects, or established intrinsic exponents with exponential or suboptimal ambient prefactors. Notably, concurrent work obtaining the same intrinsic exponent either leaves projection-step complexity opaque or incurs exponentially large prefactors due to black-box ambient smooth map approximation.
Support-Aware and Structured Approaches: Some recent results achieve sharper ambient factors for support-aware or piecewise affine architectures but restrict estimation to preprocessed local affine subspaces, limiting applicability to general score-based learning.
General Wasserstein Dimension Views: Broader generalization bounds based on empirical Wasserstein theory provide dimension-free rates but do not reach the smoothness-adaptive exponents derived here for Hölder densities on nonlinear manifolds.
Complementarity: In contrast, the present work delivers a direct, theorem-level audit for ambient-object SGM estimators, specifying all geometric, density, and ambient dependencies at the level of neural network architectures.

Methodological Implications

The main implication is that SGMs can achieve near-optimal sample complexity on data distributions supported on nonlinear manifolds, provided the neural network architectures and training pipelines are constructed to respect and exploit intrinsic geometry. The explicit, constructive nature of the projection network enables practical implementation of these models in very high-dimensional settings without incurring hidden exponential costs.

The combinatorial complexity bottleneck posed by ambient smooth-function approximation is eliminated by this intrinsic approach, making it scalable and amenable to further generalization in cases with variable curvature, reach, and density bounds.

Meticulous bookkeeping of all geometric and analytic factors renders the results broadly applicable to a range of high-dimensional, manifold-structured generative tasks.

Limitations, Open Problems, and Future Directions

Primary limitations are:

Conditional Assumptions: The results still rely on exact support on a known compact manifold, strict density positivity, and explicit reach and smoothness bounds.
Idealized Oracle Setting: The analysis targets the learned-score estimator trained by empirical risk minimization over branchwise-constructed function classes, rather than parameter-tying or fully joint neural architectures.

Key directions for future research include:

Relaxing the density lower bound and support exactness assumptions, e.g., including distributions with vanishing or unbounded density.
Extending to settings with ambient observational noise and model misspecification.
Bridging the gap to computational guarantees—bounding optimization and discretization errors in practical SGM training under geometric priors.
Handling empirical measures with unknown geometry and integrating manifold-learning into the estimation process.

Conclusion

This work rigorously establishes sample complexity bounds for score-based generative models on smooth nonlinear manifolds, matching the intrinsic minimax rate in Wasserstein-1 distance for $d > 2$ and smooth, positive densities. By separating approximation regimes according to noise scale and employing a dimension-explicit, constructive realization of the projection step, it achieves polynomial ambient complexity and sharp geometric dependence. These findings both advance the theoretical understanding of SGMs in the manifold regime and indicate practical directions for efficient high-dimensional generative modeling on structured supports (2605.15822).