Optimal score function estimation via derivatives constraints

Published 17 Jun 2026 in math.ST and stat.ML | (2606.19084v1)

Abstract: We consider the problem of score function estimation via empirical risk minimization. We start with the question of inferring the score function of a probability measure $\mu$ with density on the flat torus from a sample drawn from $\mu$. We show that constraining the hypothesis space to a Sobolev ball is sufficient to prevent overfitting and to obtain minimax estimation rates. We then consider the problem of score function estimation in the context of score-based generative modeling. Again, under a conjecture tying the score estimation rates to the quality of the output of a score-based generative model, we obtain minimax rates for such an approach using score function estimators obtained by constraining the hypothesis class to a Sobolev ball.

Authors (3)

Summary

The paper demonstrates that using derivative constraints in Sobolev spaces yields minimax optimal rates for score estimation via ERM.
It rigorously establishes convergence rates for score estimation in both classical flat torus settings and diffusion-based manifold models.
The study bridges theory and practice by showing that neural networks with bounded derivatives can generalize effectively in score matching.

Optimal Score Function Estimation via Derivative Constraints

Problem Formulation and Background

The paper "Optimal score function estimation via derivatives constraints" (2606.19084) tackles the statistical problem of estimating the score function (i.e., the gradient of the log-density) from samples of a distribution $\mu$, with special focus on applications in score-based generative models (SGMs). The score function is central to SGMs, generative modeling frameworks in which sampling from $\mu$ is achieved by simulating stochastic differential equations (SDEs) whose reverse-time process is guided by a learned score function.

Three sources of estimation error are distinguished: initialization error, discretization error, and statistical error. This work concentrates on the last of these, the statistical error in score estimation, and provides a rigorous minimax rate analysis for the empirical risk minimization (ERM) approach under derivative constraints. The study encompasses both classical score estimation (density supported on the flat torus) and score estimation in diffusion-based settings (density supported on a manifold, relevant to SGMs).

Score Estimation via Empirical Risk Minimization and Sobolev Constraints

Classical Setting

Under Assumption 1, $\mu$ has a $W^{s,\infty}$ density on the $d$-dimensional flat torus $\mathcal{T}^d$, bounded away from zero. The estimation strategy involves minimizing a penalized empirical loss inspired by Hyvärinen's score matching [hyvarinien05]. For a candidate vector field $g$, the loss is

$L^\lambda(g) = \int_{\mathcal{T}^d} l_g \, d\mu + \lambda \|\nabla^{s-1} g\|^2_{L^2(\mathcal{T}^d)}$

where $l_g = \|g\|^2 + 2\,\text{div}(g)$, and the regularization term penalizes higher-order derivatives to ensure smoothness.
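To see why minimizing the mean of $l_g$ targets the score, recall that on the torus integration by parts gives $\int \|g - \nabla \log p\|^2 \, d\mu = \int l_g \, d\mu + \int \|\nabla \log p\|^2 \, d\mu$, so the two objectives differ by a constant that does not depend on $g$. The following is a minimal one-dimensional sketch of the unpenalized empirical loss, not the paper's estimator. It assumes a von Mises density proportional to $\exp(\kappa \cos\theta)$ on the circle $[0, 2\pi)$, whose score is $-\kappa \sin\theta$, and a one-parameter candidate $g(\theta) = a \sin\theta$. The function names, the value of `kappa`, and the sample size are all illustrative choices.

```python
import math
import random


def sample_von_mises(n, kappa, seed=0):
    """Draw n angles in [0, 2*pi) with density proportional to exp(kappa*cos(theta))."""
    rng = random.Random(seed)
    return [rng.vonmisesvariate(0.0, kappa) for _ in range(n)]


def empirical_score_matching_loss(a, thetas):
    """Empirical mean of l_g = g^2 + 2*div(g) for the candidate g(theta) = a*sin(theta)."""
    total = 0.0
    for th in thetas:
        g = a * math.sin(th)
        div_g = a * math.cos(th)  # in one dimension, div(g) is simply g'
        total += g * g + 2.0 * div_g
    return total / len(thetas)


def fit_amplitude(thetas):
    """Closed-form minimizer over a: the loss is a^2*mean(sin^2) + 2*a*mean(cos)."""
    mean_cos = sum(math.cos(th) for th in thetas) / len(thetas)
    mean_sin2 = sum(math.sin(th) ** 2 for th in thetas) / len(thetas)
    return -mean_cos / mean_sin2


kappa = 1.0  # illustrative concentration; the true score is -kappa*sin(theta)
thetas = sample_von_mises(20000, kappa)
a_hat = fit_amplitude(thetas)
print("fitted amplitude:", a_hat, "target:", -kappa)
```

The fitted amplitude should land close to $-\kappa$ even though the loss never evaluates the unknown density, which is the point of the score matching objective.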

A key claim supported by rigorous analysis is that restricting the hypothesis class to a Sobolev ball (functions with derivatives controlled up to order $s+\ell-1$) and penalizing the $(s-1)$-th derivatives, through the term $\lambda \|\nabla^{s-1} g\|^2_{L^2(\mathcal{T}^d)}$, suffices to achieve minimax optimal rates, thereby preventing overfitting and delivering statistical generalization. Specifically, for a density of regularity $s$, the resulting convergence rate for score function estimation matches the nonparametric optimal rates for derivative estimation [Stone-82, Stone-83].
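The effect of the derivative penalty can be sketched in one dimension with a truncated Fourier family, where the penalized empirical loss is quadratic in the coefficients and has a closed-form minimizer. This is an illustration under several assumptions that are not taken from the paper: the circle $[0, 2\pi)$ stands in for the torus, the candidate is $g(\theta) = \sum_k a_k \sin(k\theta)$ (sine terms only, since the test density is symmetric), the frequency cutoff replaces the Sobolev ball, and `s = 3` and `lam = 1e-3` are arbitrary.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_penalized_score(thetas, num_freq, s, lam):
    """Minimize mean(g^2 + 2 g') + lam * ||g^(s-1)||^2 over g = sum_k a_k sin(k*theta)."""
    n = len(thetas)
    freqs = range(1, num_freq + 1)
    # Empirical Gram matrix G_jk = mean(sin(j*theta) * sin(k*theta)).
    G = [[sum(math.sin(j * t) * math.sin(k * t) for t in thetas) / n for k in freqs]
         for j in freqs]
    # Linear term c_k = mean of d/dtheta sin(k*theta) = mean(k * cos(k*theta)).
    c = [sum(k * math.cos(k * t) for t in thetas) / n for k in freqs]
    # Squared L^2([0, 2*pi)) norm of the (s-1)-th derivative of sin(k*theta).
    D = [math.pi * k ** (2 * (s - 1)) for k in freqs]
    A = [[G[j][k] + (lam * D[j] if j == k else 0.0) for k in range(num_freq)]
         for j in range(num_freq)]
    # First-order condition of the quadratic loss: (G + lam*D) a = -c.
    return solve_linear_system(A, [-ck for ck in c])


def sobolev_seminorm_sq(coeffs, s):
    """Squared L^2 norm of the (s-1)-th derivative of sum_k a_k sin(k*theta)."""
    return sum(math.pi * (k + 1) ** (2 * (s - 1)) * a * a for k, a in enumerate(coeffs))


kappa = 1.0  # true score is -kappa*sin(theta), i.e. a_1 = -kappa and a_k = 0 otherwise
rng = random.Random(0)
thetas = [rng.vonmisesvariate(0.0, kappa) for _ in range(5000)]
a_unpen = fit_penalized_score(thetas, num_freq=6, s=3, lam=0.0)
a_pen = fit_penalized_score(thetas, num_freq=6, s=3, lam=1e-3)
print("unpenalized:", [round(a, 3) for a in a_unpen])
print("penalized:  ", [round(a, 3) for a in a_pen])
```

Because the penalty weight grows like $k^{2(s-1)}$, it leaves the low-frequency coefficient nearly untouched while shrinking the high-frequency coefficients, which in this toy problem carry only sampling noise.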

Theoretical Guarantees

The main result (Theorem 1), proven via a bias-variance decomposition and empirical-process chaining arguments, states that the estimator, taken in the constrained Sobolev hypothesis class, attains the minimax rate for properly chosen regularization and bandwidth parameters. The conditions are minimal: no explicit parametric tuning is required beyond regularity, and the estimator generalizes even when the hypothesis class is instantiated as a neural network, provided the network's input derivatives are bounded (potentially via explicit or implicit regularization [Williams2019]).

Score Estimation in Diffusion Models on Manifolds

SGMs typically operate in high-dimensional spaces, but data distributions often concentrate on low-dimensional manifolds (e.g., natural images). The manifold setting is formalized in Assumption 2: $\mu$ has a regular density on a compact, boundaryless submanifold of lower dimension, with controlled reach and regularity. The analysis leverages the smoothing properties of the Ornstein-Uhlenbeck (OU) semigroup generated by the forward SDE, which yields smooth densities for the noised distribution at every positive time.

The empirical risk minimization for score estimation uses a loss defined through the OU semigroup, which regularizes the functional and reduces the need for explicit penalization of higher-order derivatives compared to the classical case.
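The smoothing effect of the forward process, and its price at small times, can be illustrated on a toy manifold. The sketch below assumes the common OU normalization in which $X_t$ given $X_0$ is Gaussian with mean $e^{-t} X_0$ and covariance $(1 - e^{-2t}) I$; the paper's normalization may differ. The data are an evenly spaced grid on the unit circle in the plane, used here as a stand-in for a one-dimensional submanifold, and `ou_marginal_score` is a hypothetical helper name.

```python
import math


def ou_marginal_score(x, data, t):
    """Score at x of the law of X_t = exp(-t)*X_0 + sqrt(1 - exp(-2t))*Z, X_0 uniform on data."""
    shrink = math.exp(-t)
    var = 1.0 - math.exp(-2.0 * t)
    # Log-weight of each Gaussian component centred at shrink * x0.
    logw = [-((x[0] - shrink * p[0]) ** 2 + (x[1] - shrink * p[1]) ** 2) / (2.0 * var)
            for p in data]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]  # subtract the max for numerical stability
    total = sum(w)
    # Posterior mean of shrink * X_0 given X_t = x.
    mean0 = sum(wi * shrink * p[0] for wi, p in zip(w, data)) / total
    mean1 = sum(wi * shrink * p[1] for wi, p in zip(w, data)) / total
    # Score of a Gaussian mixture with common variance: (posterior mean - x) / var.
    return ((mean0 - x[0]) / var, (mean1 - x[1]) / var)


def norm(v):
    return math.hypot(v[0], v[1])


# Toy data: 400 evenly spaced points on the unit circle.
data = [(math.cos(2 * math.pi * i / 400), math.sin(2 * math.pi * i / 400)) for i in range(400)]
x = (1.2, 0.0)  # a query point slightly off the circle
score_norms = {t: norm(ou_marginal_score(x, data, t)) for t in (1.0, 0.1, 0.01)}
for t, value in score_norms.items():
    print("t =", t, "score norm =", value)
```

At every positive time the noised law has a smooth density and a finite score, but the score at a point off the manifold grows as the time shrinks, which is why the regularity of the target degrades near time zero.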

Main Results

The central theorem for the diffusion setting (Theorem 2) establishes that constraining the candidate score functions to a Sobolev ball, with a radius that grows as the noise level decays to zero (reflecting the explosion of derivatives at small times), achieves minimax rates for score estimation. The estimator satisfies an error bound that encodes the bias-variance trade-off governed by the sample size, the bandwidth, and the time parameter. For an optimally chosen bandwidth, the rate coincides, up to logarithmic factors, with the minimax optimal rate for measure estimation in Wasserstein-1 distance [NilesWeed2022, Divol2022], as formalized in Corollary 1. This matches the optimal convergence for empirical measure estimation, even under the manifold hypothesis, and extends to neural network-based score estimators under input derivative control.
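For readers less familiar with the Wasserstein-1 metric in which the measure-recovery rate is stated, the following sketch computes it in the simplest possible case. It relies on the standard fact that, on the real line, the Wasserstein-1 distance between two empirical measures with the same number of atoms is the mean absolute difference of the sorted samples. This one-dimensional setting is an assumption made for illustration and is much simpler than the manifold setting of the paper; the function names are hypothetical.

```python
import random


def wasserstein1_1d(xs, ys):
    """W1 between two empirical measures on the real line with equally many atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def w1_between_independent_samples(n, rng):
    """W1 between two independent standard Gaussian samples of size n."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return wasserstein1_1d(xs, ys)


rng = random.Random(0)
w1_small = w1_between_independent_samples(50, rng)
w1_large = w1_between_independent_samples(5000, rng)
print("n = 50:  ", w1_small)
print("n = 5000:", w1_large)
```

The distance between two samples from the same law shrinks as the sample size grows; the minimax rates cited above quantify how fast an estimator can drive this kind of error to zero.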

Technical Contributions

The paper introduces several key advances:

Sharp Minimax Rates via ERM: It rigorously demonstrates that ERM with derivative constraints (Sobolev balls), rather than smoothing via kernels or spline/structured networks, achieves minimax optimal rates for score estimation.
Generalization to Manifolds: The results apply to measures supported on manifolds, with technical controls on density regularity and reach, bridging the gap with prior works [StephanovitchAaamariLevrard], including those relying on structured networks [oko] or smoothing [Gabriel2025].
Chaining and Empirical Process Analysis: The variance analysis uses chaining mechanisms and entropy number bounds for function classes with derivative constraints, leveraging sophisticated empirical process theory [Massart].
Regularity Analysis of Score Functions in Diffusion: The paper carefully decomposes the diffusion-induced score function for manifold-supported data, controlling the explosion of its derivatives as the noise decays to zero.
Bridging Practical Architectures: The results show that neural network hypothesis classes (ubiquitous in practice) will not overfit if their derivatives are bounded, suggesting practical regularization strategies (explicit penalty or architectural constraints) rooted in statistical theory.
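As a rough illustration of what controlling a network's input derivatives can mean, the sketch below evaluates the exact first derivative of a one-hidden-layer tanh network with scalar input and compares an empirical derivative penalty with a crude weight-based bound. Everything here is an assumption made for illustration: the architecture, the random weights, and the restriction to the first derivative, whereas the Sobolev constraints discussed above concern higher-order derivatives as well.

```python
import math
import random


def make_network(width, seed=0):
    """Random parameters (w, b, v) of g(x) = sum_j v_j * tanh(w_j * x + b_j)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    v = [rng.gauss(0.0, 1.0) / width for _ in range(width)]
    return w, b, v


def network_derivative(x, params):
    """Exact derivative g'(x), using tanh'(u) = 1 - tanh(u)^2."""
    w, b, v = params
    return sum(vj * wj * (1.0 - math.tanh(wj * x + bj) ** 2)
               for wj, bj, vj in zip(w, b, v))


def derivative_penalty(params, xs):
    """Mean squared input derivative over the points xs (an explicit penalty term)."""
    return sum(network_derivative(x, params) ** 2 for x in xs) / len(xs)


def derivative_bound(params):
    """Uniform bound on |g'|: since 0 <= tanh' <= 1, |g'(x)| <= sum_j |v_j * w_j|."""
    w, _, v = params
    return sum(abs(vj * wj) for wj, vj in zip(w, v))


params = make_network(width=32)
xs = [2.0 * math.pi * i / 50 for i in range(50)]
penalty = derivative_penalty(params, xs)
bound = derivative_bound(params)
print("empirical penalty:", penalty, "squared uniform bound:", bound ** 2)
```

The penalty can be added to a training objective as an explicit regularizer, while the weight-based bound corresponds to an architectural constraint; both are only first-order proxies for the derivative control required by the theory.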

Implications and Future Directions

The theoretical guarantees generalize and formalize the statistical optimality of score estimation in SGMs, closing gaps between practical empirical risk minimization and classical kernel/spline approaches. The findings imply that, for manifold-supported data and properly regularized hypothesis classes, generative models using score matching can achieve optimal rates, both in score function estimation and in measure recovery (in the Wasserstein sense).

These results motivate several future research directions:

Development of Practical Regularizers: Designing efficient penalization schemes for neural networks that enforce Sobolev-type derivative constraints, possibly adapting architecture or training protocol to guarantee generalization.
Extension to Non-Euclidean Manifolds: The analysis assumes a compact, boundaryless manifold; further exploration could address noncompact manifolds or manifolds with boundary, both relevant to real-world data.
Adaptive Estimation: Investigating adaptive methods that select regularization and hypothesis class parameters based on data, potentially leveraging cross-validation or empirical complexity measures [Comte-Sacko-Duval].
Generalization to Adversarial and High-Dimensional Regimes: Extending the minimax rate analysis to adversarial losses, mixtures, or high-dimensional regimes where intrinsic and ambient dimensions diverge [Tang2023, Stanczuk2024].
Implications for Representation Learning and Intrinsic Dimension Estimation: The regularity properties of the score function and its behavior near the manifold could be leveraged to infer intrinsic geometries or dimensionalities, as proposed in [Stanczuk2024].

Conclusion

The paper provides a comprehensive theoretical foundation for optimal score function estimation via derivative constraints, demonstrating minimax optimality for ERM approaches in both the classical setting and the diffusion setting with manifold-supported data. The results inform practical design choices in SGMs, showing that regularization via Sobolev balls is sufficient to prevent overfitting and connecting statistical theory with deep learning architectures. The convergence guarantees and methodological advances pave the way for principled generative modeling and robust statistical estimation.
