- The paper introduces a hierarchy of denoisers that bridges the identity map and the optimal transport map using higher-order score expansions.
- It leverages explicit Bell polynomial relationships to construct estimable polynomial functions solely from the observable data distribution.
- Theoretical guarantees show that as the expansion order increases, the Wasserstein error contracts at a rate of O(eta^(K+1)) under smoothness assumptions.
Distributional Shrinkage and Hierarchical Optimal Transport Denoisers
Problem Definition and Motivations
The paper "Distributional Shrinkage II: Optimal Transport Denoisers with Higher-Order Scores" (2512.09295) addresses signal denoising from a distributional perspective. The setup consists of recovering a scalar signal $X \sim P$ from noisy measurements $Y = X + \sigma Z$, with $Z \sim N(0,1)$ and $\sigma > 0$ known, where only the distribution $Q$ of $Y$ is observable. The paper frames the goal as distributional recovery: rather than minimizing mean squared error (MSE), the focus is on distributional proximity between the denoised output and the original distribution $P$, measured by the Wasserstein distance $W_r$.
The work is motivated by the inadequacy of classical denoisers, such as the Bayes-optimal and empirical Bayes denoisers, which are optimal for MSE but induce excessive shrinkage at the distributional level (cf. [liang2025distributional, garcia2024new, jaffe2025constrained]). The paper therefore proposes a hierarchy of optimal transport (OT) denoisers, constructed to achieve close distributional alignment in Wasserstein distance, with accuracy improving as higher-order score information is incorporated.
Hierarchy of Distributional Denoisers: Higher-Order Score Expansion
The central contribution is a hierarchy of denoisers, $\{T_K\}_{K=0}^{\infty}$, which interpolates between the identity map ($T_0(y) = y$) and the optimal transport map $T_\infty$, defined via the quantile-matching transformation $T_\infty(y) = F^{-1} \circ G(y)$, where $F^{-1}$ is the quantile function of $P$ and $G$ is the CDF of $Q$.
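As a concrete instance of the quantile-matching map, the following minimal sketch evaluates $T_\infty = F^{-1} \circ G$ in a Gaussian case. It assumes $P = N(0, \tau^2)$, a choice made purely for illustration (in the paper's setting $P$ is unobserved, so this is an oracle computation). Under that assumption $Q = N(0, \tau^2 + \sigma^2)$ and the map reduces to the linear rescaling $y \mapsto \tau y / \sqrt{\tau^2 + \sigma^2}$. The function name `ot_map_gaussian` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ot_map_gaussian(y, tau, sigma):
    """Oracle quantile-matching map T_inf(y) = F^{-1}(G(y)).

    Assumes P = N(0, tau^2), hence Q = N(0, tau^2 + sigma^2).
    Both are illustrative choices, not the paper's general setting.
    """
    P = NormalDist(0.0, tau)
    Q = NormalDist(0.0, sqrt(tau**2 + sigma**2))
    return P.inv_cdf(Q.cdf(y))

tau, sigma = 1.0, 0.5
for y in (-1.0, 0.3, 2.0):
    # In the Gaussian case the OT map is a linear rescaling of y.
    linear = tau * y / sqrt(tau**2 + sigma**2)
    print(f"y={y:+.1f}  T_inf={ot_map_gaussian(y, tau, sigma):+.6f}  linear={linear:+.6f}")
```

The two printed columns should agree up to floating-point error, which is a quick sanity check that quantile matching and the closed-form Gaussian map coincide.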
Each denoiser $T_K$ is constructed via a truncated series expansion
$$T_K(y) = y + \sum_{k=1}^{K} \frac{\eta^k}{k!}\, h_k(y), \qquad \eta = \sigma^2/2,$$
where the $h_k(y)$ are explicitly defined polynomial functions of higher-order score functions of $Q$ (with $q = dQ/dy$). These higher-order scores, recursively structured through partial Bell polynomials, encode information about the series expansion of the optimal transport map.
Crucially, these $h_k$ depend exclusively on $Q$; knowledge of $P$ is not required. Explicitly:
- $h_1(y) = q'(y)/q(y)$,
- $h_2(y)$ involves $q''(y)$, $q'(y)^2$, and the derivative of $h_1$,
- Higher orders involve further derivatives and cross-terms, recursively determined via Bell polynomial relations.
This combinatorial structure, characterized in closed form, is a key technical insight, as it links the denoising map to estimable objects from observable data.
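For readers unfamiliar with partial Bell polynomials, the sketch below computes $B_{n,k}(x_1, \dots, x_{n-k+1})$ with the standard recurrence $B_{n,k} = \sum_{i=1}^{n-k+1} \binom{n-1}{i-1} x_i B_{n-i,k-1}$. These are the coefficients that appear in Faà di Bruno's formula for derivatives of a composition such as $F^{-1} \circ G$. The code illustrates only the polynomials themselves; the paper's specific recursion for $h_k$ is not reproduced here, and the function name `partial_bell` is hypothetical.

```python
from functools import lru_cache
from math import comb

def partial_bell(n, k, x):
    """Partial (exponential) Bell polynomial B_{n,k}(x_1, ..., x_{n-k+1}).

    `x` is a sequence with x[0] = x_1, x[1] = x_2, ...; it must hold
    at least n - k + 1 entries.
    """
    x = tuple(x)

    @lru_cache(maxsize=None)
    def B(n, k):
        if n == 0 and k == 0:
            return 1
        if n == 0 or k == 0:
            return 0
        # Recurrence on the size i of the block containing a fixed element.
        return sum(comb(n - 1, i - 1) * x[i - 1] * B(n - i, k - 1)
                   for i in range(1, n - k + 2))

    return B(n, k)

# B_{3,2}(x1, x2) = 3*x1*x2 and B_{4,2}(x1, x2, x3) = 4*x1*x3 + 3*x2^2
print(partial_bell(3, 2, (2, 3)))      # 3*2*3 = 18
print(partial_bell(4, 2, (2, 3, 5)))   # 4*2*5 + 3*9 = 67
```

With all arguments set to 1, $B_{n,k}$ equals the Stirling number of the second kind $S(n,k)$, which gives an easy independent check of the recurrence.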
Theoretical Guarantees: Distributional Accuracy and Noise Asymptotics
The paper establishes that for each $K$, $T_K$ achieves Wasserstein approximation to $P$ of order $O(\eta^{K+1})$, assuming sufficient smoothness and regularity. Specifically, for compact $\mathrm{supp}(P)$ and sufficiently smooth densities,
$$W_r(T_K \sharp Q, P) \lesssim \eta^{K+1},$$
and similarly, the sup-norm error between $T_K$ and $T_\infty$ contracts at the same rate as $\sigma \to 0$. This quantifies a precise tradeoff between the order of expansion and denoising fidelity in Wasserstein space.
In the small-noise limit, each increase in $K$ tightens the Wasserstein error bound by a further factor of $\eta$, a property with no counterpart among traditional MSE-minimizing denoisers.
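The rates can be checked by hand in a Gaussian toy case, sketched below under the assumption $P = N(0, \tau^2)$. This is an illustration only: a Gaussian $P$ does not have compact support, so it falls outside the stated theorem. Here $q'(y)/q(y) = -y/s^2$ with $s^2 = \tau^2 + \sigma^2$, so $T_0$, $T_1$, and the Bayes posterior mean (Tweedie's formula, $y + \sigma^2 q'(y)/q(y)$) are all linear maps, and $W_2$ between centered Gaussians is the absolute difference of their standard deviations. The function name `w2_errors` is hypothetical.

```python
from math import sqrt

def w2_errors(sigma, tau=1.0):
    """W_2 distance to P = N(0, tau^2) after applying linear maps to Y ~ N(0, s^2).

    Assumes Gaussian P (illustrative only), so q'(y)/q(y) = -y / s^2.
    Returns eta and the errors for (identity T_0, first-order T_1, Bayes).
    """
    eta = sigma**2 / 2
    s = sqrt(tau**2 + sigma**2)           # standard deviation of Y
    slope_t0 = 1.0                        # T_0(y) = y
    slope_t1 = 1.0 - eta / s**2           # T_1(y) = y + eta * q'(y)/q(y)
    slope_bayes = 1.0 - 2 * eta / s**2    # posterior mean: y + sigma^2 * q'(y)/q(y)
    # A map y -> c*y (c > 0) sends N(0, s^2) to N(0, (c*s)^2).
    errors = tuple(abs(c * s - tau) for c in (slope_t0, slope_t1, slope_bayes))
    return eta, errors

print(f"{'sigma':>6} {'eta':>8} {'T_0':>10} {'T_1':>10} {'Bayes':>10}")
for sigma in (0.4, 0.2, 0.1):
    eta, (e0, e1, eb) = w2_errors(sigma)
    print(f"{sigma:6.2f} {eta:8.4f} {e0:10.2e} {e1:10.2e} {eb:10.2e}")
```

In this toy case, halving $\sigma$ divides $\eta$ by 4; the $T_0$ and Bayes errors shrink by roughly 4 (order $\eta$), while the $T_1$ error shrinks by roughly 16 (order $\eta^2$), in line with the $O(\eta^{K+1})$ rate for $K = 0$ and $K = 1$.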
Denoiser Estimation: Plug-In and Score Matching Approaches
Implementation of these denoisers from data relies on estimation of higher-order score functions of $Q$, specifically the ratios $q^{(m)}/q$.
Two estimation paradigms are studied:
- Gaussian Kernel Smoothing: For each derivative $q^{(m)}$, a plug-in estimator $\hat{q}^{(m)}$ is constructed from i.i.d. data, with optimal rates of convergence established (e.g., an $n^{-4/(2m+5)}$ MSE rate for the $m$th derivative at fixed $y$).
- Higher-Order Score Matching: Direct estimation of the score functions in function space via generalized score matching [hyvarinen2005estimation], with risk rates governed by the Hölder smoothness of the true score function (the parametric rate $n^{-1/2}$ is attainable under sufficient smoothness).
Both approaches yield empirically tractable procedures, with theoretical performance guarantees, for realizing the entire denoiser hierarchy from observed data.
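Both estimation routes can be sketched for the first-order score $q'/q$ on simulated data. The example below is a rough illustration under stated assumptions, not the paper's estimators: the data are simulated with Gaussian $P$, the bandwidth follows a rule of thumb of order $n^{-1/(2m+5)}$ with $m = 1$, and the score-matching model is the one-parameter family $s_\theta(y) = \theta y$ fitted with Hyvärinen's original objective rather than a higher-order generalization. The names `kde_score` and `linear_score_matching` are hypothetical.

```python
import math
import random

def kde_score(y, data, h):
    """Plug-in estimate of q'(y)/q(y) from a Gaussian-kernel density estimate."""
    num = den = 0.0
    for yi in data:
        u = (y - yi) / h
        w = math.exp(-0.5 * u * u)  # kernel weight; normalizing constants cancel in the ratio
        num += -u * w               # proportional to the y-derivative of the kernel
        den += w
    return num / (h * den)

def linear_score_matching(data):
    """Hyvarinen score matching for the model s_theta(y) = theta * y.

    Empirical objective: theta + 0.5 * theta^2 * mean(Y^2),
    minimized in closed form at theta = -1 / mean(Y^2).
    """
    m2 = sum(v * v for v in data) / len(data)
    return -1.0 / m2

rng = random.Random(0)
tau, sigma, n = 1.0, 0.5, 20000
eta = sigma**2 / 2
# Simulated observations Y = X + sigma * Z with X ~ N(0, tau^2) (an assumption).
data = [rng.gauss(0.0, tau) + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

rms = math.sqrt(sum(v * v for v in data) / n)
h = rms * n ** (-1 / 7)            # rule-of-thumb bandwidth, order n^{-1/(2m+5)} with m = 1

y = 0.5
score_kde = kde_score(y, data, h)
theta = linear_score_matching(data)
true_score = -y / (tau**2 + sigma**2)

print(f"true score      : {true_score:+.4f}")
print(f"kernel plug-in  : {score_kde:+.4f}  ->  T_1(y) = {y + eta * score_kde:.4f}")
print(f"score matching  : {theta * y:+.4f}  ->  T_1(y) = {y + eta * theta * y:.4f}")
```

The kernel route makes no shape assumption on the score but pays a nonparametric rate, while the linear model converges quickly only because it happens to be correctly specified for this simulated Gaussian case.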
Relation to Prior Literature
The approach fundamentally differs from classical empirical Bayes and shrinkage methods (Stein, James-Stein, etc.), which either assume prior knowledge or estimate the unobserved P (g-modeling) before denoising. The current paper demonstrates that direct modeling and higher-order estimation in the observation space (Y) can recover the optimal transport map in distributional metrics without explicit prior estimation.
Prior work (e.g., [liang2025distributional]) analyzed lower-order OT denoisers or imposed structural assumptions on the prior. The present work generalizes the analytic framework to arbitrary order, makes explicit the combinatorics underpinning the construction, and characterizes the statistical estimation landscape for these higher-order objects.
Implications and Future Directions
These results provide a principled methodology for nonparametric, data-driven denoising that explicitly targets distributional reconstruction, with rapidly contracting error in Wasserstein space using only observable statistics.
From a practical standpoint, this framework is relevant to modern generative modeling, where distributional metrics (rather than pointwise error) are fundamental. One example is diffusion-based models, where denoising score estimators play a central role. The characterizations here suggest a principled route to estimator design with performance guarantees at the optimal transport level.
Theoretically, uncovering the combinatorial Bell polynomial structure behind denoising maps opens directions for further exploration, particularly in higher dimensions (where optimal transport maps are more complex), for other noise models, or in non-Euclidean settings.
Conclusion
The paper introduces and analyzes a hierarchy of distributional denoisers, constructed using higher-order score functions of the observed noisy distribution, with explicit characterization via Bell polynomials and rigorous guarantees on denoising quality in Wasserstein distance. It provides both the mathematical underpinnings and practical estimation tools to achieve distribution-matching denoising, representing a significant extension of distributional shrinkage theory and methodology (2512.09295).