Differentiable Smoothers

Updated 1 April 2026

Differentiable smoothers are mathematical operators that convert noisy or discrete data into smooth outputs using techniques like PDE limits, spline models, and neural methods.
They are used for edge-preserving denoising, derivative estimation, and state-space inference, where the amount of smoothing is traded off against feature retention.
Recent advances include stable numerical discretizations, adaptive parameter selection, and integration with neural operators to enhance performance in scientific computing and deep learning.

A differentiable smoother is a mathematical or algorithmic operator that produces a smoothed output—typically from discrete, noisy, or non-smooth data—such that the map from inputs to outputs is differentiable (often in the classical, not merely generalized, sense). Differentiable smoothers arise across applied mathematics, statistics, optimization, and deep learning, including as filtering operators in PDEs, spline-based regression and interpolation, variational smoothing, probabilistic state estimation, and neural methods for scientific computing. Recent research has delivered rigorous frameworks for constructing smoothers that are not only differentiable but admit stable numerical implementation, preserve key geometric or statistical properties, and are compatible with modern gradient-based learning systems.

1. Smoothing via Anisotropic PDE Limits of Local Order-p Means

Differentiable smoothers in signal and image processing have been systematized through the theory of limit PDEs induced by iterated local order-$p$ mean filters. For a scalar function $u$ (e.g., a grayscale image), one replaces $u$ at a point by the order-$p$ mean over a ball of radius $\rho$, then lets $\rho \to 0$. The Taylor expansion of this operation yields explicit PDEs governing the infinitesimal smoothing evolution:

1D: $u_t = (p-1) u_{xx}$
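2D: $u_t = u_{\xi\xi} + (p-1) u_{\eta\eta}$, where $\eta\parallel\nabla u$, $\xi\perp\nabla u$
3D: $u_t = u_{\xi\xi} + u_{\chi\chi} + (p-1) u_{\eta\eta}$, where $\eta\parallel\nabla u$ and $\xi, \chi \perp \nabla u$ are mutually orthogonal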

The parameter $p$ interpolates a continuous family of behaviors:

$p = 2$: isotropic heat flow.
$p = 1$: mean curvature motion, which is edge-preserving.
$p < 1$: forward diffusion along level lines, backward (sharpening) across, enabling both smoothing and mild edge enhancement.
$p \to -1$: mode filtering (enhanced edge preservation and even extension of regions).
$p < -1$: includes classical image sharpening flows such as Gabor's.
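
To make the role of the order concrete, here is a minimal sketch of an order-$p$ mean over one local window, taken as the minimizer of $\sum_i |x_i - m|^p$. The function name `order_p_mean` and the ternary search are illustrative choices, not part of the cited work. The search is only valid for $p \ge 1$, where the objective is convex; smaller orders make it non-convex and need a different minimizer.

```python
def order_p_mean(values, p, iters=200):
    """Minimise sum(|v - m|**p) over m by ternary search (valid for p >= 1)."""
    if p < 1:
        raise ValueError("this sketch assumes p >= 1 (convex objective)")
    lo, hi = min(values), max(values)

    def cost(m):
        return sum(abs(v - m) ** p for v in values)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) < cost(m2):
            hi = m2   # minimiser lies in [lo, m2]
        else:
            lo = m1   # minimiser lies in [m1, hi]
    return 0.5 * (lo + hi)


window = [1.0, 2.0, 3.0, 4.0, 100.0]       # one outlier in the window
mean_like = order_p_mean(window, 2.0)      # p = 2 recovers the arithmetic mean
median_like = order_p_mean(window, 1.0)    # p = 1 recovers the median
```

With $p = 2$ the outlier pulls the result toward it, while $p = 1$ ignores it, which is the discrete counterpart of moving from plain averaging toward more edge-respecting behavior.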

Numerical discretization is achieved by a four-step explicit finite-difference scheme with splitting into axial and diagonal stencils for both diffusion and curvature terms. Stability (in $L^\infty$, i.e., maximum–minimum principle) is guaranteed via a CFL-type condition and selective freezing of the backward-parabolic term at extrema (minmod principle). This unified framework provides both pure smoothing and shape simplification (e.g., mean curvature) as well as edge-preservation/sharpening, simply by tuning $p$ (Welk et al., 2020).
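
As a much simpler illustration than the four-step scheme, the 1D equation $u_t = (p-1) u_{xx}$ in the forward regime $p \ge 1$ can be advanced with a plain explicit step. This is a sketch under stated assumptions, not the scheme of the cited work: the name `explicit_step`, the reflecting boundaries, and the time-step restriction $\tau (p-1)/h^2 \le 1/2$ are choices made for this example. Under that restriction each new value is a convex combination of old neighbors, which gives a discrete maximum–minimum principle.

```python
def explicit_step(u, p, tau, h=1.0):
    """One explicit step of u_t = (p - 1) u_xx with reflecting boundaries."""
    r = tau * (p - 1.0) / (h * h)
    if not 0.0 <= r <= 0.5:
        # r < 0 would be the backward-parabolic regime, which this sketch does not handle.
        raise ValueError("need 0 <= tau * (p - 1) / h**2 <= 1/2")
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]        # mirrored value at the left end
        right = u[i + 1] if i < n - 1 else u[i]   # mirrored value at the right end
        out.append(u[i] + r * (left - 2.0 * u[i] + right))
    return out


signal = [0.0, 0.1, -0.2, 0.05, 1.0, 0.9, 1.2, 1.0]
smoothed = signal
for _ in range(20):
    smoothed = explicit_step(smoothed, p=2.0, tau=0.4)
```

The smoothed values stay within the range of the input and, with reflecting boundaries, keep the same sum.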

2. Differentiable Smoothers in Spline and Variational Models

Classical smoothing splines, penalized B-splines with derivative-based penalties, and maximum likelihood spline estimators embody differentiability by design:

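Penalized B-Spline/D-Spline Smoothers: Given a B-spline basis $\{B_j\}_{j=1}^{K}$ with design matrix $B_{ij} = B_j(x_i)$, solve

$\hat{\beta} = \arg\min_{\beta} \; \|y - B\beta\|^2 + \lambda\, \beta^\top S_m \beta$

where $\beta^\top S_m \beta = \int \big(f^{(m)}(x)\big)^2\,dx$ for $f = \sum_j \beta_j B_j$ penalizes the $m$th derivative, and the solution inherits the global regularity of the basis ($C^{k-2}$ for order-$k$ B-splines with simple knots). Efficient sparse matrix algorithms and tensor-product extensions scale these approaches to multi-dimensional and irregular data (Wood, 2016).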

MLE-Spline Smoothers: For derivative estimation under irregular or coarse sampling, smoothers arise as solutions to penalized likelihood or constrained optimization problems (e.g., with a norm constraint on a higher-order derivative of the estimate). The optimal estimate is a spline with knots at the data points, whose order is determined by the constrained derivative, yielding explicit finite-dimensional convex programs. Recursive online update algorithms further support real-time operation (Avrachenkov et al., 29 Jul 2025).
Overlapping Spline (O-spline) Smoothers: The O-spline finite element scheme constructs a computationally efficient, $(p-1)$-times continuously differentiable approximation to the $p$-th order Integrated Wiener Process (IWP) prior. The O-spline achieves covariance convergence as the number of elements grows, derivative-consistent joint inference, and a Markov (diagonal precision) structure, dramatically improving scalability (Zhang et al., 2023).
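
The penalized least-squares principle behind the spline smoothers above can be sketched in its simplest discrete form. This is an assumption-laden stand-in, not the B-spline construction of the cited work: the basis is the identity (one coefficient per sample) and the derivative penalty is replaced by squared second differences, so the smoother solves $(I + \lambda D^\top D) z = y$. All function names are hypothetical. Because the solution is a fixed linear map of $y$, it is differentiable in the data, with Jacobian $(I + \lambda D^\top D)^{-1}$.

```python
def second_difference_penalty(n):
    """Return D^T D for the second-difference operator D of shape (n - 2, n)."""
    P = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        idx, coef = (r, r + 1, r + 2), (1.0, -2.0, 1.0)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                P[i][j] += ci * cj
    return P


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def penalized_smooth(y, lam):
    """Minimise ||y - z||^2 + lam * ||D z||^2 over z."""
    n = len(y)
    P = second_difference_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, y)


y = [0.0, 1.2, 1.9, 3.3, 3.8, 5.1, 6.2, 6.8]
z = penalized_smooth(y, lam=5.0)
```

Larger `lam` pulls the fit toward a straight line, which the second-difference penalty leaves unpenalized; a dense solver is used only to keep the sketch short.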

3. Differentiable Smoothers for Nonsmooth/Non-differentiable Functions

Several frameworks produce differentiable approximations to non-differentiable operators or objective functions, crucial for both theory and computational implementation:

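Mollifier-based Smoothing: For a convex, Lipschitz function $f$ with Lipschitz constant $L$, the mollified approximation

$f_\varepsilon(x) = (f * \phi_\varepsilon)(x) = \int f(x - y)\,\phi_\varepsilon(y)\,dy$

inherits $C^\infty$ smoothness from the mollifier $\phi_\varepsilon$ (nonnegative, unit mass, supported in the ball of radius $\varepsilon$), with

$\sup_x |f_\varepsilon(x) - f(x)| \le L\,\varepsilon$

and all derivatives converging to those of $f$ (in weak or pointwise sense), as $\varepsilon \to 0$. Applications include smoothing nonsmooth loss functions (ReLU, Huber, check function) for gradient-based optimization, statistical theory, and deep learning (Dong et al., 2023).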
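
A minimal numerical sketch of mollification, assuming the standard bump kernel $\phi(t) \propto \exp(-1/(1 - t^2))$ on $(-1, 1)$ rescaled to width $\varepsilon$ and a midpoint-rule quadrature. The helper names `bump` and `mollify` are hypothetical. Because the discrete weights are nonnegative and sum to one, the uniform error of the sketch is at most $L\varepsilon$ for an $L$-Lipschitz input, mirroring the continuous bound.

```python
import math


def bump(t):
    """Unnormalised C-infinity bump supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def mollify(f, eps, n=400):
    """Return x -> sum_k w_k f(x - eps * t_k), a quadrature of the convolution f * phi_eps."""
    h = 2.0 / n
    nodes = [-1.0 + (i + 0.5) * h for i in range(n)]   # midpoints of (-1, 1)
    weights = [bump(t) for t in nodes]
    total = sum(weights)
    weights = [w / total for w in weights]             # enforce unit mass

    def f_eps(x):
        return sum(w * f(x - eps * t) for w, t in zip(weights, nodes))

    return f_eps


def relu(x):
    return max(0.0, x)


relu_eps = mollify(relu, eps=0.1)
kink_value = relu_eps(0.0)   # strictly positive: the kink at 0 is rounded off
far_value = relu_eps(1.0)    # unchanged away from the kink, by symmetry of the kernel
```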

Delta-smoothing for Concave Functions: A piecewise cubic interpolant attached at a threshold $\delta > 0$ provides a concave, increasing, continuously differentiable surrogate for root-like functions such as $x \mapsto \sqrt{x}$, controlling the divergence of the derivative at zero and bounding the approximation error (Xu et al., 2018).
Optimization/Statistical Smoothing and Unbiased Gradient Estimation: Stochastic smoothing techniques for black-box non-differentiable functions $f$ define

$f_\epsilon(x) = \mathbb{E}_{\epsilon \sim \mu}\left[f(x + \epsilon)\right]$

with unbiased score-function gradients that remain valid under minimal regularity on the noise density $\mu$. This enables practical, low-variance gradient estimation for combinatorial and algorithmic problems (differentiable sorting, ranking, shortest paths, differentiable rendering) via noise-induced smoothing and variance reduction tools such as randomized quasi-Monte Carlo (RQMC), antithetic sampling, and control variates (Petersen et al., 2024).
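
A minimal sketch of a score-function gradient estimate, assuming Gaussian noise $\epsilon \sim \mathcal{N}(0, 1)$ scaled by $\sigma$, for which $\frac{d}{dx}\mathbb{E}[f(x + \sigma\epsilon)] = \mathbb{E}[f(x + \sigma\epsilon)\,\epsilon]/\sigma$. The Gaussian choice, the function name `smoothed_gradient`, and the antithetic pairing of $\epsilon$ with $-\epsilon$ are assumptions of this example; the cited framework covers more general noise distributions.

```python
import random


def smoothed_gradient(f, x, sigma, num_pairs=20000, seed=0):
    """Antithetic score-function estimate of d/dx E[f(x + sigma * eps)], eps ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        eps = rng.gauss(0.0, 1.0)
        # Average of the estimates at +eps and -eps (antithetic pair).
        total += (f(x + sigma * eps) - f(x - sigma * eps)) * eps / (2.0 * sigma)
    return total / num_pairs


def step(t):
    """Heaviside step: its ordinary derivative is zero almost everywhere."""
    return 1.0 if t > 0.0 else 0.0


grad_at_zero = smoothed_gradient(step, x=0.0, sigma=1.0)
```

For the step function with $\sigma = 1$, the smoothed function is the standard normal CDF, so the exact smoothed derivative at zero is $1/\sqrt{2\pi} \approx 0.399$; the Monte Carlo estimate should land near that value even though the step function itself carries no useful gradient.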

Smoothed Quantile Regression: Convolutional smoothing of the entire quantile regression objective produces a differentiable criterion in the coefficients, yielding sharper Bahadur–Kiefer expansions, statistical efficiency, and plug-in conditional density estimation that avoids the curse of dimensionality (Fernandes et al., 2019).
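
For a concrete instance of convolution smoothing, the check function $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$ convolved with a Gaussian kernel of bandwidth $h$ has the closed form $u\,(\tau - \Phi(-u/h)) + h\,\varphi(u/h)$, with derivative $\tau - \Phi(-u/h)$. The Gaussian kernel and the function names below are assumptions of this sketch; the cited work allows other kernels.

```python
import math


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def check_loss(u, tau):
    """Non-smooth quantile (check) loss."""
    return u * (tau - (1.0 if u < 0.0 else 0.0))


def smoothed_check_loss(u, tau, h):
    """E[check_loss(u + h * Z, tau)] for Z ~ N(0, 1), in closed form."""
    return u * (tau - normal_cdf(-u / h)) + h * normal_pdf(u / h)


def smoothed_check_grad(u, tau, h):
    """Derivative of smoothed_check_loss with respect to u."""
    return tau - normal_cdf(-u / h)


tau, h = 0.25, 0.5
grad_at_kink = smoothed_check_grad(0.0, tau, h)   # tau - 1/2: well defined at the kink
```

The smoothed loss stays within $h\sqrt{2/\pi}$ of the check loss everywhere, since the check loss is 1-Lipschitz and $\mathbb{E}|Z| = \sqrt{2/\pi}$.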

4. Differentiable Smoothers in State Estimation and Factor Graph Optimization

In state-space models and graphical probabilistic inference, differentiable smoothers are constructed by unrolling iterative optimization algorithms within factor graphs:

Factor Graph Smoothing: For trajectories $x_{1:T}$ and measurements $z_{1:T}$, the MAP trajectory estimate solves

$\hat{x}_{1:T} = \arg\min_{x_{1:T}} \sum_{t} \|f(x_{t-1}) \ominus x_t\|^2_{Q^{-1}} + \sum_{t} \|h(x_t) - z_t\|^2_{R^{-1}}$

with process and measurement factors. Rather than solving to convergence, a fixed number $K$ of Gauss–Newton or Levenberg–Marquardt steps is unrolled, maintaining differentiability throughout. Backpropagation passes through all Jacobians, linear solves (sparse Cholesky or CG), and Lie group retractions. Covariance parameterization (via Cholesky factors) enables learning heteroscedastic or manifold-structured uncertainty models. End-to-end learned smoothers achieve significant accuracy gains while retaining uncertainty quantification and efficient inference (Yi et al., 2021).
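
The unrolling idea can be sketched on a toy scalar chain. Everything specific here is an assumption made for illustration: a random-walk process factor, the measurement model $h(x) = x + 0.5\sin x$, scalar weights `w_q` and `w_r` in place of covariances, and the function names. The loop runs a fixed number of Gauss–Newton steps, each a tridiagonal solve, so the output is a finite composition of smooth operations; obtaining gradients would require running the same computation under an automatic differentiation framework, which is not shown.

```python
import math


def h(x):
    return x + 0.5 * math.sin(x)


def dh(x):
    return 1.0 + 0.5 * math.cos(x)


def cost(x, z, w_q, w_r):
    proc = sum((x[t + 1] - x[t]) ** 2 for t in range(len(x) - 1))
    meas = sum((h(xt) - zt) ** 2 for xt, zt in zip(x, z))
    return w_q * proc + w_r * meas


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] and upper[i] multiply x[i-1] and x[i+1] in row i."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def unrolled_gauss_newton(z, w_q, w_r, num_steps):
    """Run exactly num_steps Gauss-Newton updates, starting from the measurements."""
    x, n = list(z), len(z)
    for _ in range(num_steps):
        diag, grad = [], []
        for t in range(n):
            j = dh(x[t])
            d = w_r * j * j                      # measurement factor in J^T J
            g = w_r * j * (h(x[t]) - z[t])       # measurement factor in J^T r
            if t > 0:
                d += w_q
                g += w_q * (x[t] - x[t - 1])
            if t < n - 1:
                d += w_q
                g -= w_q * (x[t + 1] - x[t])
            diag.append(d)
            grad.append(g)
        off = [-w_q] * n
        step = solve_tridiagonal(off, diag, off, [-g for g in grad])
        x = [xi + si for xi, si in zip(x, step)]
    return x


z = [0.1, 0.4, 0.3, 0.8, 1.1, 0.9, 1.5]
x_hat = unrolled_gauss_newton(z, w_q=1.0, w_r=4.0, num_steps=5)
```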

5. Differentiable Multigrid Smoothers and Neural Operators

Sophisticated differentiable smoothers serve as key components in scientific computing and PDE solvers:

Neural Multigrid Smoothers: Classical smoothers (Jacobi, Gauss–Seidel) are supplanted by neural operators (convolutional NNs or Fourier neural operators), each trained to operate on a specific frequency band via level-wise spectrally-filtered loss. Training is offline and level-by-level, with each operator focusing on damping residual frequencies missed by previous levels. Plug-in integration into the classical multigrid V-cycle yields convergence rates and iteration counts orders of magnitude better than conventional smoothers, especially for convolution-type integral equations and ill-conditioned systems (Li et al., 1 Mar 2026, Huang et al., 2021).
Parameterization and Training: Smoothers as NNs (CNNs or FNOs) are designed to act on local stencils or frequency representations. Level-wise loss functions derived from multigrid convergence theory, including minimization of a norm of the error propagation operator and of its spectral radius, are used for direct adaptive learning, supporting generalization to large-scale or variable-coefficient PDE problems. The full multigrid cycle remains differentiable and compatible with modern auto-diff toolchains, enabling future use in learned solvers or reinforcement learning (Huang et al., 2021).
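
For reference, here is a sketch of the classical baseline that such learned smoothers replace: weighted Jacobi on the 1D Poisson stencil $\mathrm{tridiag}(-1, 2, -1)$ with zero Dirichlet ends. The damping weight $\omega = 2/3$, the grid size, and the function names are choices made for this example. With a zero right-hand side the iterate equals the error, so the norm ratio after one sweep is the damping factor of the chosen error mode.

```python
import math


def weighted_jacobi(x, b, omega=2.0 / 3.0, sweeps=1):
    """Weighted Jacobi sweeps for tridiag(-1, 2, -1) x = b with zero Dirichlet ends."""
    n = len(x)
    for _ in range(sweeps):
        new = []
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            residual = b[i] - (2.0 * x[i] - left - right)
            new.append(x[i] + omega * residual / 2.0)   # divide by the diagonal entry 2
        x = new
    return x


def norm(v):
    return math.sqrt(sum(t * t for t in v))


n = 31
zero_rhs = [0.0] * n


def mode(k):
    """k-th discrete sine mode, an eigenvector of the stencil."""
    return [math.sin(k * math.pi * (i + 1) / (n + 1)) for i in range(n)]


high, low = mode(24), mode(1)
high_ratio = norm(weighted_jacobi(high, zero_rhs)) / norm(high)   # oscillatory error
low_ratio = norm(weighted_jacobi(low, zero_rhs)) / norm(low)      # smooth error
```

One sweep strongly damps the oscillatory mode while barely touching the smooth one, which is why a smoother is paired with coarse-grid correction, and why a learned replacement is trained against the frequency band its level is responsible for.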

6. Data-Adaptive and Statistical Differentiable Smoothers

Choosing the degree of smoothing adaptively, particularly in the estimation of non-pathwise differentiable functionals, is increasingly addressed with rigorous statistical methodology:

Oracle and Data-Driven Smoothing Parameter Selection: For a family of approximating smoothers $\Psi_\delta$ to a possibly non-smooth target $\Psi$, a data-adaptive choice of $\delta$ is made to optimize MSE (bias–variance tradeoff), leveraging sample splitting and plug-in estimators for the explosion/decay rates of variance and bias:

$\mathrm{MSE}(\delta) \approx C_b^2\,\delta^{2\beta} + C_v\, n^{-1}\,\delta^{-\gamma}$

with optimal $\delta_n^* \propto n^{-1/(2\beta+\gamma)}$, where $\beta > 0$ is the decay rate of the bias and $\gamma > 0$ is the explosion rate of the variance as $\delta \to 0$. Adaptive methods achieve nearly optimal rates and valid confidence intervals even in non-regular/irregular models (Bibaut et al., 2017).
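
The bias–variance tradeoff can be checked numerically under an assumed power-law model: squared bias $C_b^2 \delta^{2\beta}$ and variance $C_v \delta^{-\gamma}/n$. The model, the constants, and the function names are illustrative assumptions, not the estimator of the cited work. Setting the derivative in $\delta$ to zero gives $\delta^* = \big(\gamma C_v / (2\beta C_b^2 n)\big)^{1/(2\beta+\gamma)}$, which the sketch compares against a grid search.

```python
def mse_model(delta, n, beta, gamma, c_bias=1.0, c_var=1.0):
    """Assumed model: c_bias^2 * delta^(2 beta) + c_var * delta^(-gamma) / n."""
    return (c_bias ** 2) * delta ** (2.0 * beta) + c_var * delta ** (-gamma) / n


def oracle_delta(n, beta, gamma, c_bias=1.0, c_var=1.0):
    """Closed-form minimiser of mse_model over delta > 0."""
    ratio = gamma * c_var / (2.0 * beta * c_bias ** 2 * n)
    return ratio ** (1.0 / (2.0 * beta + gamma))


n, beta, gamma = 10_000, 1.0, 1.0
delta_star = oracle_delta(n, beta, gamma)

# Log-spaced grid on [1e-4, 1] as an independent check of the closed form.
grid = [10.0 ** (-4.0 + 4.0 * i / 2000) for i in range(2001)]
delta_grid = min(grid, key=lambda d: mse_model(d, n, beta, gamma))
```

In practice the rates and constants are unknown, which is where the plug-in estimation and sample splitting described above come in.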

7. Qualitative Summary and Significance

Differentiable smoothers—whether defined via PDE limits, spline variational problems, probabilistic graphical models, or black-box stochastic smoothing—enable the analysis, optimization, and learning of systems where both high-order smoothness and computational/numerical tractability are essential. Contemporary research unifies a diverse set of tools (PDE geometry, splines, neural networks, convolution operators, statistical principles), offering flexible foundations for edge-preserving denoising, efficient derivative estimation under noisy/coarse sampling, plug-and-play learning in hybrid model-based/data-driven state estimation, and rigorous statistical inference for both classical and non-regular functionals.

The breadth of methods reviewed, from explicit $C^\infty$ mollifiers (Dong et al., 2023) and piecewise-smooth cubic constructions (Xu et al., 2018), to unrolled optimization for factor graphs (Yi et al., 2021), to Fourier-based neural smoothers for scientific computing (Li et al., 1 Mar 2026), demonstrates that differentiable smoothers are a fundamental, cross-cutting technology, supporting modern demands for robust, scalable, and interpretable modeling pipelines.