
LS-SGLD: Laplacian Smoothing for Bayesian Sampling

Updated 23 March 2026
  • LS-SGLD is a Bayesian sampling method that applies Laplacian smoothing as a preconditioner to reduce the high variance in stochastic gradient Langevin dynamics.
  • It uses FFT-based computation to efficiently evaluate the smoothing operator, achieving lower discretization error measured in 2-Wasserstein distance.
  • Empirical results demonstrate improved posterior sampling performance in tasks like Bayesian logistic regression and CNN training, allowing for larger stable step sizes.

Laplacian Smoothing Stochastic Gradient Langevin Dynamics (LS-SGLD) is a Markov Chain Monte Carlo (MCMC) method designed to address the high-variance bottleneck of stochastic gradient Langevin dynamics (SGLD) in Bayesian sampling. By introducing Laplacian smoothing (LS) as a preconditioner, it achieves provably reduced discretization error in 2-Wasserstein distance across both log-concave and non-log-concave targets, with negligible computational overhead relative to standard SGLD. The LS-SGLD algorithm demonstrates improved empirical performance in posterior sampling, Bayesian logistic regression, and Bayesian convolutional neural network (CNN) training, with robust variance reduction and larger stable step sizes (Wang et al., 2019).

1. Motivation and Conceptual Foundations

LS-SGLD is motivated by the variance limitations of SGLD, which is based on discretizing the continuous-time overdamped Langevin stochastic differential equation (SDE):

$$d\theta_t = -\nabla U(\theta_t)\, dt + \sqrt{2}\, dB_t,$$

where the target distribution is $\pi(\theta) \propto \exp(-U(\theta))$. In practice, $\nabla U(\theta)$ is approximated by a mini-batch stochastic gradient $g_k$, resulting in the update:

$$\theta_{k+1} = \theta_k - \eta g_k + \sqrt{2\eta}\, \xi_k, \quad \xi_k \sim \mathcal{N}(0, I).$$
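
A minimal sketch of this update in Python/NumPy (illustrative only; `grad_fn` is a hypothetical mini-batch gradient oracle and `rng` a NumPy random generator):

```python
import numpy as np

def sgld_step(theta, grad_fn, eta, rng):
    """One vanilla SGLD iteration:
    theta_{k+1} = theta_k - eta * g_k + sqrt(2 * eta) * xi_k."""
    g = grad_fn(theta)                     # mini-batch stochastic gradient g_k
    xi = rng.standard_normal(theta.shape)  # xi_k ~ N(0, I)
    return theta - eta * g + np.sqrt(2.0 * eta) * xi
```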

High variance in $g_k$ compels the use of small step sizes $\eta$. Laplacian smoothing, previously introduced for stochastic gradient descent (SGD) (Osher et al., 2018), proposes a preconditioning operator:

$$H := (I - \sigma \Delta)^{-1},$$

where $\Delta$ is the Laplacian (or, more generally, a graph Laplacian) and $\sigma \ge 0$ is the smoothing parameter. This operator enforces local averaging across parameter coordinates, reducing stochastic variance "on the fly" without extra storage requirements.

2. Algorithmic Specification

2.1 Smoothing Operator Construction

For $\theta \in \mathbb{R}^d$, define $L \in \mathbb{R}^{d \times d}$ as the circulant Laplacian matrix with periodic boundary conditions. The smoothing and preconditioning operators are:

  • $A_\sigma := I - \sigma L$
  • $H := A_\sigma^{-1}$
  • $H^{1/2} := A_\sigma^{-1/2}$

Both $H$ and $H^{1/2}$ are circulant, admitting efficient evaluation via the Fast Fourier Transform (FFT):

$$Hv = \text{ifft}\left( \frac{\text{fft}(v)}{\text{fft}(a)} \right),$$

where $a$ is the first column of $A_\sigma$, and $\text{fft}$, $\text{ifft}$ denote the discrete Fourier and inverse Fourier transforms. The same transformation, with $\sqrt{\text{fft}(a)}$ in the denominator, applies to $H^{1/2}$.
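
A minimal NumPy sketch of this FFT-based application of $H$ and $H^{1/2}$, assuming the standard periodic 1-D Laplacian stencil for $L$ (the helper name `ls_precondition` is ours, not from the reference implementation):

```python
import numpy as np

def ls_precondition(v, sigma, power=1.0):
    """Apply H^power = (I - sigma * L)^(-power) to v via FFT,
    where L is the circulant (periodic-boundary) 1-D Laplacian."""
    d = v.shape[0]
    # First column of A_sigma = I - sigma * L:
    # [1 + 2*sigma, -sigma, 0, ..., 0, -sigma]
    a = np.zeros(d)
    a[0] = 1.0 + 2.0 * sigma
    a[1] -= sigma
    a[-1] -= sigma
    # Eigenvalues of the circulant A_sigma are fft(a),
    # i.e. 1 + 2*sigma - 2*sigma*cos(2*pi*j/d), all real and >= 1.
    eig = np.real(np.fft.fft(a))
    return np.real(np.fft.ifft(np.fft.fft(v) / eig**power))
```

Since every eigenvalue $1 + 2\sigma - 2\sigma\cos(2\pi j/d)$ is at least $1$, both $H$ and $H^{1/2}$ are well defined for any $\sigma \ge 0$.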

2.2 Continuous and Discrete-Time Dynamics

The LS-Langevin SDE generalizes the drift and diffusion structure:

$$d\Theta_t = -H \nabla U(\Theta_t)\, dt + \sqrt{2H}\, dB_t,$$

maintaining $\pi(\theta) \propto \exp(-U(\theta))$ as the unique invariant measure.

The discrete-time LS-SGLD Euler–Maruyama update is:

$$\theta_{k+1} = \theta_k - \eta H g_k + \sqrt{2\eta}\, H^{1/2} \xi_k,$$

with

$$g_k = \frac{1}{B} \sum_{i \in \mathcal{C}_k} \nabla f_i(\theta_k), \qquad \xi_k \sim \mathcal{N}(0, I).$$

An efficient implementation uses the FFT for both $H g_k$ and $H^{1/2} \xi_k$ at each step.

LS-SGLD Algorithm at a Glance

| Step | Operation | Notes |
|---|---|---|
| 1 | Sample mini-batch, compute $g_k$ | $g_k = \frac{1}{B} \sum_{i \in \mathcal{C}_k} \nabla f_i(\theta_k)$ |
| 2 | Compute $H g_k$ via FFT | $v_1 = \text{ifft}(\text{fft}(g_k)/\text{fft}(a))$ |
| 3 | Generate $\xi_k$, compute $H^{1/2} \xi_k$ via FFT | $v_2 = \text{ifft}(\text{fft}(\xi_k)/\sqrt{\text{fft}(a)})$ |
| 4 | Update | $\theta_{k+1} = \theta_k - \eta v_1 + \sqrt{2\eta}\, v_2$ |
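
Putting the four steps together, a sketch of one LS-SGLD iteration built on the `ls_precondition` helper from Section 2.1 (again with a hypothetical `grad_fn` mini-batch gradient oracle):

```python
import numpy as np

def ls_sgld_step(theta, grad_fn, eta, sigma, rng):
    """One LS-SGLD iteration, following the table above."""
    g = grad_fn(theta)                                 # step 1: mini-batch gradient g_k
    v1 = ls_precondition(g, sigma, power=1.0)          # step 2: H g_k via FFT
    xi = rng.standard_normal(theta.shape)              # step 3: xi_k ~ N(0, I)
    v2 = ls_precondition(xi, sigma, power=0.5)         #         H^{1/2} xi_k via FFT
    return theta - eta * v1 + np.sqrt(2.0 * eta) * v2  # step 4: parameter update
```

Setting $\sigma = 0$ makes $H = I$ and recovers vanilla SGLD exactly.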

3. Theoretical Guarantees

The convergence of LS-SGLD is established under the following assumptions:

  • Dissipativity: $\nabla U(\theta) \cdot \theta \ge m \|\theta\|^2 - b$
  • Smoothness: each $f_i$ has an $M$-Lipschitz gradient
  • Variance bound: $\mathbb{E}\|\nabla f_i(\theta) - \nabla U(\theta)\|^2 \le d \omega^2$
  • Log-concavity (optional): $U$ is convex

Key metrics are reported in terms of the $2$-Wasserstein distance $W_2$ and the log-Sobolev constant $\lambda$.

3.1 Strongly Convex (Log-Concave) Targets

The total error splits as:

$$W_2(\mathrm{Law}(\theta_K), \pi) \leq \underbrace{W_2(\mathrm{Law}(\theta_K), \mathrm{Law}(\Theta_{K\eta}))}_{\text{Discretization}} + \underbrace{W_2(\mathrm{Law}(\Theta_{K\eta}), \pi)}_{\text{Ergodic term}}$$

The ergodic term decays exponentially in time, with a decay-rate constant $c_0 \in [\|A_\sigma\|^{-1}, 1]$.

Crucially, LS-SGLD exhibits strictly smaller discretization error than vanilla SGLD due to variance-reducing factors $\gamma_1, \gamma_2 < 1$, where:

  • $\gamma_1 \in [\|A_\sigma\|^{-2}, 1]$
  • $\gamma_2 = \frac{1}{d} \sum_{j=1}^d \left[1 + 2\sigma - 2\sigma \cos(2\pi j/d)\right]^{-1} < 1$ (see the numerical check after this list)
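
A quick numerical check of the $\gamma_2$ formula (an illustrative sketch; values computed from the formula, not quoted from the paper):

```python
import numpy as np

def gamma2(d, sigma):
    """gamma_2 = (1/d) * sum_{j=1}^{d} [1 + 2*sigma - 2*sigma*cos(2*pi*j/d)]^(-1)."""
    j = np.arange(1, d + 1)
    return np.mean(1.0 / (1.0 + 2.0 * sigma - 2.0 * sigma * np.cos(2.0 * np.pi * j / d)))

# gamma_2 equals 1 at sigma = 0 (vanilla SGLD) and shrinks as sigma grows:
for sigma in (0.0, 1.0, 2.0, 3.0):
    print(f"sigma = {sigma}: gamma_2(d=1000) = {gamma2(1000, sigma):.3f}")
```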

3.2 General (Non-Log-Concave) Targets

For general non-convex $U$, analogous error bounds are derived using a coupled-SDE argument and Girsanov's theorem, showing that LS-SGLD achieves reduced discretization terms (scaled by $\gamma_1$, $\gamma_2$) while preserving the exponential ergodic decay above, though with a slightly reduced mixing rate.

A plausible implication is that, for a broad class of sampling tasks, LS-SGLD offers a favorable trade-off: notably improved accuracy per iteration at the cost of marginally slower mixing in continuous time.

4. Computational Aspects and Hyperparameter Choices

The dominant additional cost over SGLD is two FFTs of length $d$ per iteration, an $O(d \log d)$ overhead that is negligible relative to gradient computation in high-dimensional settings. The smoothing parameter $\sigma$ regulates both the strength of preconditioning and the usable step size: larger $\sigma$ yields smaller $\gamma_2$ and a larger maximal stable step size. In smooth quadratic landscapes, the effective step size can be set proportional to $(1+4\sigma)^{1/4}$, reflecting the stability gains observed in experiments. For arbitrary high-dimensional graph Laplacians, sparse FFT or polynomial preconditioning may be employed.
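
As a quick illustration of the $(1+4\sigma)^{1/4}$ step-size heuristic from the paragraph above (values computed from the formula, not reported in the paper):

```python
# Relative step-size gain over SGLD implied by the (1 + 4*sigma)**(1/4) heuristic
for sigma in (0.5, 1.0, 2.0, 3.0):
    print(f"sigma = {sigma}: step-size factor ~ {(1.0 + 4.0 * sigma) ** 0.25:.2f}")
```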

5. Empirical Performance

LS-SGLD demonstrates consistent empirical improvements across multiple paradigms:

2D Gaussian and Gaussian Mixture

  • With a 2D Gaussian target $N(0, \Sigma)$ ($\Sigma_{12} = 0.9$), LS-SGLD recovers the covariance structure with reduced autocorrelation time and supports step sizes up to $(1+4\sigma)^{1/4}$ times larger than SGLD.
  • In a bimodal 2D Gaussian mixture, both LS-SGLD and the LS-preconditioned pSGLD achieve near-constant $W_2$ distance to ground-truth Metropolis–Hastings samples ($\approx 0.42$), while SGLD and pSGLD show larger bias and fluctuations.

Bayesian Logistic Regression (UCI “a3a”)

  • With batch size 5 and grid-searched $\eta$, LS-SGLD yields lower test negative log-likelihood, higher classification accuracy, and reduced stochastic-gradient variance compared to baselines.

Bayesian Convolutional Neural Networks

  • On MNIST with a two-layer CNN (batch size 100, $\sigma = 0.5$), LS-SGLD and LS-pSGLD converge more rapidly and marginally outperform SGLD and pSGLD in both training density and generalization accuracy.

| Task | SGLD | LS-SGLD |
|---|---|---|
| 2D Gaussian, $\gamma_2$ | 1.00 | 0.149–0.268 ($\sigma = 1$–$3$) |
| GMM, $W_2$ to MH samples | variable, large bias | $\approx 0.42$, constant |
| Logistic regression (a3a), gradient variance | higher | lower |
| Bayesian CNN, test accuracy | lower, slower convergence | slightly higher, faster |

6. Interpretations and Concluding Principles

Laplacian smoothing functions as a lightweight, variance-reducing preconditioner for SGLD, yielding provable gains in discretization error (in $W_2$) at the expense of a modest reduction in mixing speed. This effect is robust across both simple and deep Bayesian target distributions, and the FFT-based implementation keeps the computational overhead negligible. Empirical findings on synthetic and real-world machine learning benchmarks corroborate the theoretical results and support the adoption of LS-SGLD as an effective Bayesian inference technique (Wang et al., 2019).

References

Wang, B., Zou, D., Gu, Q., & Osher, S. J. (2019). Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo.
