
LS-SGLD: Laplacian Smoothing for Bayesian Sampling

Updated 23 March 2026
  • LS-SGLD is a Bayesian sampling method that applies Laplacian smoothing as a preconditioner to reduce the high variance in stochastic gradient Langevin dynamics.
  • It uses FFT-based computation to efficiently evaluate the smoothing operator, achieving lower discretization error measured in 2-Wasserstein distance.
  • Empirical results demonstrate improved posterior sampling performance in tasks like Bayesian logistic regression and CNN training, allowing for larger stable step sizes.

Laplacian Smoothing Stochastic Gradient Langevin Dynamics (LS-SGLD) is a Markov Chain Monte Carlo (MCMC) method designed to address the high-variance bottleneck of stochastic gradient Langevin dynamics (SGLD) in Bayesian sampling. By introducing Laplacian smoothing (LS) as a preconditioner, it achieves provably reduced discretization error in 2-Wasserstein distance across both log-concave and non-log-concave targets, with negligible computational overhead relative to standard SGLD. The LS-SGLD algorithm demonstrates improved empirical performance in posterior sampling, Bayesian logistic regression, and Bayesian convolutional neural network (CNN) training, with robust variance reduction and larger stable step sizes (Wang et al., 2019).

1. Motivation and Conceptual Foundations

LS-SGLD is motivated by the variance limitations of SGLD, which is based on discretizing the continuous-time overdamped Langevin stochastic differential equation (SDE):

$$d\theta_t = -\nabla U(\theta_t)\, dt + \sqrt{2}\, dB_t,$$

where the target distribution is $\pi(\theta) \propto \exp(-U(\theta))$. In practice, $\nabla U(\theta)$ is approximated by a mini-batch stochastic gradient $g_k$, resulting in the update:

$$\theta_{k+1} = \theta_k - \eta g_k + \sqrt{2\eta}\, \xi_k, \quad \xi_k \sim \mathcal{N}(0, I).$$
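
A minimal sketch of this update in Python/NumPy (illustrative only; `grad_fn` is a hypothetical mini-batch gradient oracle and `rng` a NumPy random generator):

```python
import numpy as np

def sgld_step(theta, grad_fn, eta, rng):
    """One vanilla SGLD iteration:
    theta_{k+1} = theta_k - eta * g_k + sqrt(2 * eta) * xi_k."""
    g = grad_fn(theta)                     # mini-batch stochastic gradient g_k
    xi = rng.standard_normal(theta.shape)  # xi_k ~ N(0, I)
    return theta - eta * g + np.sqrt(2.0 * eta) * xi
```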

High variance in $g_k$ compels the use of small step sizes $\eta$. Laplacian smoothing, previously introduced for stochastic gradient descent (SGD) (Osher et al., 2018), proposes a preconditioning operator:

$$H := (I - \sigma \Delta)^{-1},$$

where $\Delta$ is the Laplacian (or, more generally, a graph Laplacian) and $\sigma \ge 0$ is the smoothing parameter. This operator enforces local averaging across parameter coordinates, reducing stochastic variance "on the fly" without extra storage requirements.

2. Algorithmic Specification

2.1 Smoothing Operator Construction

For $\theta \in \mathbb{R}^d$, define $L \in \mathbb{R}^{d \times d}$ as the circulant Laplacian matrix with periodic boundary conditions. The smoothing and preconditioning operators are:

  • $A_\sigma := I - \sigma L$
  • $H := A_\sigma^{-1}$
  • $H^{1/2} := A_\sigma^{-1/2}$

Both $H$ and $H^{1/2}$ are circulant, admitting efficient evaluation via the Fast Fourier Transform (FFT):

$$Hv = \text{ifft}\left( \frac{\text{fft}(v)}{\text{fft}(a)} \right),$$

where $a$ is the first column of $A_\sigma$, and $\text{fft}$, $\text{ifft}$ denote the discrete Fourier and inverse Fourier transforms. The same transformation, with $\sqrt{\text{fft}(a)}$ in the denominator, applies to $H^{1/2}$.
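
A minimal NumPy sketch of this FFT-based application of $H$ and $H^{1/2}$, assuming the standard periodic 1-D Laplacian stencil for $L$ (the helper name `ls_precondition` is ours, not from the reference implementation):

```python
import numpy as np

def ls_precondition(v, sigma, power=1.0):
    """Apply H^power = (I - sigma * L)^(-power) to v via FFT,
    where L is the circulant (periodic-boundary) 1-D Laplacian."""
    d = v.shape[0]
    # First column of A_sigma = I - sigma * L:
    # [1 + 2*sigma, -sigma, 0, ..., 0, -sigma]
    a = np.zeros(d)
    a[0] = 1.0 + 2.0 * sigma
    a[1] -= sigma
    a[-1] -= sigma
    # Eigenvalues of the circulant A_sigma are fft(a),
    # i.e. 1 + 2*sigma - 2*sigma*cos(2*pi*j/d), all real and >= 1.
    eig = np.real(np.fft.fft(a))
    return np.real(np.fft.ifft(np.fft.fft(v) / eig**power))
```

Since every eigenvalue $1 + 2\sigma - 2\sigma\cos(2\pi j/d)$ is at least $1$, both $H$ and $H^{1/2}$ are well defined for any $\sigma \ge 0$.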

2.2 Continuous and Discrete-Time Dynamics

The LS-Langevin SDE generalizes the drift and diffusion structure:

$$d\Theta_t = -H \nabla U(\Theta_t)\, dt + \sqrt{2H}\, dB_t,$$

maintaining $\pi(\theta) \propto \exp(-U(\theta))$ as the unique invariant measure.

The discrete-time LS-SGLD Euler–Maruyama update is:

$$\theta_{k+1} = \theta_k - \eta H g_k + \sqrt{2\eta}\, H^{1/2} \xi_k,$$

with

$$g_k = \frac{1}{B} \sum_{i \in \mathcal{C}_k} \nabla f_i(\theta_k), \qquad \xi_k \sim \mathcal{N}(0, I).$$

An efficient implementation uses the FFT for both $H g_k$ and $H^{1/2} \xi_k$ at each step.

LS-SGLD Algorithm at a Glance

| Step | Operation | Notes |
|---|---|---|
| 1 | Sample mini-batch, compute $g_k$ | $g_k = \frac{1}{B} \sum_{i \in \mathcal{C}_k} \nabla f_i(\theta_k)$ |
| 2 | Compute $H g_k$ via FFT | $v_1 = \text{ifft}(\text{fft}(g_k)/\text{fft}(a))$ |
| 3 | Generate $\xi_k$, compute $H^{1/2} \xi_k$ via FFT | $v_2 = \text{ifft}(\text{fft}(\xi_k)/\sqrt{\text{fft}(a)})$ |
| 4 | Update | $\theta_{k+1} = \theta_k - \eta v_1 + \sqrt{2\eta}\, v_2$ |
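
Putting the four steps together, a sketch of one LS-SGLD iteration built on the `ls_precondition` helper from Section 2.1 (again with a hypothetical `grad_fn` mini-batch gradient oracle):

```python
import numpy as np

def ls_sgld_step(theta, grad_fn, eta, sigma, rng):
    """One LS-SGLD iteration, following the table above."""
    g = grad_fn(theta)                                 # step 1: mini-batch gradient g_k
    v1 = ls_precondition(g, sigma, power=1.0)          # step 2: H g_k via FFT
    xi = rng.standard_normal(theta.shape)              # step 3: xi_k ~ N(0, I)
    v2 = ls_precondition(xi, sigma, power=0.5)         #         H^{1/2} xi_k via FFT
    return theta - eta * v1 + np.sqrt(2.0 * eta) * v2  # step 4: parameter update
```

Setting $\sigma = 0$ makes $H = I$ and recovers vanilla SGLD exactly.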

3. Theoretical Guarantees

The convergence of LS-SGLD is established under the following assumptions:

  • Dissipativity: $\nabla U(\theta) \cdot \theta \ge m \|\theta\|^2 - b$
  • Smoothness: each $f_i$ has an $M$-Lipschitz gradient
  • Variance bound: $\mathbb{E}\|\nabla f_i(\theta) - \nabla U(\theta)\|^2 \le d \omega^2$
  • Log-concavity (optional): $U$ is convex

Key metrics are reported in terms of the $2$-Wasserstein distance $W_2$ and the log-Sobolev constant $\lambda$.

3.1 Strongly Convex (Log-Concave) Targets

The total error splits as:

$$W_2(\mathrm{Law}(\theta_K), \pi) \leq \underbrace{W_2(\mathrm{Law}(\theta_K), \mathrm{Law}(\Theta_{K\eta}))}_{\text{Discretization}} + \underbrace{W_2(\mathrm{Law}(\Theta_{K\eta}), \pi)}_{\text{Ergodic term}}$$

The ergodic term decays exponentially in time, with a decay-rate constant $c_0 \in [\|A_\sigma\|^{-1}, 1]$.

Crucially, LS-SGLD exhibits strictly smaller discretization error than vanilla SGLD due to variance-reducing factors $\gamma_1, \gamma_2 < 1$, where:

  • $\gamma_1 \in [\|A_\sigma\|^{-2}, 1]$
  • $\gamma_2 = \frac{1}{d} \sum_{j=1}^d \left[1 + 2\sigma - 2\sigma \cos(2\pi j/d)\right]^{-1} < 1$ (see the numerical check after this list)
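
A quick numerical check of the $\gamma_2$ formula (an illustrative sketch; values computed from the formula, not quoted from the paper):

```python
import numpy as np

def gamma2(d, sigma):
    """gamma_2 = (1/d) * sum_{j=1}^{d} [1 + 2*sigma - 2*sigma*cos(2*pi*j/d)]^(-1)."""
    j = np.arange(1, d + 1)
    return np.mean(1.0 / (1.0 + 2.0 * sigma - 2.0 * sigma * np.cos(2.0 * np.pi * j / d)))

# gamma_2 equals 1 at sigma = 0 (vanilla SGLD) and shrinks as sigma grows:
for sigma in (0.0, 1.0, 2.0, 3.0):
    print(f"sigma = {sigma}: gamma_2(d=1000) = {gamma2(1000, sigma):.3f}")
```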

3.2 General (Non-Log-Concave) Targets

For general non-convex $U$, analogous error bounds are derived using a coupled-SDE argument and Girsanov's theorem, showing that LS-SGLD achieves reduced discretization terms (scaled by $\gamma_1$, $\gamma_2$) while preserving the exponential ergodic decay above, though with a slightly reduced mixing rate.

A plausible implication is that, for a broad class of sampling tasks, LS-SGLD offers a favorable trade-off: notably improved accuracy per iteration at the cost of marginally slower mixing in continuous time.

4. Computational Aspects and Hyperparameter Choices

The dominant additional cost over SGLD is two FFTs of length $d$ per iteration, an $O(d \log d)$ overhead that is negligible relative to gradient computation in high-dimensional settings. The smoothing parameter $\sigma$ regulates both the strength of preconditioning and the usable step size: larger $\sigma$ yields smaller $\gamma_2$ and a larger maximal stable step size. In smooth quadratic landscapes, the effective step size can be set proportional to $(1+4\sigma)^{1/4}$, reflecting the stability gains observed in experiments. For arbitrary high-dimensional graph Laplacians, sparse FFT or polynomial preconditioning may be employed.
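
As a quick illustration of the $(1+4\sigma)^{1/4}$ step-size heuristic from the paragraph above (values computed from the formula, not reported in the paper):

```python
# Relative step-size gain over SGLD implied by the (1 + 4*sigma)**(1/4) heuristic
for sigma in (0.5, 1.0, 2.0, 3.0):
    print(f"sigma = {sigma}: step-size factor ~ {(1.0 + 4.0 * sigma) ** 0.25:.2f}")
```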

5. Empirical Performance

LS-SGLD demonstrates consistent empirical improvements across multiple paradigms:

2D Gaussian and Gaussian Mixture

  • With a 2D Gaussian target $N(0, \Sigma)$ ($\Sigma_{12} = 0.9$), LS-SGLD recovers the covariance structure with reduced autocorrelation time and supports step sizes up to $(1+4\sigma)^{1/4}$ times larger than SGLD.
  • In a bimodal 2D Gaussian mixture, both LS-SGLD and the LS-preconditioned pSGLD achieve near-constant $W_2$ distance to ground-truth Metropolis–Hastings samples ($\approx 0.42$), while SGLD and pSGLD show larger bias and fluctuations.

Bayesian Logistic Regression (UCI “a3a”)

  • With batch size 5 and grid-searched $\eta$, LS-SGLD yields lower test negative log-likelihood, higher classification accuracy, and reduced stochastic-gradient variance compared to baselines.

Bayesian Convolutional Neural Networks

  • On MNIST with a two-layer CNN (batch size 100, $\sigma = 0.5$), LS-SGLD and LS-pSGLD converge more rapidly and marginally outperform SGLD and pSGLD in both training density and generalization accuracy.

| Task | SGLD | LS-SGLD |
|---|---|---|
| 2D Gaussian, $\gamma_2$ | 1.00 | 0.149–0.268 ($\sigma = 1$–$3$) |
| GMM, $W_2$ to MH samples | variable, large bias | $\approx 0.42$, constant |
| Logistic regression (a3a), gradient variance | higher | lower |
| Bayesian CNN, test accuracy | lower, slower convergence | slightly higher, faster |

6. Interpretations and Concluding Principles

Laplacian smoothing functions as a lightweight, variance-reducing preconditioner for SGLD, yielding provable gains in discretization error (in $W_2$) at the expense of a modest reduction in mixing speed. This effect is robust across both simple and deep Bayesian target distributions, and the FFT-based implementation keeps the computational overhead negligible. Empirical findings on synthetic and real-world machine learning benchmarks corroborate the theoretical results and support the adoption of LS-SGLD as an effective Bayesian inference technique (Wang et al., 2019).

References

Wang, B., Zou, D., Gu, Q., & Osher, S. J. (2019). Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo.
