LS-SGLD: Laplacian Smoothing for Bayesian Sampling
- LS-SGLD is a Bayesian sampling method that applies Laplacian smoothing as a preconditioner to reduce the high variance in stochastic gradient Langevin dynamics.
- It uses FFT-based computation to efficiently evaluate the smoothing operator, achieving lower discretization error measured in 2-Wasserstein distance.
- Empirical results demonstrate improved posterior sampling performance in tasks like Bayesian logistic regression and CNN training, allowing for larger stable step sizes.
Laplacian Smoothing Stochastic Gradient Langevin Dynamics (LS-SGLD) is a Markov Chain Monte Carlo (MCMC) method designed to address the high-variance bottleneck of stochastic gradient Langevin dynamics (SGLD) in Bayesian sampling. By introducing Laplacian smoothing (LS) as a preconditioner, it achieves provably reduced discretization error in 2-Wasserstein distance across both log-concave and non-log-concave targets, with negligible computational overhead relative to standard SGLD. The LS-SGLD algorithm demonstrates improved empirical performance in posterior sampling, Bayesian logistic regression, and Bayesian convolutional neural network (CNN) training, with robust variance reduction and larger stable step sizes (Wang et al., 2019).
1. Motivation and Conceptual Foundations
LS-SGLD is motivated by the variance limitations of SGLD, which is based on discretizing the continuous-time overdamped Langevin stochastic differential equation (SDE):

$$d\theta_t = -\nabla U(\theta_t)\,dt + \sqrt{2}\,dW_t,$$

where the target distribution is $\pi(\theta) \propto \exp(-U(\theta))$. In practice, $\nabla U$ is approximated by a mini-batch stochastic gradient $\nabla \widetilde U$, resulting in the update:

$$\theta_{k+1} = \theta_k - \eta_k \nabla \widetilde U(\theta_k) + \sqrt{2\eta_k}\,\epsilon_k, \qquad \epsilon_k \sim \mathcal{N}(0, I_d).$$
High variance in $\nabla \widetilde U$ compels the use of small step-sizes $\eta_k$. Laplacian smoothing, previously introduced for stochastic gradient descent (SGD) [Osher et al. '18], proposes a preconditioning operator:

$$A_\sigma^{-1} = (I - \sigma L)^{-1},$$

where $L$ is the discrete Laplacian (or, more generally, a graph Laplacian) and $\sigma \ge 0$ is the smoothing parameter. This operator enforces local averaging across parameter coordinates, reducing stochastic variance "on the fly" without extra storage requirements.
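To fix notation before the preconditioner is introduced, here is a minimal NumPy sketch of a plain SGLD step; the function name and the toy gradient are illustrative, not from the paper:

```python
import numpy as np

def sgld_step(theta, stoch_grad, eta, rng):
    """One vanilla SGLD update: theta <- theta - eta * grad + sqrt(2*eta) * noise."""
    noise = rng.standard_normal(theta.shape)
    return theta - eta * stoch_grad(theta) + np.sqrt(2.0 * eta) * noise

# Toy usage: sample from N(0, I), i.e. U(theta) = ||theta||^2 / 2, so grad U(theta) = theta.
rng = np.random.default_rng(0)
theta = np.zeros(10)
for _ in range(1000):
    theta = sgld_step(theta, lambda th: th, eta=1e-2, rng=rng)
```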
2. Algorithmic Specification
2.1 Smoothing Operator Construction
For $\theta \in \mathbb{R}^d$, define $L \in \mathbb{R}^{d \times d}$ as the circulant Laplacian matrix with periodic boundary conditions (entries $-2$ on the diagonal and $1$ on the two adjacent off-diagonals, wrapping around). The smoothing and preconditioning operators are:

$$A_\sigma = I - \sigma L, \qquad A_\sigma^{-1} = (I - \sigma L)^{-1}.$$

Both $A_\sigma$ and $A_\sigma^{-1}$ are circulant, admitting efficient evaluation via Fast Fourier Transform (FFT):

$$A_\sigma^{-1} g = \mathrm{ifft}\!\left(\frac{\mathrm{fft}(g)}{\mathrm{fft}(a_\sigma)}\right),$$

where $a_\sigma$ is the first column of $A_\sigma$, and $\mathrm{fft}$, $\mathrm{ifft}$ denote the discrete Fourier (inverse Fourier) transforms. The same transformation, with the square root of $\mathrm{fft}(a_\sigma)$ in the denominator, applies to $A_\sigma^{-1/2}$.
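As an illustration of this FFT-based evaluation, here is a minimal NumPy sketch; the helper names are ours, and the circulant first column follows the construction above:

```python
import numpy as np

def ls_symbol(d, sigma):
    """FFT symbol (eigenvalues) of the circulant A_sigma = I - sigma * L,
    built from its first column [1 + 2*sigma, -sigma, 0, ..., 0, -sigma]."""
    a = np.zeros(d)
    a[0], a[1], a[-1] = 1.0 + 2.0 * sigma, -sigma, -sigma
    return np.fft.fft(a).real  # equals 1 + 2*sigma*(1 - cos(2*pi*k/d)), all >= 1

def apply_A_inv(v, symbol):
    """A_sigma^{-1} v via FFT: ifft(fft(v) / symbol)."""
    return np.real(np.fft.ifft(np.fft.fft(v) / symbol))

def apply_A_inv_sqrt(v, symbol):
    """A_sigma^{-1/2} v via FFT, dividing by the square root of the symbol."""
    return np.real(np.fft.ifft(np.fft.fft(v) / np.sqrt(symbol)))

# Quick check against the dense inverse for a small dimension.
d, sigma = 8, 2.0
L = -2 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1)
L[0, -1] = L[-1, 0] = 1                      # periodic boundary conditions
A = np.eye(d) - sigma * L
v = np.random.default_rng(0).standard_normal(d)
assert np.allclose(apply_A_inv(v, ls_symbol(d, sigma)), np.linalg.solve(A, v))
```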
2.2 Continuous and Discrete-Time Dynamics
The LS-Langevin SDE generalizes the drift and diffusion structure:

$$d\theta_t = -A_\sigma^{-1}\nabla U(\theta_t)\,dt + \sqrt{2}\,A_\sigma^{-1/2}\,dW_t,$$

maintaining $\pi \propto \exp(-U)$ as the unique invariant measure.

The discrete-time LS-SGLD Euler–Maruyama update is:

$$\theta_{k+1} = \theta_k - \eta_k A_\sigma^{-1}\nabla\widetilde U(\theta_k) + \sqrt{2\eta_k}\,A_\sigma^{-1/2}\epsilon_k,$$

with $\epsilon_k \sim \mathcal{N}(0, I_d)$.

Efficient implementation uses FFT for both $A_\sigma^{-1}\nabla\widetilde U(\theta_k)$ and $A_\sigma^{-1/2}\epsilon_k$ at each step.
LS-SGLD Algorithm at a Glance

| Step | Operation | Notes |
|---|---|---|
| 1 | Sample mini-batch, compute $\nabla \widetilde U(\theta_k)$ | Standard stochastic gradient |
| 2 | Compute $A_\sigma^{-1} \nabla \widetilde U(\theta_k)$ via FFT | $O(d \log d)$ |
| 3 | Generate $\epsilon_k \sim \mathcal{N}(0, I_d)$, compute $A_\sigma^{-1/2} \epsilon_k$ via FFT | $O(d \log d)$ |
| 4 | Update: $\theta_{k+1} = \theta_k - \eta_k A_\sigma^{-1} \nabla \widetilde U(\theta_k) + \sqrt{2\eta_k}\, A_\sigma^{-1/2} \epsilon_k$ | |
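Putting the steps together, here is a self-contained NumPy sketch of the LS-SGLD loop on a toy Gaussian target; the gradient, hyperparameters, and function names are illustrative rather than taken from the paper:

```python
import numpy as np

def ls_sgld(stoch_grad, theta0, eta, sigma, n_iter, rng):
    """Sketch of the LS-SGLD iteration:
    theta <- theta - eta * A_sigma^{-1} grad + sqrt(2*eta) * A_sigma^{-1/2} noise."""
    d = theta0.size
    # FFT symbol of the circulant A_sigma = I - sigma * L (first-column construction).
    a = np.zeros(d)
    a[0], a[1], a[-1] = 1.0 + 2.0 * sigma, -sigma, -sigma
    eig = np.fft.fft(a).real
    theta, samples = theta0.copy(), []
    for _ in range(n_iter):
        g = stoch_grad(theta, rng)
        smoothed = np.real(np.fft.ifft(np.fft.fft(g) / eig))              # A^{-1} g
        eps = rng.standard_normal(d)
        p_noise = np.real(np.fft.ifft(np.fft.fft(eps) / np.sqrt(eig)))    # A^{-1/2} eps
        theta = theta - eta * smoothed + np.sqrt(2.0 * eta) * p_noise
        samples.append(theta)
    return np.array(samples)

# Toy target: U(theta) = ||theta||^2 / 2 (unit Gaussian), with artificial gradient noise.
rng = np.random.default_rng(0)
noisy_grad = lambda th, r: th + 0.1 * r.standard_normal(th.size)
chain = ls_sgld(noisy_grad, theta0=np.zeros(64), eta=5e-2, sigma=1.0, n_iter=2000, rng=rng)
print(chain[1000:].var(axis=0).mean())  # should be close to 1 for the unit Gaussian target
```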
3. Theoretical Guarantees
The convergence of LS-SGLD is established under the following assumptions:
- Dissipativity: $\langle \theta, \nabla U(\theta) \rangle \ge m\|\theta\|_2^2 - b$ for some $m > 0$, $b \ge 0$
- Smoothness: each component function $U_i$ has an $M$-Lipschitz gradient
- Variance bound: the mini-batch gradient $\nabla \widetilde U$ is an unbiased estimate of $\nabla U$ with uniformly bounded variance
- Log-concavity (optional): $U$ is (strongly) convex, so that $\pi \propto e^{-U}$ is (strongly) log-concave
Key metrics are reported in terms of the $2$-Wasserstein distance $W_2$ and the log-Sobolev constant of the target $\pi$.
3.1 Strongly Convex (Log-Concave) Targets
The total error splits as:

$$W_2(\mu_K, \pi) \;\le\; \underbrace{W_2(\mu_K, \nu_{K\eta})}_{\text{discretization error}} \;+\; \underbrace{W_2(\nu_{K\eta}, \pi)}_{\text{ergodic error}},$$

where $\mu_K$ is the law of the $K$-th LS-SGLD iterate and $\nu_t$ is the law of the continuous LS-Langevin diffusion at time $t$.

The ergodic term decays exponentially in the elapsed time $K\eta$, at a rate governed by the strong-convexity constant of $U$ (modulated by the spectrum of $A_\sigma^{-1}$).

Crucially, LS-SGLD exhibits strictly smaller discretization error than vanilla SGLD due to variance-reducing factors $\gamma_1, \gamma_2 < 1$ determined by the eigenvalues of $A_\sigma^{-1}$, where:

$$\lambda_k\!\left(A_\sigma^{-1}\right) = \frac{1}{1 + 2\sigma - 2\sigma\cos(2\pi k/d)} \;\le\; 1, \qquad k = 0, \dots, d-1,$$

and $\gamma_1$, $\gamma_2$ are averages of these eigenvalues and of their squares, both strictly below $1$ whenever $\sigma > 0$.
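For intuition, a short sketch (our own, not taken from the paper) that evaluates these spectral averages for a few values of $\sigma$ and confirms they drop strictly below 1 once $\sigma > 0$:

```python
import numpy as np

def variance_reduction_factors(d, sigma):
    """Averages of the eigenvalues of A_sigma^{-1} and A_sigma^{-2} for the
    circulant A_sigma = I - sigma * L; both averages are < 1 when sigma > 0."""
    k = np.arange(d)
    eig_A = 1.0 + 2.0 * sigma - 2.0 * sigma * np.cos(2.0 * np.pi * k / d)  # eigenvalues of A_sigma
    gamma1 = np.mean(1.0 / eig_A)        # ~ Tr(A_sigma^{-1}) / d
    gamma2 = np.mean(1.0 / eig_A**2)     # ~ Tr(A_sigma^{-2}) / d
    return gamma1, gamma2

for sigma in (0.0, 1.0, 3.0):
    print(sigma, variance_reduction_factors(d=1000, sigma=sigma))
```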
3.2 General (Non-Log-Concave) Targets
For general non-convex $U$, analogous error bounds are derived using a coupled SDE argument and Girsanov’s theorem, showing that LS-SGLD achieves reduced discretization terms (scaled by $\gamma_1$, $\gamma_2$) while preserving exponential ergodic decay as above, though with slightly reduced mixing rate.
A plausible implication is that for a broad class of sampling tasks, LS-SGLD offers a favorable trade-off: notably improved accuracy per iteration at the cost of marginally slower mixing in continuous time.
4. Computational Aspects and Hyperparameter Choices
The dominant additional cost over SGLD is two FFT-based solves of length $d$ per iteration (one for the smoothed gradient, one for the preconditioned noise), resulting in $O(d \log d)$ overhead, which is negligible in high-dimensional settings. The smoothing parameter $\sigma$ regulates both the strength of preconditioning and the admissible step-size, with larger $\sigma$ yielding smaller variance-reduction factors and larger maximal stable step-sizes. In smooth quadratic landscapes the effective step-size can be scaled up with $\sigma$, reflecting the robust stability gains observed in experiments. For arbitrary high-dimensional graph Laplacians, sparse FFT or polynomial preconditioning may be employed.
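As a rough, illustrative check of the overhead claim, one can time the FFT-preconditioned update against a plain update (a minimal benchmark of our own; absolute numbers depend on hardware and on the cost of the gradient itself):

```python
import time
import numpy as np

d, sigma = 1_000_000, 1.0
rng = np.random.default_rng(0)
theta, g = rng.standard_normal(d), rng.standard_normal(d)

# Precompute the FFT symbol of A_sigma once; it is reused at every iteration.
a = np.zeros(d)
a[0], a[1], a[-1] = 1.0 + 2.0 * sigma, -sigma, -sigma
eig = np.fft.fft(a).real

t0 = time.perf_counter()
for _ in range(10):
    _ = theta - 1e-3 * g                                              # plain SGD/SGLD-style update
t1 = time.perf_counter()
for _ in range(10):
    _ = theta - 1e-3 * np.real(np.fft.ifft(np.fft.fft(g) / eig))      # LS-preconditioned update
t2 = time.perf_counter()
print(f"plain: {t1 - t0:.4f}s   with FFT smoothing: {t2 - t1:.4f}s")
```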
5. Empirical Performance
LS-SGLD demonstrates consistent empirical improvements across multiple paradigms:
2D Gaussian and Gaussian Mixture
- With a 2D Gaussian target, LS-SGLD recovers the covariance structure with reduced autocorrelation time and supports markedly larger stable step-sizes than SGLD.
- In a bimodal 2D Gaussian mixture, both LS-SGLD and the LS-preconditioned pSGLD maintain a near-constant distance (about 0.42) to ground-truth Metropolis–Hastings samples, while SGLD/pSGLD show larger bias and fluctuations.
Bayesian Logistic Regression (UCI “a3a”)
- With batch size 5 and grid-searched hyperparameters, LS-SGLD yields lower test negative log-likelihood, higher classification accuracy, and reduced stochastic gradient variance compared to baselines.
Bayesian Convolutional Neural Networks
- On MNIST using a two-layer convnet (batch size 100), LS-SGLD and LS-pSGLD converge more rapidly and marginally outperform SGLD/pSGLD in both training density and generalization accuracy.
| Task | SGLD | LS-SGLD |
|---|---|---|
| 2D Gaussian (relative to SGLD) | 1.00 | 0.149–0.268 ($\sigma = 1$–$3$) |
| GMM, distance to MH samples | Variable, large bias | ≈0.42, constant |
| Logistic regression (a3a), gradient variance | Higher | Lower |
| Bayesian CNN, test accuracy | Lower, slower convergence | Slightly higher, faster |
6. Interpretations and Concluding Principles
Laplacian smoothing functions as a lightweight, variance-reducing preconditioner for SGLD, leading to provable gains in discretization error (in $W_2$) at the expense of a modest reduction in mixing speed. This effect is robust across both simple synthetic targets and deep Bayesian posteriors. The FFT-based implementation ensures negligible computational overhead. Empirical findings across synthetic and real-world machine learning benchmarks corroborate the theoretical results and support the adoption of LS-SGLD as an effective Bayesian inference technique (Wang et al., 2019).