
Smoothed Discrete Sampling (SDS)

Updated 4 February 2026
  • Smoothed Discrete Sampling (SDS) is a framework that smooths discrete data manifolds with Gaussian noise, facilitating continuous optimization and efficient sampling.
  • It combines score matching, contrastive divergence, and MCMC techniques to robustly navigate multimodal discrete spaces in applications like protein sequence generation and text-to-3D synthesis.
  • SDS enhances training stability, variance reduction, and sample diversity, outperforming autoregressive and discrete diffusion models with simpler noise scale management.

Smoothed Discrete Sampling (SDS) is a general framework for generative modeling in which a discrete data manifold is smoothed via additive Gaussian noise, enabling continuous optimization, sampling, and denoising strategies. SDS forms the basis of several influential algorithms in both discrete sequence modeling and text-conditioned 3D synthesis, including Discrete Walk-Jump Sampling (dWJS) for protein sequences and the score distillation losses used in DreamFusion-style text-to-3D pipelines. The mathematical principles underlying SDS draw from energy-based modeling, diffusion processes, and scale-space theory, with key methodological distinctions relative to both autoregressive and multi-scale diffusion approaches. This article surveys the foundations, theoretical properties, and empirical performance of SDS in discrete domains, highlighting advances in gradient guidance, denoising, and variance reduction.

1. Formal Definition and Mathematical Framework

Let $x \in \{0,1\}^d$ denote a sample from a discrete manifold $\mathcal{M}$, such as a protein sequence (one-hot representation) or a rendered image generated from a 3D scene parameterization. Smoothed Discrete Sampling (SDS) proceeds by perturbing $x$ with isotropic Gaussian noise of scale $\sigma$,

y = x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d),

resulting in the smoothed data density

p_\sigma(y) = \sum_{x\in\mathcal{M}} p_{\text{data}}(x)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{1}{2\sigma^2} \|y-x\|^2\right).

The associated smoothed energy function is $E_\sigma(y) = -\log p_\sigma(y)$, with gradient (the negative of the "score")

\nabla_y E_\sigma(y) = -\nabla_y \log p_\sigma(y) = \mathbb{E}_{x\sim p(\cdot\mid y)}\left[ \frac{y-x}{\sigma^2} \right].

This continuous relaxation enables efficient optimization and Markov chain Monte Carlo (MCMC) sampling on the smoothed manifold, followed by a projection (denoising) "jump" step back to the original discrete space. The score-matching and contrastive divergence objectives can be used to train parametric denoisers $g_\phi$ and energy-based models $f_\theta$ directly on $y$ sampled from $p_\sigma$ (Frey et al., 2023).
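For a small discrete set, the smoothed score can be computed in closed form by averaging over the posterior $p(x \mid y)$. The sketch below is illustrative only (it assumes a uniform $p_{\text{data}}$ over two one-hot states; the name `smoothed_score` is ours, not from the paper's code):

```python
# Illustrative sketch (not from the paper's code): exact smoothed score for a
# toy discrete "manifold" of two one-hot states, assuming uniform p_data.
import numpy as np

def smoothed_score(y, manifold, sigma):
    """Return grad_y log p_sigma(y) via the posterior mean E[x | y] (Tweedie)."""
    # Posterior weights p(x | y) are proportional to p_data(x) * N(y; x, sigma^2 I).
    d2 = ((manifold - y) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))  # shifted for numerical stability
    w /= w.sum()
    post_mean = w @ manifold                       # E[x | y]
    return (post_mean - y) / sigma**2              # note: grad E_sigma(y) is the negative of this

M = np.array([[1.0, 0.0], [0.0, 1.0]])             # two one-hot "sequences"
y = np.array([0.9, 0.2])                           # a noisy observation near the first mode
s = smoothed_score(y, M, sigma=0.5)                # points toward [1, 0]
```

Because $y$ lies near the first mode, the score pushes its first coordinate up and its second coordinate down, i.e. toward the nearest discrete configuration.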

2. The Discrete Walk-Jump Sampling Algorithm

The Discrete Walk-Jump Sampling (dWJS) algorithm provides a practical realization of SDS in discrete generative modeling. The essential steps are:

  • Walk (Langevin MCMC on smoothed manifold):

y_{t+1} = y_t - \eta\, \nabla_y E_\sigma(y_t) + \sqrt{2\eta}\,\xi_t, \quad \xi_t \sim \mathcal{N}(0, I_d),

where the gradient may be replaced by a learned denoiser $g_\phi(y_t)$ or an EBM score $\nabla_y f_\theta(y_t)$.

  • Jump (One-step Denoising):

\hat{x} = J(y_T) = \arg\max_{x\in\mathcal{M}} p(x \mid y_T) = \left\lfloor y_T + \sigma^2\, g_\phi(y_T) \right\rceil_{\text{one-hot}},

projecting $y_T$ back to the nearest valid discrete configuration.

  • Training:

    • Score matching: Trains $g_\phi$ by least-squares denoising,

    \mathcal{L}_{\text{SM}}(\phi) = \mathbb{E}_{x,\varepsilon}\left[\, \| x - (y + \sigma^2 g_\phi(y)) \|^2 \,\right], \quad y = x + \varepsilon.

    • Contrastive divergence: Trains $f_\theta$ to maximize likelihood on smoothed data and discriminate against negative samples obtained via short-run Langevin dynamics.

Pseudocode for the method is given in (Frey et al., 2023), with core quantities summarized in the following table:

| Step             | Symbolic Formulation                                               | Description               |
|------------------|--------------------------------------------------------------------|---------------------------|
| Smoothing        | $y = x + \varepsilon$                                              | Add Gaussian noise        |
| Langevin walk    | $y_{t+1} = y_t - \eta\, \nabla E_\sigma + \cdots$                  | MCMC on smoothed manifold |
| Denoising (jump) | $\hat{x} = \text{one-hot}\left[ y_T + \sigma^2 g_\phi(y_T) \right]$ | Project to discrete space |
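The walk and jump steps above can be sketched end to end. In this toy, the exact posterior-mean denoiser for a uniform mixture over four one-hot states stands in for a trained network $g_\phi$; all names and hyperparameter values are illustrative assumptions, not the paper's implementation:

```python
# Toy end-to-end walk-jump sketch. The "denoiser" g_phi below is the exact
# posterior-mean score for a uniform mixture over four one-hot states,
# standing in for a trained network.
import numpy as np

rng = np.random.default_rng(0)
M = np.eye(4)                  # discrete manifold: four one-hot states
sigma, eta, T = 0.5, 1e-2, 200

def g_phi(y):
    """Denoiser field such that y + sigma^2 * g_phi(y) = E[x | y]."""
    d2 = ((M - y) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))
    w /= w.sum()
    return (w @ M - y) / sigma**2

# Walk: Langevin MCMC on the smoothed manifold (-grad E_sigma = g_phi here).
y = rng.normal(0.25, sigma, size=4)
for _ in range(T):
    y = y + eta * g_phi(y) + np.sqrt(2 * eta) * rng.normal(size=4)

# Jump: one-step denoising, then project to the nearest one-hot configuration.
x_hat = np.eye(4)[np.argmax(y + sigma**2 * g_phi(y))]
```

The walk explores the smoothed density; the final jump is a single denoising step followed by an argmax projection, so `x_hat` is always a valid one-hot vector.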

3. Properties, Stability, and Theoretical Insights

The single-scale smoothing in SDS is instrumental in preventing instabilities common in EBM training, such as energy blow-up, and obviates the need for replay buffers or annealing. Empirical observations indicate stable training and sampling across $\sigma \in [0.5, 4.0]$, with instability only for very small $\sigma$ (undersmoothing regime) (Frey et al., 2023).

  • Noise scale ($\sigma$): Must be large enough to smooth out discrete energy ridges but not so large as to erase multi-modal structure. Typical values are $\sigma \sim 0.5$ (proteins).
  • Mixing: dWJS mixes rapidly across distant modes and produces high-quality, diverse samples, outperforming diffusion and autoregressive baselines (10–100× and 40× faster, respectively, for protein generation).
  • Score matching: Only a single noise scale is needed for training, aligning with Neural Empirical Bayes (Frey et al., 2023).

Training and sampling remain robust provided $\eta \|\nabla f\|^2 \ll 1$ and moderate $T$ (tens to hundreds of Langevin steps).
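The step-size condition can be illustrated on the simplest possible smoothed energy, a quadratic $E(y) = y^2/(2\sigma^2)$, for which the Langevin discretization has update factor $(1 - \eta/\sigma^2)$ and is stable only when $\eta < 2\sigma^2$. The 1-D setup below is our toy, not the paper's EBM:

```python
# Illustrative 1-D check of the step-size condition: Langevin on the
# quadratic energy E(y) = y^2 / (2 sigma^2) is stable only if eta < 2 sigma^2.
import numpy as np

def langevin(eta, sigma=0.5, T=500, seed=0):
    rng = np.random.default_rng(seed)
    y = 5.0
    for _ in range(T):
        y = y - eta * (y / sigma**2) + np.sqrt(2 * eta) * rng.normal()
    return y

y_stable = langevin(eta=0.02)    # eta << 2 sigma^2 = 0.5: chain stays bounded
y_unstable = langevin(eta=0.75)  # eta > 2 sigma^2: iterates blow up geometrically
```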

4. Comparisons to Other Generative and Diffusive Frameworks

SDS occupies a distinct space in the taxonomy of generative models:

  • Autoregressive models require sequential sampling, incurring slow inference and exposure bias.
  • Discrete diffusion models prescribe multi-scale noise schedules with often hundreds of noising and denoising iterations, brittle schedule design, and slow sampling.
  • DEEN (Deep Energy Estimator Networks) parameterize scores but lack explicit MCMC mechanisms or the mixing properties conferred by smoothing.

SDS/dWJS combines the flexible sampling properties of energy-based models (via MCMC) with the stability and sample quality of score-based models, requiring only a single noise scale $\sigma$ (Frey et al., 2023).

5. Discretization and Scale-Space Axioms in Gaussian Smoothing for SDS

The implementation of the Smoothing step in SDS requires careful consideration of Gaussian kernel discretization, especially when the smoothed manifold is derived from pixelated images or other grid-structured data. Three principal strategies are distinguished (Lindeberg, 2023):

| Discretization Method             | Key Strengths                                                            | Pitfalls                                                          |
|-----------------------------------|--------------------------------------------------------------------------|-------------------------------------------------------------------|
| Sampling approach                 | Simplicity; direct DL-framework implementation                           | Not normalized at fine scales; breaks cascade for small $\sigma$  |
| Integrated (pixel-integral)       | Faithfully models pixel averaging; correct spatial average               | Constant scale offset; breaks cascade property                    |
| Discrete analogue (Bessel kernel) | Satisfies scale-space axioms: exact cascade, normalization, monotonicity | Needs Bessel-function implementation                              |
  • For fine scales ($\sigma \lesssim 1$) or robust scale-space properties, the discrete-analogue method best preserves theoretical guarantees.
  • For moderate to coarse scales ($\sigma > 1$), all methods can suffice, with sampling offering simplicity and pixel-integral kernels better modeling real-world sensor integration.

A key point: SDS emphasizes that digital data (e.g., images) are averages over finite pixel supports, not point samples (Lindeberg, 2023).
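A small experiment makes the fine-scale normalization pitfall concrete for two of the three strategies. This comparison is our sketch, not Lindeberg's reference code; the discrete analogue is omitted because it requires modified Bessel functions (e.g. `scipy.special.ive`):

```python
# Contrast the sampled and pixel-integrated discretizations of a 1-D
# Gaussian kernel at a fine scale (sigma = 0.3).
import math

def sampled_kernel(sigma, radius=10):
    """Point-sample the continuous Gaussian density at integer positions."""
    return [math.exp(-n * n / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
            for n in range(-radius, radius + 1)]

def integrated_kernel(sigma, radius=10):
    """Integrate the Gaussian over each unit pixel [n - 1/2, n + 1/2]."""
    cdf = lambda t: 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))
    return [cdf(n + 0.5) - cdf(n - 0.5) for n in range(-radius, radius + 1)]

# At sigma = 0.3 the sampled kernel sums to about 1.34 (not normalized),
# while the integrated kernel telescopes to essentially exactly 1.
s_sum = sum(sampled_kernel(sigma=0.3))
i_sum = sum(integrated_kernel(sigma=0.3))
```

The integrated kernel also directly encodes the point that digital pixels are averages over finite supports rather than point samples.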

6. SDS in Text-to-3D Synthesis and Gradient Variance Reduction

In text-to-3D pipelines such as DreamFusion, SDS provides image-space guidance from frozen 2D diffusion models for 3D representation optimization. The standard SDS guidance gradient is (Lukoianov et al., 2024):

\nabla_\psi L_{\text{SDS}} = \mathbb{E}_{t, \varepsilon, c}\left[ \sigma(t)\,\big(\epsilon_t(x_t, y) - \varepsilon\big)\, \frac{\partial g}{\partial \psi} \right],

where $g(\psi, c)$ is the rendered image from 3D parameters $\psi$ at a random camera $c$.

Recent analysis demonstrates that SDS is a high-variance discretization of DDIM, as SDS samples i.i.d. noise at each step rather than tracking prompt-conditioned trajectories as in DDIM. This mismatch creates excessive update variance, leading to over-smoothed, cartoon-like outputs in 3D. The Score Distillation via Inversion (SDI) approach replaces i.i.d. noise with a prompt- and trajectory-matched estimate via DDIM inversion, restoring sample variance to the theoretical minimum and dramatically improving texture fidelity and detail (Lukoianov et al., 2024).
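The role of the sampled noise $\varepsilon$ can be seen in a one-pixel toy: with the exact noise predictor for a point target, the $\varepsilon$ terms cancel out of $(\epsilon_t - \varepsilon)$ and the SDS gradient is deterministic; with a learned predictor they do not cancel, and that residual randomness is the variance SDI's inversion removes. Everything below (the identity renderer, the names `eps_pred` and `mu`, the DDPM-style parameterization) is an illustrative assumption:

```python
# One-pixel SDS toy: renderer g(psi) = psi, frozen model prefers target mu.
import numpy as np

rng = np.random.default_rng(0)
mu = 1.0    # the image the diffusion model "prefers" for the prompt
psi = 0.2   # current scene parameter

def sds_grad(psi, alpha_bar):
    a, s = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    eps = rng.normal()
    x_t = a * psi + s * eps            # noised rendering
    eps_pred = (x_t - a * mu) / s      # exact predictor for target delta(mu)
    dg_dpsi = 1.0                      # derivative of the identity renderer
    return (eps_pred - eps) * dg_dpsi  # weighting w(t) omitted for clarity

grads = [sds_grad(psi, alpha_bar=0.5) for _ in range(100)]
# with the exact predictor, every sample equals
# sqrt(alpha_bar / (1 - alpha_bar)) * (psi - mu) = -0.8
```

The gradient consistently pulls $\psi$ toward $\mu$; an imperfect $\epsilon$-predictor would reintroduce the sampled noise into each update.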

Validation experiments on 3D shape generation show SDI achieves superior CLIP and visual quality scores compared to SDS and other methods (Lukoianov et al., 2024).

7. Guidelines, Hyperparameters, and Practical Recommendations

  • Noise scale ($\sigma$): Select to achieve the desired smoothing tradeoff; $\sigma \sim 0.5$ for protein sequences, $\sigma > 1$ for robust image smoothing.
  • Number of steps ($T$): Typically 10–200 for adequate manifold exploration.
  • Step size ($\eta$): $\eta \sim 10^{-2}$ to $10^{-3}$, tuned for discretization stability.
  • Denoising projection: Use model-based denoisers or nearest-neighbor projection as appropriate for target manifold.
  • Manifold matching: For continuous domains (pixelated images), pixel-integral or discrete-analogue Gaussian convolutions yield better physical and mathematical fidelity (Lindeberg, 2023).
  • Variance reduction: For text-to-3D or DDIM-style workflows, replace random noise with trajectory- and prompt-conditioned inversion to improve detail preservation (Lukoianov et al., 2024).

SDS and its algorithmic realizations offer a robust, efficient, and theoretically sound paradigm for discrete generative modeling, with demonstrated advantages in mixing, sample quality, and practical simplicity relative to alternative approaches (Frey et al., 2023, Lindeberg, 2023, Lukoianov et al., 2024).
