PDHAMS: Efficient Discrete MCMC

Updated 2 August 2025
  • PDHAMS is a second-order MCMC algorithm that uses quadratic expansion and preconditioning to capture pairwise correlations in high-dimensional, correlated discrete targets.
  • It couples a Gaussian auxiliary variable trick with Hamiltonian momentum updates to enable efficient, rejection-free sampling for exactly quadratic potentials.
  • Careful tuning of parameters such as the preconditioning matrix, diagonal stabilization, and momentum scaling yields superior mixing and lower total variation distances compared to first-order methods.

Preconditioned Discrete-HAMS (PDHAMS) is a second-order, irreversible Markov chain Monte Carlo (MCMC) algorithm for structured discrete distributions. PDHAMS introduces a quadratic preconditioning step and a Hamiltonian momentum augmentation into the family of gradient-based discrete samplers, offering a significant advance in efficiency for high-dimensional or correlated discrete target distributions. The method combines (i) a quadratic expansion of the log-density, (ii) an auxiliary-variable construction based on the Gaussian integral trick to manage complex dependencies, and (iii) a Hamiltonian-based update with generalized detailed balance, yielding a rejection-free sampler for targets with exactly quadratic potentials (Zhou et al., 29 Jul 2025).

1. Quadratic Preconditioning and Auxiliary Variable Mechanism

In contrast with first-order discrete samplers such as Norm Constrained Gradient (NCG) and Auxiliary Variable Gradient (AVG), which use only the gradient (a linear expansion) of the log-density $f(s)$, PDHAMS employs a second-order Taylor expansion:

$$f(s) \approx f(s_t) + \nabla f(s_t)^\top (s - s_t) + \frac{1}{2} (s - s_t)^\top W (s - s_t),$$

where $W$ is a global, positive definite preconditioning matrix representing curvature. This captures pairwise correlations missed by first-order approximations.

Direct sampling from the resulting discrete proposal is computationally intractable for generic $W$ because of the induced pairwise interactions. PDHAMS resolves this by introducing a continuous auxiliary variable $z$ via the Gaussian integral trick:

$$\pi(s, z) \propto \exp\{f(s)\}\, \exp\!\left( -\frac{1}{2}(z - s)^\top (W + D)(z - s) \right),$$

where $D$ is diagonal, $W + D$ is positive definite, and $L$ is its Cholesky factor. Conditioning on $z$ yields factorized discrete proposals, enabling efficient coordinate-wise sampling.

Alternative but equivalent auxiliary-variable schemes ("mean", "variance", or "momentum") produce identical state transitions. The construction allows the state proposal $s^*$ to be sampled efficiently as

$$Q(s \mid z_t, s_t) \propto \prod_{i=1}^d \operatorname{Softmax}\!\left[ -\frac{1}{2} d_i s_i^2 + \big(\nabla f(s_t)_i - (W s_t)_i + ((W + D) z_t)_i \big)\, s_i \right].$$
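
To illustrate the coordinate-wise structure, here is a minimal NumPy sketch of sampling from this factorized proposal (hypothetical helper names, not the authors' reference implementation; it assumes every coordinate shares a common finite support, e.g. $\{-1, +1\}$, and that $D$ is stored by its diagonal):

```python
import numpy as np

def sample_proposal(grad_f_st, s_t, z_t, W, D_diag, support, rng):
    """Sample s* coordinate-by-coordinate from the factorized
    PDHAMS proposal Q(s | z_t, s_t).

    support: 1-D array of the values each coordinate s_i may take.
    D_diag:  diagonal entries d_i of the stabilization matrix D.
    """
    # Linear coefficients: grad f(s_t) - W s_t + (W + D) z_t
    c = grad_f_st - W @ s_t + (W + np.diag(D_diag)) @ z_t
    s_star = np.empty_like(s_t, dtype=float)
    for i in range(len(s_t)):
        # Per-coordinate softmax over the finite support
        logits = -0.5 * D_diag[i] * support**2 + c[i] * support
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        s_star[i] = rng.choice(support, p=probs)
    return s_star

# Usage (toy example): rng = np.random.default_rng(0)
# s_star = sample_proposal(grad, s_t, z_t, W, d, np.array([-1.0, 1.0]), rng)
```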

2. Hamiltonian Dynamics, Momentum Augmentation, and Irreversibility

PDHAMS augments the discrete state variable $s$ with a Gaussian momentum $u$ (or a scaled version $v$), constructing a joint Hamiltonian target,

$$\pi(s, u) \propto \exp\{f(s) - \tfrac{1}{2} \|u\|^2\}.$$

The dynamics proceed as follows:

  1. Auto-regressive momentum update:

$$v_{t+1/2} = \epsilon v_t + \sqrt{1 - \epsilon^2}\, L^{-1} Z, \quad Z \sim \mathcal{N}(0, I),$$

for $\epsilon \in [0,1)$.

  2. Auxiliary variable:

$$z_t = s_t + L^{-1} Z.$$

  3. Discrete proposal: $s^* \sim Q(\cdot \mid z_t, s_t)$, as described above.
  4. Irreversible momentum update with negation and gradient correction:

$$v^* = -v_{t+1/2} + s_t - s^* + \phi \left[ \nabla f(s^*) - \nabla f(s_t) + W(s_t - s^*) \right],$$

where $\phi \ge 0$ controls the strength of the correction.

  5. A generalized Metropolis–Hastings accept–reject step is performed:

$$\alpha = \min\left\{ 1,\ \frac{\pi(s^*, -v^*)\, Q(s_t, -v_{t+1/2} \mid s^*, -v^*)}{\pi(s_t, v_{t+1/2})\, Q(s^*, v^* \mid s_t, v_{t+1/2})} \right\},$$

and transitions satisfy the generalized detailed balance:

$$\pi(s_t, v_{t+1/2})\, K_\phi(s_{t+1}, v_{t+1} \mid s_t, v_{t+1/2}) = \pi(s_{t+1}, -v_{t+1})\, K_\phi(s_t, -v_{t+1/2} \mid s_{t+1}, -v_{t+1}).$$

This symmetry mirrors the irreversible dynamics of Hamiltonian Monte Carlo and ensures invariance of the target distribution while enabling rapid exploration of the state space.
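
The update can be summarized in a schematic sketch that strings the five steps together. The proposal sampler and the generalized acceptance ratio are left as placeholder callables (hypothetical names, not the authors' implementation), and whether $z_t$ reuses the momentum noise depends on which of the equivalent auxiliary-variable schemes is adopted:

```python
import numpy as np

def pdhams_step(s_t, v_t, grad_f, sample_proposal, log_accept_ratio,
                L, W, eps, phi, rng):
    """One schematic PDHAMS update following steps 1-5 above.

    sample_proposal(grad_f_st, s_t, z_t) -> s*   (see Section 1 sketch)
    log_accept_ratio(...) -> log of the generalized MH ratio in step 5
    """
    d = len(s_t)
    # 1. Auto-regressive momentum refreshment
    v_half = eps * v_t + np.sqrt(1.0 - eps**2) * np.linalg.solve(
        L, rng.standard_normal(d))
    # 2. Gaussian auxiliary variable (fresh noise assumed here)
    z_t = s_t + np.linalg.solve(L, rng.standard_normal(d))
    # 3. Factorized discrete proposal
    s_star = sample_proposal(grad_f(s_t), s_t, z_t)
    # 4. Momentum negation with gradient correction
    v_star = -v_half + (s_t - s_star) + phi * (
        grad_f(s_star) - grad_f(s_t) + W @ (s_t - s_star))
    # 5. Generalized Metropolis-Hastings accept-reject
    if np.log(rng.uniform()) < log_accept_ratio(s_t, v_half, s_star, v_star, z_t):
        return s_star, v_star
    return s_t, -v_half  # negate momentum on rejection (assumed convention)
```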

3. Rejection-Free Sampling and Preconditioning Effects

A key property is that when $f(s)$ is exactly quadratic (so the target $\pi(s)$ is a discrete Gaussian), the PDHAMS proposal kernel exactly matches the target, giving an acceptance probability of one, i.e., rejection-free sampling. In such cases, all proposals are accepted, and the algorithm samples independently given the auxiliary-variable structure.
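
This can be checked directly in the document's notation: for $f(s) = b^\top s + \tfrac{1}{2} s^\top W s$ with symmetric $W$, the second-order expansion of Section 1 is an identity rather than an approximation,

$$f(s) = f(s_t) + \nabla f(s_t)^\top (s - s_t) + \tfrac{1}{2}(s - s_t)^\top W (s - s_t) \quad \text{for all } s, s_t,$$

so the factorized proposal reproduces the exact conditional of $s$ given the auxiliary variable, and the step-5 acceptance probability is identically one.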

For general $f(s)$, the quadratic expansion and preconditioning via $W$ yield highly "informed" proposals that reflect local curvature, and numerical results indicate high acceptance rates and strong mixing even far from the quadratic ideal.

The matrix $W$ serves both as a global surrogate for the Hessian and as a preconditioner. Proper calibration of $W$, along with the selection of $D$, $\epsilon$, and $\phi$, is essential to balance efficient traversal of the state space with numerically stable, easily invertible auxiliary-variable sampling.

4. Performance Comparisons in Numerical Experiments

Several experiments in (Zhou et al., 29 Jul 2025) demonstrate the empirical advantages of PDHAMS over prior state-of-the-art discrete MCMC methods:

| Method | Approximation Order | Auxiliary Variable | Momentum | Irreversible | Rejection-Free | TV Distance | ESS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NCG | 1st | None | None | No | No | Higher | Low |
| AVG | 1st | Yes | None | No | No | Higher | Low |
| DHAMS | 1st | Yes | Yes | Yes | Yes (linear only) | Moderate | Moderate |
| PDHAMS | 2nd (quadratic) | Yes (Gaussian) | Yes | Yes | Yes (quadratic) | Lowest | High |

In all tested cases—including discrete Gaussian, quadratic mixture, and clock Potts models—PDHAMS exhibits significantly lower total variation distance (TV) from the target and higher effective sample size (ESS) compared to NCG, AVG, and DHAMS. Autocorrelation in Markov chain trajectories is suppressed more rapidly, and estimated moments (means, variances) converge faster and with reduced bias.
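
As an aside on the metrics (illustrative only, not the paper's evaluation code): on a finite state space, the TV distance is typically estimated from the chain's visit frequencies. A minimal sketch, assuming integer-coded states:

```python
import numpy as np

def empirical_tv(samples, target_probs):
    """Estimate TV distance between a discrete chain's empirical
    distribution and a known target on {0, ..., K-1}."""
    counts = np.bincount(samples, minlength=len(target_probs))
    empirical = counts / counts.sum()
    return 0.5 * np.abs(empirical - target_probs).sum()
```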

5. Comparison with First-Order Discrete Samplers

NCG and AVG restrict their proposals to first-order information, neglecting important state-space dependencies. DHAMS improves on these by introducing auxiliary momentum and irreversible transitions but remains limited to first-order (linear) approximations, yielding rejection-free behavior only for targets with linear $f(s)$. PDHAMS, by preconditioning with a global $W$ and using a quadratic expansion encapsulated in the auxiliary-variable framework, overcomes both limitations and achieves rejection-free behavior for quadratic potentials. For general $f(s)$, the adaptive proposals retain higher fidelity to the local geometry of the target than those of NCG, AVG, or DHAMS, resulting in superior mixing.

6. Implementation Considerations and Parameter Tuning

PDHAMS requires selection of several matrices and parameters:

  • $W$: the global curvature approximation; common choices include the true or an approximate Hessian of $f(s)$.
  • $D$: diagonal stabilization ensuring that $W + D$ is positive definite, as required for the Cholesky factorization.
  • $\epsilon$: auto-regression parameter for the momentum process; interpolates between independent and persistent momentum.
  • $\phi$: magnitude of the gradient correction in the momentum update.
  • $L$: the lower Cholesky factor of $(W + D)$, needed for efficient auxiliary-variable generation.
  • $\beta$ (for over-relaxed PDHAMS variants): controls the degree of negative correlation in state updates.

While parameter calibration can introduce implementation effort, the paper observes that modest tuning suffices to achieve strong performance across diverse discrete sampling scenarios.
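
A minimal setup sketch under assumed heuristics (the paper does not prescribe these exact rules): symmetrize a Hessian approximation to obtain $W$, shift the diagonal just enough that $W + D$ is positive definite, and precompute the Cholesky factor $L$:

```python
import numpy as np

def setup_pdhams(hessian_approx, jitter=1e-6):
    """Build (W, D_diag, L) from a Hessian approximation of f."""
    W = 0.5 * (hessian_approx + hessian_approx.T)   # symmetrize
    lam_min = np.linalg.eigvalsh(W).min()
    shift = max(0.0, -lam_min) + jitter             # smallest PD-restoring shift
    D_diag = np.full(W.shape[0], shift)
    L = np.linalg.cholesky(W + np.diag(D_diag))     # lower Cholesky of W + D
    return W, D_diag, L
```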

7. Outlook and Significance

PDHAMS unifies and generalizes techniques from the discrete and continuous MCMC literature (notably Hamiltonian Monte Carlo and Gaussian auxiliary-variable tricks) in a framework that is robust to high dimensionality, correlation, and complex potentials. It achieves rejection-free sampling for discrete quadratic targets and extends these gains to broader target classes in practice. This makes PDHAMS a foundational methodology for the future development of efficient discrete MCMC algorithms, particularly those requiring effective sampling from discrete graphical models, probabilistic combinatorial structures, or high-dimensional Bayesian posteriors with correlated latent variables.

The performance advantages over NCG, AVG, and DHAMS are consistently demonstrated through lower TV distances to target, increased ESS, reduced bias in moment estimation, and suppressed autocorrelation across varied discrete sampling tasks (Zhou et al., 29 Jul 2025).

References

  1. Zhou et al., 29 Jul 2025.