
Hamiltonian Monte Carlo (HMC)

Updated 6 January 2026
  • Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that leverages Hamiltonian dynamics to overcome random-walk behavior in sampling complex, continuous distributions.
  • It augments parameters with auxiliary momentum variables and uses symplectic integrators like the leapfrog scheme along with a Metropolis correction to maintain target invariance.
  • HMC's efficiency critically depends on tuning the step size, trajectory length, and mass matrix, enabling superior mixing and effective sample sizes in high-dimensional Bayesian problems.

Hamiltonian Monte Carlo (HMC) is a Markov Chain Monte Carlo algorithm that exploits the principles of Hamiltonian dynamics to produce efficient proposals for sampling from continuous target densities, especially in high-dimensional Bayesian inference problems. By augmenting parameters with auxiliary momentum variables and simulating the joint evolution according to a Hamiltonian function, HMC avoids the random-walk behavior and poor mixing endemic to classic Metropolis-Hastings or Gibbs schemes in complex geometries. The defining features of HMC include volume-preserving and reversible (symplectic) integrators, a gradient-informed proposal mechanism, and a Metropolis correction that ensures invariance of the target density.

1. Hamiltonian Formulation and Dynamics

HMC samples a target density $\pi(\theta)$ on $\mathbb{R}^d$, typically known only up to normalization. The algorithm introduces a potential energy $U(\theta) = -\log\pi(\theta)$ and augments the state space with an auxiliary momentum $p\in\mathbb{R}^d$, distributed as $p\sim\mathcal{N}(0, M)$ for some mass matrix $M\succ 0$ (often chosen diagonal or as the identity). The kinetic energy is $K(p) = \frac{1}{2}p^T M^{-1} p$, yielding the total Hamiltonian

$$H(\theta,p) = U(\theta) + K(p)\,.$$

The deterministic flow in phase space $(\theta(t),p(t))$ over time $t$ is governed by Hamilton's equations:

$$\frac{d\theta}{dt} = M^{-1}p\,, \qquad \frac{dp}{dt} = -\nabla_\theta U(\theta)\,,$$

which preserves the joint density $\propto\exp(-H(\theta,p))$ exactly, exhibiting reversibility (under $p\to-p$) and volume preservation (by Liouville's theorem) (Granados et al., 8 Jan 2025, Mukherjee et al., 4 Jan 2026).
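
To make the formulation concrete, the following sketch (illustrative only; it assumes a two-dimensional correlated Gaussian target and an identity mass matrix, neither of which comes from the source) defines the potential, kinetic, and total energies together with the vector field of Hamilton's equations:

```python
import numpy as np

# Illustrative 2-D correlated Gaussian target: pi(theta) ∝ exp(-0.5 theta^T Sigma^{-1} theta).
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
M = np.eye(2)                      # mass matrix (identity here)
M_inv = np.linalg.inv(M)

def U(theta):
    """Potential energy U(theta) = -log pi(theta), up to an additive constant."""
    return 0.5 * theta @ Sigma_inv @ theta

def grad_U(theta):
    """Gradient of the potential, needed by Hamilton's equations."""
    return Sigma_inv @ theta

def K(p):
    """Kinetic energy K(p) = 0.5 p^T M^{-1} p."""
    return 0.5 * p @ M_inv @ p

def H(theta, p):
    """Total Hamiltonian H(theta, p) = U(theta) + K(p)."""
    return U(theta) + K(p)

def hamiltonian_vector_field(theta, p):
    """Hamilton's equations: d(theta)/dt = M^{-1} p,  dp/dt = -grad U(theta)."""
    return M_inv @ p, -grad_U(theta)
```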

2. Symplectic Integrators and Metropolis Correction

Closed-form solutions to Hamilton's equations exist only in limited cases. In practice, HMC uses discrete symplectic (volume-preserving, reversible) integrators, primarily the leapfrog scheme. For a small time increment $\epsilon$:

$$\begin{aligned} p\!\left(t+\tfrac{\epsilon}{2}\right) &= p(t) - \tfrac{\epsilon}{2}\nabla_\theta U(\theta(t)) \\ \theta(t+\epsilon) &= \theta(t) + \epsilon\, M^{-1} p\!\left(t+\tfrac{\epsilon}{2}\right) \\ p(t+\epsilon) &= p\!\left(t+\tfrac{\epsilon}{2}\right) - \tfrac{\epsilon}{2}\nabla_\theta U(\theta(t+\epsilon)) \end{aligned}$$

$L$ leapfrog steps trace a trajectory ending at $(\theta^*, p^*)$. Since leapfrog only approximately conserves $H$, a Metropolis accept-reject step is applied:

$$\alpha = \min\left\{1,\ \exp\!\left(H(\theta^{i-1},p^0) - H(\theta^*,p^*)\right)\right\}$$

ensuring invariance of the desired posterior and correcting numerical drift (Granados et al., 8 Jan 2025, Vishnoi, 2021).
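
A minimal sketch of the leapfrog recursion and the acceptance probability above, reusing the `grad_U`, `H`, `M_inv`, and `numpy` definitions from the previous snippet (an assumption made for brevity, not part of the source):

```python
def leapfrog(theta, p, eps, L):
    """Run L leapfrog steps of size eps from (theta, p), returning (theta*, p*)."""
    theta, p = theta.copy(), p.copy()
    p -= 0.5 * eps * grad_U(theta)           # initial half-step for momentum
    for _ in range(L - 1):
        theta += eps * (M_inv @ p)           # full position step
        p -= eps * grad_U(theta)             # full momentum step
    theta += eps * (M_inv @ p)               # final position step
    p -= 0.5 * eps * grad_U(theta)           # final half-step for momentum
    return theta, p

def accept_prob(theta_prev, p0, theta_star, p_star):
    """Metropolis acceptance probability: min{1, exp(H(theta_prev, p0) - H(theta*, p*))}."""
    return min(1.0, float(np.exp(H(theta_prev, p0) - H(theta_star, p_star))))
```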

3. Algorithmic Structure and Implementation

The canonical HMC algorithm proceeds as follows (a minimal sampler assembling these steps is sketched after the list):

  1. Sample $p^0 \sim \mathcal{N}(0,M)$.
  2. Run $L$ leapfrog steps to obtain $(\theta^*,p^*)$.
  3. Negate the momentum: $p^* \mapsto -p^*$ (for time-reversal symmetry).
  4. Apply the Metropolis correction as above.
  5. If accepted, set $\theta^{i} = \theta^*$; otherwise retain the previous value, $\theta^{i} = \theta^{i-1}$.
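
A minimal sampler assembling the five steps above, built on the `leapfrog` and `accept_prob` helpers and the target from the earlier sketches; the values of `eps`, `L`, and the sample count are arbitrary illustrations, not recommendations:

```python
import numpy as np

def hmc_sampler(theta0, eps=0.1, L=20, n_samples=5000, seed=0):
    """Minimal HMC sampler implementing steps 1-5 above (uses M, leapfrog,
    accept_prob, and H from the earlier sketches)."""
    rng = np.random.default_rng(seed)
    d = theta0.shape[0]
    chain = np.empty((n_samples, d))
    theta = theta0.copy()
    for i in range(n_samples):
        p0 = rng.multivariate_normal(np.zeros(d), M)        # 1. sample momentum
        theta_star, p_star = leapfrog(theta, p0, eps, L)     # 2. run L leapfrog steps
        p_star = -p_star                                     # 3. negate momentum
        if rng.random() < accept_prob(theta, p0, theta_star, p_star):
            theta = theta_star                               # 4.-5. accept or retain
        chain[i] = theta
    return chain

samples = hmc_sampler(np.zeros(2))
print(samples.mean(axis=0))   # should be close to zero for the Gaussian example
```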

Parameter tuning is critical. The step size $\epsilon$ must be small enough to keep the energy error low, while the trajectory length $L$ must be long enough for proposals to move far across the target. Optimal acceptance rates in high dimensions are empirically around $65\%$. The mass matrix $M$ is ideally chosen so that $M^{-1}$ matches the posterior covariance (or its diagonal). Automated adaptation (e.g. in Stan) uses dual averaging for $\epsilon$ and robust path-length strategies such as NUTS (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025, Thomas et al., 2020).
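
As a hedged illustration of step-size adaptation (not Stan's dual-averaging scheme), a crude Robbins-Monro style update can nudge $\epsilon$ toward a target acceptance rate during warm-up; $\epsilon$ is then frozen for the sampling phase:

```python
import numpy as np

def adapt_step_size(eps, accept, iteration, target=0.65, kappa=0.75):
    """Crude Robbins-Monro update of log(eps) toward a target acceptance rate.
    Illustrative only; Stan's dual-averaging adaptation is more sophisticated."""
    rate = iteration ** (-kappa)                  # decaying adaptation rate
    return float(np.exp(np.log(eps) + rate * (accept - target)))
```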

4. Geometric and Physical Properties

The theoretical basis of HMC is symplectic geometry. Hamiltonian flows on phase space preserve the symplectic form and volume; combined with a volume-preserving, reversible numerical integrator and the Metropolis correction, this guarantees unbiased sampling (Betancourt et al., 2011). The kinetic energy can be generalized to position-dependent metrics, resulting in Riemannian Manifold HMC (RMHMC), which adapts to local curvature and further improves mixing but requires specialized integrators for non-separable Hamiltonians (Betancourt et al., 2011, Mukherjee et al., 4 Jan 2026). Volume preservation and reversibility (up to the momentum flip) are necessary for detailed balance and unbiasedness (Lelièvre et al., 2023).

5. Performance Characteristics and Comparative Analysis

HMC exhibits dramatic improvement in effective sample size, autocorrelation time, and CPU efficiency over Random Walk Metropolis-Hastings (RWMH) and derivative-free samplers (e.g., t-walk), especially in correlated or high-dimensional targets. For unimodal, strongly correlated posteriors (e.g., Gamma(5,1), bivariate normal, hierarchical normal), HMC achieves near-unity acceptance rates (e.g., $99.9\%$, $100\%$) and low integrated autocorrelation times (IAT $\approx 1$), with orders-of-magnitude more effective samples per unit time. In simple multimodal problems, t-walk samplers may outperform HMC in traversing modes, though HMC can perform well with appropriate augmentations (Granados et al., 8 Jan 2025, Vishnoi, 2021).

| Target distribution | HMC acceptance (%) | IAT | Effective sample size | Sampling speed |
|---|---|---|---|---|
| Gamma(5,1), $d=1$ | 99.9 | 1 | 99,934 | 2,272 samples/s |
| Bivariate normal, $d=2$ | 100 | -- | -- | -- |
| Hierarchical normal, $d=10$ | 93 | 35 | 14,283 | 0.7 s per 1,000 samples |

High-dimensional and correlated problems show clear dominance of HMC over alternatives (Granados et al., 8 Jan 2025).

6. Extensions and Advanced Variants

Numerous extensions address HMC’s limitations with multimodal, spiky, or irregular posteriors:

  • No-U-Turn Sampler (NUTS): Automatically adapts integration time, eliminating manual tuning of $L$ (Mukherjee et al., 4 Jan 2026, Leigh et al., 2022).
  • Stochastic Gradient HMC (SGHMC): Permits efficient inference on massive datasets using minibatched noisy gradients; a simplified form of its update is sketched after this list (Mukherjee et al., 4 Jan 2026).
  • Riemannian Manifold HMC: Employs position-dependent mass matrices to align proposals with posterior geometry (Betancourt et al., 2011, Mukherjee et al., 4 Jan 2026).
  • Magnetic HMC: Adds velocity-dependent curl terms to enhance mixing across difficult regions (Tripuraneni et al., 2016, Brofos et al., 2020).
  • Quantum-Inspired and Relativistic HMC: Randomizes mass matrices or introduces velocity bounds, facilitating escape from multimodal barriers and spiky potentials (Liu et al., 2019, Lu et al., 2016).
  • Particle HMC (PHMC): Adapts HMC for state-space models by combining SMC-based marginal likelihood and gradient estimation, addressing intractable latent variable gradients (Amri et al., 14 Apr 2025).
  • Continuously Tempered HMC: Bridges multimodal targets with base distributions via continuous temperature variables, dramatically improving evidence estimation and mode traversal (Graham et al., 2017).
  • Probabilistic Path HMC (PPHMC): Generalizes HMC to spaces with intricate combinatorial structure, exploiting boundary-crossing and surrogate gradients (Dinh et al., 2017).
  • Mix & Match HMC (MMHMC): Uses modified shadow Hamiltonians and partial momentum refreshment for irreversibility and improved mixing, with new ESS diagnostics for importance-weighted samples (Radivojević et al., 2017).
  • Variance Reduction Techniques: Coupled-chain HMC with control variates and antithetic sampling improves effective sample size and estimator variance (Piponi et al., 2020).
  • Adaptive Mass and Entropy-Based Tuning: Gradient-based adaptation schemes maximize proposal entropy for optimal performance (Hirt et al., 2021).
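
For concreteness, the following is a simplified sketch of one SGHMC update (identity mass matrix, with the gradient-noise estimate set to zero); `grad_U_minibatch` is a hypothetical stand-in for a noisy minibatch gradient of $U$ and is not part of the source:

```python
import numpy as np

def sghmc_step(theta, p, grad_U_minibatch, eps=1e-3, C=1.0, rng=None):
    """One simplified SGHMC update: position step, then a momentum step with a
    friction term and injected Gaussian noise around the minibatch gradient."""
    rng = rng or np.random.default_rng()
    theta = theta + eps * p                                    # position update
    noise = rng.normal(scale=np.sqrt(2.0 * C * eps), size=np.shape(p))
    p = p - eps * grad_U_minibatch(theta) - eps * C * p + noise  # friction + noise
    return theta, p
```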

7. Practical Considerations, Diagnostics, and Software

Modern implementations (Stan, PyMC3, TFP, hmclearn) incorporate adaptive step-size tuning, mass-matrix estimation, and trajectory length strategies, often wrapping NUTS or RMHMC as the default engine (Mukherjee et al., 4 Jan 2026, Thomas et al., 2020). Diagnostics include acceptance rate, energy error histograms, trace/autocorrelation plots, divergent transitions, and effective sample size per gradient evaluation or per unit time. GPU acceleration enables efficient handling of large-scale Bayesian inference data (Beam et al., 2014).
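
As an example of such a high-level interface, the sketch below fits a toy model with PyMC3 (illustrative data and priors; it assumes the PyMC3 v3.x API), where `pm.sample` runs NUTS with adaptive step size and mass-matrix estimation by default:

```python
import numpy as np
import pymc3 as pm

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.5, size=200)   # synthetic observations (illustrative)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)
    # NUTS is the default step method; step size and mass matrix are adapted during tuning.
    trace = pm.sample(2000, tune=1000, target_accept=0.8)

print(pm.summary(trace))
```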

Recent advances have introduced unbiased perfect simulation frameworks (via coupled chains and rounding mechanisms) that separate MCMC convergence error from experimental (Monte Carlo) error and enable i.i.d. samples for rigorous uncertainty quantification (Leigh et al., 2022).

In summary, Hamiltonian Monte Carlo delivers state-of-the-art mixing and sampling efficiency in Bayesian computation, hinging critically on symplectic integrators, careful hyperparameter tuning, and sometimes problem-specific geometric or physical augmentations. Its modern variants and diagnostics address a range of contemporary statistical challenges, from high-dimensional correlated inference to non-smooth or multimodal posterior landscapes.
