
Hamiltonian Monte Carlo (HMC)

Updated 6 January 2026
  • Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that leverages Hamiltonian dynamics to overcome random-walk behavior in sampling complex, continuous distributions.
  • It augments parameters with auxiliary momentum variables and uses symplectic integrators like the leapfrog scheme along with a Metropolis correction to maintain target invariance.
  • HMC's efficiency critically depends on tuning the step size, trajectory length, and mass matrix, enabling superior mixing and effective sample sizes in high-dimensional Bayesian problems.

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo algorithm that exploits Hamiltonian dynamics to produce efficient proposals for sampling from continuous target densities, especially in high-dimensional Bayesian inference problems. By augmenting parameters with auxiliary momentum variables and simulating their joint evolution under a Hamiltonian function, HMC avoids the random-walk behavior and poor mixing endemic to classic Metropolis-Hastings or Gibbs schemes in complex geometries. The defining features of HMC are volume-preserving, reversible (symplectic) integrators, a gradient-informed proposal mechanism, and a Metropolis correction that ensures invariance of the target density.

1. Hamiltonian Formulation and Dynamics

HMC samples a target density $\pi(\theta)$ on $\mathbb{R}^d$, typically known up to normalization. The algorithm introduces a potential energy $U(\theta) = -\log\pi(\theta)$ and augments the state space with an auxiliary momentum $p\in\mathbb{R}^d$, distributed as $p\sim\mathcal{N}(0, M)$ for some mass matrix $M\succ 0$ (often chosen diagonal or as the identity). The kinetic energy is $K(p) = \frac{1}{2}p^T M^{-1}p$, yielding the total Hamiltonian

$$H(\theta,p) = U(\theta) + K(p)\,.$$

The deterministic flow in phase space $(\theta(t),p(t))$ over time $t$ is governed by Hamilton's equations:

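$$\frac{d\theta}{dt} = \frac{\partial H}{\partial p} = M^{-1}p, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial \theta} = -\nabla U(\theta),$$

which preserve the joint density $\pi(\theta,p)\propto\exp(-H(\theta,p))$ exactly, exhibiting reversibility (under the momentum flip $p\mapsto -p$) and volume preservation (by Liouville's theorem) (Granados et al., 8 Jan 2025, Mukherjee et al., 4 Jan 2026).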
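
As a small illustration of energy conservation and reversibility, the following sketch assumes a standard normal target with $M = I$, for which Hamilton's equations reduce to $d\theta/dt = p$, $dp/dt = -\theta$ and the exact flow is a rotation in phase space. The function names `hamiltonian` and `exact_flow` are hypothetical and chosen only for this example.

```python
import numpy as np

def hamiltonian(theta, p):
    """H = U + K for a standard normal target with identity mass matrix."""
    return 0.5 * np.dot(theta, theta) + 0.5 * np.dot(p, p)

def exact_flow(theta, p, t):
    """Closed-form solution of d(theta)/dt = p, dp/dt = -theta."""
    c, s = np.cos(t), np.sin(t)
    return c * theta + s * p, c * p - s * theta

rng = np.random.default_rng(0)
theta0 = rng.normal(size=3)
p0 = rng.normal(size=3)

# Energy conservation along the exact flow.
theta1, p1 = exact_flow(theta0, p0, t=1.7)
energy_drift = abs(hamiltonian(theta1, p1) - hamiltonian(theta0, p0))

# Reversibility: flip the momentum, flow for the same time, flip again.
theta_back, p_back = exact_flow(theta1, -p1, t=1.7)
p_back = -p_back

print("energy drift:", energy_drift)
print("returned to start:", np.allclose(theta_back, theta0) and np.allclose(p_back, p0))
```

For general targets no such closed form is available, which is why a numerical integrator is needed.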

2. Symplectic Integrators and Metropolis Correction

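Closed-form solutions to Hamilton's equations exist only for limited cases. Practically, HMC uses discrete symplectic (volume-preserving, reversible) integrators, primarily the leapfrog scheme. For a small time increment $\epsilon$:

$$p_{t+\epsilon/2} = p_t - \frac{\epsilon}{2}\nabla U(\theta_t), \qquad \theta_{t+\epsilon} = \theta_t + \epsilon M^{-1}p_{t+\epsilon/2}, \qquad p_{t+\epsilon} = p_{t+\epsilon/2} - \frac{\epsilon}{2}\nabla U(\theta_{t+\epsilon}).$$

$L$ leapfrog steps trace a trajectory $(\theta_0,p_0)\to(\theta_L,p_L)$. Since leapfrog only approximately conserves $H$, a Metropolis accept-reject step is applied:

$$\alpha = \min\left\{1, \exp\left[H(\theta_0,p_0) - H(\theta_L,p_L)\right]\right\},$$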

ensuring invariance of the desired posterior and correcting numerical drift (Granados et al., 8 Jan 2025, Vishnoi, 2021).
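
The leapfrog update and the acceptance probability can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the quartic test potential, and the numerical settings are all assumptions made for the example.

```python
import numpy as np

def leapfrog(theta, p, grad_U, eps, n_steps, M_inv):
    """Run n_steps >= 1 leapfrog steps; returns the end point (theta, p)."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    theta = np.array(theta, dtype=float)
    p = np.array(p, dtype=float)
    p -= 0.5 * eps * grad_U(theta)           # initial half step in momentum
    for i in range(n_steps):
        theta += eps * (M_inv @ p)           # full step in position
        if i < n_steps - 1:
            p -= eps * grad_U(theta)         # two merged half steps
    p -= 0.5 * eps * grad_U(theta)           # final half step in momentum
    return theta, p

def accept_prob(H_old, H_new):
    """Metropolis acceptance probability min(1, exp(H_old - H_new))."""
    return float(np.exp(min(0.0, H_old - H_new)))

# Hypothetical non-Gaussian test potential U = |theta|^2/2 + sum(theta^4)/4.
U = lambda th: 0.5 * th @ th + 0.25 * np.sum(th**4)
grad_U = lambda th: th + th**3
M_inv = np.eye(2)
H = lambda th, p: U(th) + 0.5 * p @ M_inv @ p

theta0 = np.array([1.0, -0.5])
p0 = np.array([0.3, 0.8])
theta1, p1 = leapfrog(theta0, p0, grad_U, eps=0.05, n_steps=20, M_inv=M_inv)
alpha = accept_prob(H(theta0, p0), H(theta1, p1))

# Reversibility check: negate the momentum and integrate again.
theta_back, p_back = leapfrog(theta1, -p1, grad_U, eps=0.05, n_steps=20, M_inv=M_inv)
print("acceptance probability:", alpha)
print("reversible:", np.allclose(theta_back, theta0) and np.allclose(-p_back, p0))
```

Clamping the exponent at zero before calling `np.exp` avoids overflow when the proposal has much lower energy than the current state.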

3. Algorithmic Structure and Implementation

The canonical HMC algorithm proceeds as follows:

  1. Sample $p\sim\mathcal{N}(0, M)$.
  2. Run $L$ leapfrog steps to $(\theta_L, p_L)$.
  3. Negate momentum: $p_L \leftarrow -p_L$ (for time-reversal symmetry).
  4. Metropolis correction as above.
  5. If accepted, set $\theta \leftarrow \theta_L$; else retain previous $\theta$.
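
The five steps above can be assembled into a compact sampler. The sketch below assumes an identity mass matrix and a hypothetical bivariate normal target with correlation 0.9; the step size, trajectory length, and seed are illustrative choices, not recommendations from the cited works.

```python
import numpy as np

# Hypothetical target: zero-mean bivariate normal with correlation 0.9.
COV = np.array([[1.0, 0.9], [0.9, 1.0]])
PREC = np.linalg.inv(COV)

def U(theta):
    return 0.5 * theta @ PREC @ theta

def grad_U(theta):
    return PREC @ theta

def hmc(U, grad_U, theta0, n_samples, eps, n_steps, rng):
    """Plain HMC with identity mass matrix; returns draws and acceptance rate."""
    theta = np.array(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    n_accept = 0
    for i in range(n_samples):
        p0 = rng.normal(size=theta.size)            # step 1: p ~ N(0, I)
        theta_new, p = theta.copy(), p0.copy()
        p -= 0.5 * eps * grad_U(theta_new)          # step 2: leapfrog
        for j in range(n_steps):
            theta_new += eps * p
            if j < n_steps - 1:
                p -= eps * grad_U(theta_new)
        p -= 0.5 * eps * grad_U(theta_new)
        p = -p                                      # step 3: negate momentum
        H_old = U(theta) + 0.5 * p0 @ p0            # step 4: Metropolis test
        H_new = U(theta_new) + 0.5 * p @ p
        if np.log(rng.uniform()) < H_old - H_new:   # step 5: accept or keep
            theta = theta_new
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_samples

samples, acc_rate = hmc(U, grad_U, np.zeros(2), n_samples=3000,
                        eps=0.15, n_steps=10, rng=np.random.default_rng(1))
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples.T)
print("acceptance rate:", acc_rate)
print("sample mean:", sample_mean)
print("sample covariance:", sample_cov)
```

Because the Gaussian kinetic energy is symmetric in $p$, the negation in step 3 does not change the acceptance probability; it is kept here only to mirror the listed steps.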

Parameter tuning is critical. The step size $\epsilon$ trades energy conservation against the number of gradient evaluations needed to cover a given trajectory length ($T = L\epsilon$). Optimal acceptance rates in high dimensions are well below one (a commonly cited target is roughly $0.65$). Since $p\sim\mathcal{N}(0, M)$, the mass matrix $M$ is ideally set to the inverse of the posterior covariance; equivalently, $M^{-1}$ approximates the posterior covariance or its diagonal. Automated adaptation (e.g. in Stan) uses dual averaging for $\epsilon$ and robust strategies (e.g. NUTS) for path length (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025, Thomas et al., 2020).
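
To make the step-size trade-off concrete, the following sketch estimates the mean acceptance probability for several step sizes at a fixed trajectory length. It assumes a 100-dimensional standard normal target, an identity mass matrix, and starting points drawn from the target itself; the function `mean_acceptance` and all numerical settings are hypothetical.

```python
import numpy as np

def mean_acceptance(eps, total_time, dim, n_rep, rng):
    """Average Metropolis acceptance over n_rep independent leapfrog trajectories."""
    n_steps = max(1, int(round(total_time / eps)))
    theta = rng.normal(size=(n_rep, dim))       # starts drawn from the target
    p = rng.normal(size=(n_rep, dim))           # momenta, M = I
    H0 = 0.5 * (theta**2).sum(axis=1) + 0.5 * (p**2).sum(axis=1)
    p = p - 0.5 * eps * theta                   # grad U(theta) = theta
    for i in range(n_steps):
        theta = theta + eps * p
        if i < n_steps - 1:
            p = p - eps * theta
    p = p - 0.5 * eps * theta
    H1 = 0.5 * (theta**2).sum(axis=1) + 0.5 * (p**2).sum(axis=1)
    accept = np.exp(np.minimum(0.0, H0 - H1))
    return float(accept.mean()), n_steps

rng = np.random.default_rng(0)
step_sizes = [0.1, 0.4, 0.8]
results = [mean_acceptance(eps, total_time=1.6, dim=100, n_rep=500, rng=rng)
           for eps in step_sizes]
rates = [r[0] for r in results]
steps = [r[1] for r in results]
for eps, rate, n in zip(step_sizes, rates, steps):
    print(f"eps={eps}: {n} gradient steps, mean acceptance {rate:.3f}")
```

Smaller steps raise the acceptance probability but require proportionally more gradient evaluations per proposal, which is the balance that adaptive schemes try to strike.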

4. Geometric and Physical Properties

The theoretical basis of HMC is symplectic geometry. Hamiltonian flows on phase space preserve the symplectic form and volume, and symplectic integrators such as leapfrog inherit this property, so the Metropolis-corrected sampler remains unbiased despite discretization error (Betancourt et al., 2011). The kinetic energy can be generalized to position-dependent metrics, resulting in Riemannian Manifold HMC (RMHMC), which adapts to local curvature and further improves mixing but requires specialized integrators for non-separable Hamiltonians (Betancourt et al., 2011, Mukherjee et al., 4 Jan 2026). Volume preservation and reversibility (up to momentum flip) are the properties that make the simple Metropolis acceptance ratio valid and thereby guarantee detailed balance and unbiasedness (Lelièvre et al., 2023).
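
Volume preservation can be checked numerically: the Jacobian of the leapfrog map should have determinant one. The sketch below does this with central finite differences for a one-dimensional, separable Hamiltonian with a hypothetical quartic potential and unit mass; it does not cover the non-separable RMHMC case.

```python
import numpy as np

def grad_U(theta):
    """Gradient of the assumed potential U = theta^2/2 + theta^4/4."""
    return theta + theta**3

def leapfrog_map(z, eps=0.3, n_steps=5):
    """Map a phase-space point z = (theta, p) through n_steps leapfrog steps."""
    theta, p = z
    for _ in range(n_steps):
        p = p - 0.5 * eps * grad_U(theta)
        theta = theta + eps * p
        p = p - 0.5 * eps * grad_U(theta)
    return np.array([theta, p])

def numerical_jacobian(f, z, h=1e-6):
    """Central finite-difference Jacobian of f at z."""
    z = np.asarray(z, dtype=float)
    J = np.empty((z.size, z.size))
    for k in range(z.size):
        dz = np.zeros_like(z)
        dz[k] = h
        J[:, k] = (f(z + dz) - f(z - dz)) / (2.0 * h)
    return J

z0 = np.array([0.7, -0.4])
J = numerical_jacobian(leapfrog_map, z0)
det_J = np.linalg.det(J)
print("det of leapfrog Jacobian:", det_J)
```

Each leapfrog substep is a shear in phase space, which is why the composed map has unit Jacobian determinant even though it does not conserve $H$ exactly.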

5. Performance Characteristics and Comparative Analysis

HMC exhibits dramatic improvements in effective sample size, autocorrelation time, and CPU efficiency over Random Walk Metropolis-Hastings (RWMH) and derivative-free samplers (e.g., t-walk), especially for correlated or high-dimensional targets. For unimodal targets, including strongly correlated ones (e.g., Gamma(5,1), bivariate normal, hierarchical normal), HMC achieves near-unity acceptance rates and low integrated autocorrelation times (IAT), with orders of magnitude more effective samples per unit time. In simple multimodal problems, t-walk samplers may outperform HMC in traversing modes, though HMC can perform well with appropriate augmentations (Granados et al., 8 Jan 2025, Vishnoi, 2021).

Table: HMC acceptance (%), IAT, effective sample size, and samples/sec for three target distributions (Gamma(5,1), bivariate normal, hierarchical normal); for the bivariate normal only the acceptance rate is reported.

High-dimensional and correlated problems show clear dominance of HMC over alternatives (Granados et al., 8 Jan 2025).
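
For readers who want to compute such diagnostics themselves, the sketch below estimates the integrated autocorrelation time and the implied effective sample size of a scalar chain. It uses a simple truncation rule (summing autocorrelations until the first non-positive value), which is one common heuristic and not necessarily the estimator used in the cited comparisons. The AR(1) test chain is an assumption chosen because its IAT is known in closed form, $(1+\phi)/(1-\phi)$.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a scalar chain for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x) / x.size
    return np.array([np.dot(x[:x.size - k], x[k:]) / (x.size * var)
                     for k in range(max_lag + 1)])

def integrated_autocorr_time(x, max_lag=200):
    """IAT = 1 + 2 * sum of autocorrelations, truncated at the first non-positive lag."""
    rho = autocorr(x, max_lag)
    tau = 1.0
    for k in range(1, max_lag + 1):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return tau

# Test chain: AR(1) with phi = 0.5, for which the exact IAT is 3.
rng = np.random.default_rng(0)
phi, n = 0.5, 20000
noise = rng.normal(size=n)
chain = np.empty(n)
chain[0] = noise[0]
for i in range(1, n):
    chain[i] = phi * chain[i - 1] + noise[i]

tau = integrated_autocorr_time(chain)
ess = n / tau
print("estimated IAT:", tau)
print("effective sample size:", ess)
```

Dividing the effective sample size by wall-clock time or by the number of gradient evaluations gives the efficiency measures used to compare samplers.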

6. Extensions and Advanced Variants

Numerous extensions address HMC’s limitations with multimodal, spiky, or irregular posteriors.

7. Practical Considerations, Diagnostics, and Software

Modern implementations (Stan, PyMC3, TFP, hmclearn) incorporate adaptive step-size tuning, mass-matrix estimation, and trajectory-length strategies, typically with NUTS as the default engine (Mukherjee et al., 4 Jan 2026, Thomas et al., 2020). Diagnostics include acceptance rate, energy error histograms, trace and autocorrelation plots, divergent transitions, and effective sample size per gradient evaluation or per unit time. GPU acceleration enables efficient handling of large-scale Bayesian inference problems (Beam et al., 2014).

Recent advances have introduced unbiased perfect simulation frameworks (via coupled chains and rounding mechanisms) that separate MCMC convergence error from experimental (Monte Carlo) error and enable i.i.d. samples for rigorous uncertainty quantification (Leigh et al., 2022).

In summary, Hamiltonian Monte Carlo delivers state-of-the-art mixing and sampling efficiency in Bayesian computation, hinging critically on symplectic integrators, careful hyperparameter tuning, and sometimes problem-specific geometric or physical augmentations. Its modern variants and diagnostics address a range of contemporary statistical challenges, from high-dimensional correlated inference to non-smooth or multimodal posterior landscapes.
