Hamiltonian Monte Carlo Algorithms
- Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method that leverages Hamiltonian dynamics and auxiliary momentum to efficiently sample from complex, high-dimensional distributions.
- It employs symplectic integrators like the leapfrog scheme coupled with a Metropolis correction to maintain energy conservation and detailed balance.
- Effective HMC implementation requires careful tuning of hyperparameters such as the step size, mass matrix, and integration time to optimize performance and reduce divergent transitions.
Hamiltonian Monte Carlo (HMC) Algorithms
Hamiltonian Monte Carlo is a class of Markov chain Monte Carlo (MCMC) methods that exploit the structure of Hamiltonian dynamics to sample efficiently from complex, high-dimensional probability distributions. By introducing auxiliary momentum variables and simulating deterministic physical trajectories through the joint parameter–momentum space, HMC proposes distant, low-rejection moves that suppress the slow diffusive behavior of random-walk algorithms. The method is central to scalable and exact Bayesian inference in contemporary statistics and machine learning, and forms the backbone of major probabilistic programming engines (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025).
1. Mathematical Framework and Hamiltonian Dynamics
HMC augments a $d$-dimensional target parameter vector $\theta \in \mathbb{R}^d$ with an auxiliary momentum $p \in \mathbb{R}^d$ and considers the Hamiltonian
$$H(\theta, p) = U(\theta) + \tfrac{1}{2}\, p^\top M^{-1} p,$$
where $U(\theta) = -\log \pi(\theta)$ defines the potential energy (negative log-target) and $K(p) = \tfrac{1}{2}\, p^\top M^{-1} p$ provides the kinetic energy for some positive definite mass matrix $M$ (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025). The joint density on $(\theta, p)$ is
$$\pi(\theta, p) \propto \exp\{-H(\theta, p)\} = \pi(\theta)\, \mathcal{N}(p \mid 0, M).$$
Sampling from $\pi(\theta, p)$ and discarding $p$ yields samples from the desired target $\pi(\theta)$.
Hamiltonian evolution is specified by the system
$$\frac{d\theta}{dt} = M^{-1} p, \qquad \frac{dp}{dt} = -\nabla U(\theta),$$
preserving two crucial invariants:
- Energy conservation: The total energy is constant along trajectories.
- Volume preservation (Liouville's theorem): The flow is divergence-free, so volume in phase space is preserved.
These properties guarantee that the proposal mechanism, if simulated exactly, would accept all proposed states in a Metropolis–Hastings step. In practice, time discretization is necessary (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025).
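For intuition, a minimal numerical sketch: for a one-dimensional standard normal target with $M = 1$, the exact Hamiltonian flow is a rotation of phase space, and the energy is conserved exactly along the trajectory (the variable names below are illustrative).

```python
import numpy as np

def H(theta, p):
    # Hamiltonian for a 1-D standard normal target with unit mass:
    # U(theta) = theta^2 / 2, K(p) = p^2 / 2.
    return 0.5 * theta**2 + 0.5 * p**2

# Exact flow is a rotation: theta(t) = theta0*cos(t) + p0*sin(t),
#                           p(t)     = -theta0*sin(t) + p0*cos(t).
theta0, p0 = 1.3, -0.4
for t in np.linspace(0.0, 2 * np.pi, 9):
    theta_t = theta0 * np.cos(t) + p0 * np.sin(t)
    p_t = -theta0 * np.sin(t) + p0 * np.cos(t)
    # Energy conservation along the exact trajectory (up to roundoff).
    assert abs(H(theta_t, p_t) - H(theta0, p0)) < 1e-12
```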
2. Numerical Integrators and Metropolis Correction
Exact simulation is infeasible for general $U(\theta)$; HMC uses symplectic, reversible integrators—chiefly the leapfrog (Störmer–Verlet) scheme:
$$p_{t+\varepsilon/2} = p_t - \tfrac{\varepsilon}{2}\,\nabla U(\theta_t), \qquad \theta_{t+\varepsilon} = \theta_t + \varepsilon\, M^{-1} p_{t+\varepsilon/2}, \qquad p_{t+\varepsilon} = p_{t+\varepsilon/2} - \tfrac{\varepsilon}{2}\,\nabla U(\theta_{t+\varepsilon}).$$
Repeated $L$ times, the trajectory approximates a physical Hamiltonian path over total integration time $T = L\varepsilon$ (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025). Local integrator error is $\mathcal{O}(\varepsilon^3)$; global error over the trajectory is $\mathcal{O}(\varepsilon^2)$.
Because the leapfrog integrator does not exactly preserve $H$, HMC employs a Metropolis acceptance step with probability
$$\alpha = \min\left\{1,\ \exp\big(H(\theta, p) - H(\theta^*, p^*)\big)\right\}$$
to ensure detailed balance with respect to $\pi(\theta, p)$ (Mukherjee et al., 4 Jan 2026).
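A minimal sketch of one HMC transition, combining the leapfrog integrator with the Metropolis correction (assuming $M = I$; helper names such as `hmc_step` are illustrative, not from any library):

```python
import numpy as np

def leapfrog(theta, p, grad_U, eps, L):
    """Stoermer-Verlet integration of Hamilton's equations with M = I."""
    theta, p = theta.copy(), p.copy()
    p = p - 0.5 * eps * grad_U(theta)        # initial half-step for momentum
    for _ in range(L - 1):
        theta = theta + eps * p              # full position step
        p = p - eps * grad_U(theta)          # full momentum step
    theta = theta + eps * p
    p = p - 0.5 * eps * grad_U(theta)        # final half-step for momentum
    return theta, p

def hmc_step(theta, U, grad_U, eps, L, rng):
    """One HMC transition: resample momentum, integrate, Metropolis-correct."""
    p = rng.standard_normal(theta.shape)     # p ~ N(0, I)
    H0 = U(theta) + 0.5 * p @ p
    theta_star, p_star = leapfrog(theta, p, grad_U, eps, L)
    H1 = U(theta_star) + 0.5 * p_star @ p_star
    # Accept with probability min(1, exp(H0 - H1)).
    if rng.random() < np.exp(H0 - H1):
        return theta_star, True
    return theta, False

# Demo target: 2-D standard normal, U(theta) = ||theta||^2 / 2.
U = lambda th: 0.5 * th @ th
grad_U = lambda th: th
rng = np.random.default_rng(0)
theta, accepts, draws = np.zeros(2), 0, []
for _ in range(2000):
    theta, ok = hmc_step(theta, U, grad_U, eps=0.2, L=10, rng=rng)
    accepts += ok
    draws.append(theta)
draws = np.array(draws)
rate = accepts / 2000
```

Because the trajectory length $T = L\varepsilon = 2$ is well matched to the unit-scale target, the acceptance rate is high and the draws are nearly independent.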
3. Tuning, Hyperparameters, and Diagnostics
Performance is sensitive to three core hyperparameters:
- Mass matrix $M$: Default $M = I$; setting $M$ to an estimate of the inverse posterior covariance (full or diagonal) improves mixing by preconditioning the parameter space (Mukherjee et al., 4 Jan 2026, Hirt et al., 2021).
- Step size $\varepsilon$: Smaller $\varepsilon$ leads to higher acceptance and better energy conservation but requires more gradient evaluations per unit of integration time. Larger $\varepsilon$ is cheaper per unit time but induces lower acceptance and possible integration instability. A typical target acceptance is $0.65$–$0.8$ (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025).
- Number of steps $L$ (or integration time $T = L\varepsilon$): Choosing too few steps results in small, random-walk-like moves; too many may lead to trajectories "curling around" or wasted computation. A common heuristic is to tune $T$ to approximately half the period of typical trajectories, or to use adaptive schemes such as the No-U-Turn Sampler (NUTS), which eliminates manual setting of $L$ (Mukherjee et al., 4 Jan 2026).
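The step-size constraint above can be made concrete: for a one-dimensional Gaussian with standard deviation $\sigma$, the leapfrog map is stable only when $\varepsilon < 2\sigma$, and past that threshold the energy error explodes, which is the mechanism behind divergent transitions. A minimal sketch (helper names are illustrative):

```python
import numpy as np

def leapfrog_1d(theta, p, eps, L, grad_U):
    # Scalar leapfrog (Stoermer-Verlet) with unit mass.
    p = p - 0.5 * eps * grad_U(theta)
    for _ in range(L - 1):
        theta = theta + eps * p
        p = p - eps * grad_U(theta)
    theta = theta + eps * p
    p = p - 0.5 * eps * grad_U(theta)
    return theta, p

# Target N(0, sigma^2): U(theta) = theta^2 / (2 sigma^2); stability needs
# eps < 2 * sigma for the resulting harmonic oscillator.
sigma = 0.1
grad_U = lambda th: th / sigma**2
H = lambda th, p: th**2 / (2 * sigma**2) + 0.5 * p**2

def energy_error(eps):
    th0, p0 = sigma, 1.0
    th1, p1 = leapfrog_1d(th0, p0, eps, 50, grad_U)
    return abs(H(th1, p1) - H(th0, p0))

assert energy_error(0.5 * sigma) < 0.1   # stable: small, bounded energy error
assert energy_error(2.5 * sigma) > 10.0  # unstable: energy error blows up
```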
Diagnostics include monitoring the acceptance rate, effective sample size (ESS), trace plots, autocorrelation, the value of $\hat{R}$, and energy error statistics (e.g., "divergent transitions" in Stan) (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025).
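One of these diagnostics, the effective sample size, can be sketched with a simple initial-positive-sequence autocorrelation estimator (the `ess` helper below is illustrative; production code should prefer the ArviZ- or Stan-style estimators):

```python
import numpy as np

def ess(x):
    """ESS of a 1-D chain: n / (1 + 2 * sum of leading positive
    autocorrelations), truncated at the first non-positive lag."""
    x = np.asarray(x, float)
    n = len(x)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x @ x)
    rho_sum = 0.0
    for k in range(1, n):
        if acf[k] <= 0:      # truncate: later lags are mostly noise
            break
        rho_sum += acf[k]
    return n / (1.0 + 2.0 * rho_sum)

rng = np.random.default_rng(1)
iid = rng.standard_normal(5000)
# AR(1) chain with coefficient 0.95: heavy autocorrelation, slow mixing.
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.standard_normal()
assert ess(iid) > 5 * ess(ar)  # correlated chain has far fewer effective draws
```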
4. Variants, Extensions, and Generalizations
HMC forms the foundation for numerous algorithmic variants, tailored to different model families and computational regimes:
- No-U-Turn Sampler (NUTS): Dynamically terminates trajectories before they double back, removing the need to tune $L$ (Mukherjee et al., 4 Jan 2026).
- Riemann Manifold HMC (RMHMC): Uses a position-dependent mass matrix (e.g. Fisher or Hessian metric) to adapt to local curvature, necessitating generalized, often implicit, symplectic integrators (Mukherjee et al., 4 Jan 2026, Hirt et al., 2021). Randomizing trajectory lengths in RMHMC can mitigate slow mixing due to path resonance (Whalley et al., 2022).
- Stochastic Gradient HMC (SGHMC): Utilizes noisy minibatch gradients; requires friction/diffusion terms and sacrifices asymptotic exactness (Mukherjee et al., 4 Jan 2026).
- Magnetic and Repelling/Attracting HMC: Modifies Hamiltonian dynamics with additional skew-symmetric (magnetic) or bespoke potentials to improve exploration in multimodal, constrained, or manifold-constrained settings (Brofos et al., 2020).
- Nonparametric HMC (NP-HMC): Generalizes HMC to infinite-dimensional spaces, as needed for generic probabilistic programming languages and nonparametric models (Mak et al., 2021).
- Particle HMC (PHMC): Integrates sequential Monte Carlo to enable HMC-style parameter inference in latent-variable state-space models with intractable posteriors (Amri et al., 14 Apr 2025).
- Modified and Irreversible HMC (MMHMC, GHMC): Leverages backward error analysis to sample on a modified Hamiltonian, utilizes partial momentum refreshment and higher-order integrators to suppress random-walk limits and improve ESS, and allows for irreversible proposals (Radivojević et al., 2017).
- Entropy-based Adaptive HMC: Selects the mass matrix (via its Cholesky factor) by maximizing an entropy-based proposal objective, promoting high acceptance and exploration in all directions, and surpassing classical ESJD criteria in mixing efficiency (Hirt et al., 2021).
- Quantum-Inspired and Relativistic HMC: Randomizes the mass matrix across trajectories to exploit variable time-scale dynamics, improving robustness on spiky, ill-conditioned, or multimodal targets (Liu et al., 2019, Lu et al., 2016).
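One concrete ingredient of the GHMC-style variants above is partial momentum refreshment, which replaces full resampling of $p$ by the mixture $p' = \alpha p + \sqrt{1-\alpha^2}\,\xi$ with $\xi \sim \mathcal{N}(0, I)$; this leaves the momentum marginal invariant while retaining directional persistence between trajectories. A minimal sketch (function name is illustrative):

```python
import numpy as np

def refresh_momentum(p, alpha, rng):
    """Partial momentum refreshment: mix current momentum with fresh
    Gaussian noise; leaves the N(0, I) momentum marginal invariant."""
    xi = rng.standard_normal(p.shape)
    return alpha * p + np.sqrt(1.0 - alpha**2) * xi

# Empirical check: if p ~ N(0, I), the refreshed momentum is again N(0, I),
# with correlation alpha to the old momentum.
rng = np.random.default_rng(2)
p = rng.standard_normal(100_000)
p_new = refresh_momentum(p, alpha=0.9, rng=rng)
assert abs(p_new.var() - 1.0) < 0.05
assert abs(np.corrcoef(p, p_new)[0, 1] - 0.9) < 0.05
```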
5. Empirical Performance and Comparative Results
Systematic benchmarking demonstrates the advantages of HMC over random walk Metropolis–Hastings (RWMH), Metropolis-adjusted Langevin algorithms (MALA), and multi-move alternatives in unimodal, highly correlated, or high-dimensional regimes. In a Gamma(5,1) example, HMC achieves a markedly higher acceptance rate and nearly independent effective draws compared to RWMH and the t-walk (Granados et al., 8 Jan 2025). In a 10-dimensional hierarchical normal model, HMC delivered 14,283 effective samples per 500,000 iterations, compared to 218 for RWMH, in a fraction of the wall-clock time.
For intractable or latent-variable models, PHMC outperforms particle marginal MH, with substantially higher acceptance and ESS, at the cost of running several SMC passes per iteration (Amri et al., 14 Apr 2025).
On GPU architectures, HMC maps efficiently due to its reliance on linear algebraic operations—yielding 50–100× speedups for moderate to large problems (e.g., high-dimensional multinomial regression) and enabling full Bayesian analysis previously limited by computational cost (Beam et al., 2014).
Entropy- and control-variates–based adaptations, as well as perfect simulation frameworks, yield further statistical and computational improvements, especially in terms of effective sample size per gradient evaluation (Piponi et al., 2020, Leigh et al., 2022).
6. Theoretical Guarantees and Limitations
The detailed balance, invariance of the target, ergodicity, and correctness of the HMC chain (with leapfrog or reversible integrators and Metropolis correction) are rigorously established, provided the potential is smooth and the numerical integrator preserves volume and reversibility up to momentum flip (Mukherjee et al., 4 Jan 2026, Lelièvre et al., 2023). For nonseparable or implicit Hamiltonians, unbiasedness is ensured through explicit reversibility checks (Lelièvre et al., 2023). NP-HMC extends correctness and ergodicity to nonparametric, infinite-dimensional trace spaces (Mak et al., 2021).
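The "reversibility up to momentum flip" property used in these correctness arguments can be checked numerically: running the leapfrog forward, negating the momentum, and running it forward again returns the starting state. A minimal sketch (the quartic potential is chosen purely for illustration):

```python
def leapfrog(theta, p, grad_U, eps, L):
    # Scalar Stoermer-Verlet with unit mass.
    p = p - 0.5 * eps * grad_U(theta)
    for _ in range(L - 1):
        theta = theta + eps * p
        p = p - eps * grad_U(theta)
    theta = theta + eps * p
    p = p - 0.5 * eps * grad_U(theta)
    return theta, p

grad_U = lambda th: th**3          # quartic potential U(theta) = theta^4 / 4
th0, p0 = 0.8, -0.3
th1, p1 = leapfrog(th0, p0, grad_U, eps=0.05, L=40)
# Flip the momentum and integrate forward again: back to the start,
# with the momentum flipped (up to floating-point roundoff).
th2, p2 = leapfrog(th1, -p1, grad_U, eps=0.05, L=40)
assert abs(th2 - th0) < 1e-10 and abs(-p2 - p0) < 1e-10
```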
Limitations of HMC include:
- Requirement of differentiable target densities (excluding non-smooth models).
- Sensitivity to hyperparameters ($\varepsilon$, $L$, $M$), leading to possible "divergent transitions" and poor exploration if tuning is inadequate (Mukherjee et al., 4 Jan 2026, Granados et al., 8 Jan 2025).
- Potential inefficiency or mode trapping in strongly multimodal distributions, sometimes mitigated via tempering, repelling/attracting modifications, or variational/jump-based proposals (Liu et al., 2019, Gu et al., 2019).
- Computation overhead: each iteration demands multiple gradient evaluations, particularly expensive for massive data or latent-variable models (Amri et al., 14 Apr 2025).
7. Practical Considerations, Software, and Future Directions
HMC is implemented in major probabilistic programming frameworks such as Stan, PyMC, Pyro, and others, typically in the form of adaptive NUTS with automated mass-matrix and step-size adaptation (Mukherjee et al., 4 Jan 2026). Practitioners are advised to:
- Check and validate gradient computations on small examples.
- Tune mass matrix during preliminary chains.
- Target acceptance rates in the $0.65$–$0.8$ range for standard HMC, or higher for advanced control-variate or antithetic implementations (Piponi et al., 2020).
- Monitor chain diagnostics for "divergences," high autocorrelation, or energy drift, indicative of poor tuning or geometry mismatch.
- For non-Gaussian tails, multimodal targets, or nonparametric problems, employ advanced variants (e.g., RMHMC, PHMC, entropy-adaptive, NP-HMC).
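The first item of the advice above, validating gradient computations, can be done with a central finite-difference check; a minimal sketch with an illustrative logistic-regression potential (names like `check_grad` are assumptions, not a standard API):

```python
import numpy as np

def check_grad(U, grad_U, theta, h=1e-5, tol=1e-4):
    """Compare an analytic gradient against central finite differences."""
    g = grad_U(theta)
    g_fd = np.empty_like(g)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g_fd[i] = (U(theta + e) - U(theta - e)) / (2 * h)
    return float(np.max(np.abs(g - g_fd))) < tol

# Illustrative potential: logistic-regression negative log-posterior with a
# standard normal prior on the coefficients.
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]])
y = np.array([1.0, 0.0, 1.0])

def U(beta):
    logits = X @ beta
    return np.sum(np.logaddexp(0.0, logits) - y * logits) + 0.5 * beta @ beta

def grad_U(beta):
    probs = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (probs - y) + beta

ok = check_grad(U, grad_U, np.array([0.3, -0.7]))
assert ok  # analytic and numerical gradients agree
```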
Research continues to enhance HMC via adaptive geometric methods, coupling and variance-reduction techniques, unbiased and perfect simulation protocols, hybrid stochastic-deterministic splitting, and efficient implementations for deep probabilistic programming (Leigh et al., 2022, Hirt et al., 2021, Brofos et al., 2020).
Further Reading:
- "Hamiltonian Monte Carlo for (Physics) Dummies" (Mukherjee et al., 4 Jan 2026)
- "Understanding the Hamiltonian Monte Carlo through its Physics Fundamentals and Examples" (Granados et al., 8 Jan 2025)
- "Particle Hamiltonian Monte Carlo" (Amri et al., 14 Apr 2025)
- "Entropy-based adaptive Hamiltonian Monte Carlo" (Hirt et al., 2021)
- "Perfect simulation of general continuous distributions" (Leigh et al., 2022)