The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo (1111.4246v1)

Published 18 Nov 2011 in stat.CO and cs.LG

Abstract: Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ϵ and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ϵ on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" sampling algorithms.

Citations (4,024)

Summary

  • The paper introduces NUTS, an adaptive algorithm that eliminates the need for manual tuning of the path length in HMC.
  • It employs a recursive doubling process to set trajectory lengths and a dual-averaging scheme to adapt the step size on the fly.
  • Empirical tests on several high-dimensional models show that NUTS matches or exceeds well-tuned HMC in effective sample size per unit of computation.

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

The paper by Hoffman and Gelman introduces the No-U-Turn Sampler (NUTS), an advanced Markov chain Monte Carlo (MCMC) algorithm that enhances the Hamiltonian Monte Carlo (HMC) methodology by addressing one of its primary limitations: the need to manually select the path length parameter L. This enhancement significantly improves the usability of HMC, as NUTS eliminates the cumbersome tuning traditionally required for effective implementation.

Hamiltonian Monte Carlo

Hamiltonian Monte Carlo has gained favor in the MCMC community due to its efficiency at sampling from high-dimensional, complex posterior distributions. HMC avoids inefficient random walk behavior by leveraging first-order gradient information, which yields much faster convergence to the target distribution. However, its performance hinges on the careful selection of two parameters: the step size ϵ and the path length L. Suboptimal choices for either can severely degrade HMC's efficiency, whether by reintroducing random walk behavior or by wasting computational resources.
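To make the gradient's role concrete, here is a minimal sketch of a single leapfrog step, the integrator underlying HMC. The name grad_log_p is an illustrative placeholder for the gradient of the log target density, and the inputs are assumed to be NumPy arrays:

```python
def leapfrog(theta, r, eps, grad_log_p):
    """One leapfrog step: half-step on momentum, full step on position,
    half-step on momentum. The integrator is time-reversible and
    volume-preserving, which is what makes it suitable for HMC."""
    r = r + 0.5 * eps * grad_log_p(theta)   # half-step momentum update
    theta = theta + eps * r                 # full-step position update
    r = r + 0.5 * eps * grad_log_p(theta)   # half-step momentum update
    return theta, r
```

Standard HMC chains L such steps together and accepts or rejects the endpoint with a Metropolis correction; NUTS reuses the same integrator but decides the number of steps on the fly.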

No-U-Turn Sampler (NUTS)

NUTS is presented as an extension of HMC that adaptively determines an appropriate path length without manual intervention. Instead of fixing L, NUTS uses a recursive algorithm to explore the parameter space: it builds a set of candidate points until it detects a U-turn, the point at which the trajectory starts to double back on itself, and then stops extending the path. This dynamic stopping criterion lets NUTS adjust the trajectory length automatically.

Methodology and Innovation

The core innovation of NUTS lies in its recursive doubling process. The algorithm takes leapfrog steps forward and backward in time, repeatedly doubling the length of the trajectory until a U-turn is detected, at which point further exploration would be wasteful. The stopping criterion is the dot product between the momentum at either end of the trajectory and the vector connecting the two endpoints: once this dot product turns negative, the ends have begun moving toward each other, and the sampler stops extending the trajectory.
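The sketch below illustrates this stopping rule in a simplified, non-recursive form. It is an illustration only, not the full algorithm: real NUTS builds the trajectory as a balanced binary tree and samples a state from it in a way that preserves detailed balance. All names here (grad_log_p, grow_until_u_turn) are our own placeholders.

```python
import numpy as np

def leapfrog(theta, r, step, grad_log_p):
    # Standard leapfrog step; a negative step integrates backward in time.
    r = r + 0.5 * step * grad_log_p(theta)
    theta = theta + step * r
    r = r + 0.5 * step * grad_log_p(theta)
    return theta, r

def grow_until_u_turn(theta0, r0, eps, grad_log_p, max_doublings=10):
    """Repeatedly double a trajectory, extending it in a random direction
    each time, until the endpoints start moving toward each other."""
    theta_minus, r_minus = theta0.copy(), r0.copy()  # backward end
    theta_plus, r_plus = theta0.copy(), r0.copy()    # forward end
    n_new = 1                                        # steps added per doubling
    for _ in range(max_doublings):
        if np.random.rand() < 0.5:                   # extend forward in time
            for _ in range(n_new):
                theta_plus, r_plus = leapfrog(theta_plus, r_plus, eps, grad_log_p)
        else:                                        # extend backward in time
            for _ in range(n_new):
                theta_minus, r_minus = leapfrog(theta_minus, r_minus, -eps, grad_log_p)
        n_new *= 2
        # U-turn check: stop once the displacement between the endpoints
        # has a negative dot product with the momentum at either end.
        dtheta = theta_plus - theta_minus
        if np.dot(dtheta, r_minus) < 0 or np.dot(dtheta, r_plus) < 0:
            break
    return theta_minus, theta_plus
```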

Additionally, the paper introduces a dual-averaging method for adaptively tuning the step size ϵ. This method, rooted in stochastic optimization principles, allows the algorithm to fine-tune ϵ during the burn-in phase, leading to more efficient sampling with minimal manual tuning.
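A compact sketch of that dual-averaging recursion follows. The constants gamma, t0, and kappa, and the target acceptance statistic delta = 0.65, are the defaults the authors recommend; accept_stats stands in for the per-iteration acceptance statistics collected during burn-in, and eps0 for an initial step size found by a heuristic search.

```python
import numpy as np

def adapt_step_size(accept_stats, eps0, delta=0.65,
                    gamma=0.05, t0=10.0, kappa=0.75):
    """Dual averaging on log(eps): drives the average acceptance
    statistic toward the target delta during burn-in."""
    mu = np.log(10.0 * eps0)      # value log(eps) is shrunk toward
    log_eps_bar = 0.0             # weighted average of log(eps) iterates
    h_bar = 0.0                   # running average of (delta - alpha)
    for m, alpha in enumerate(accept_stats, start=1):
        h_bar += (delta - alpha - h_bar) / (m + t0)
        log_eps = mu - np.sqrt(m) / gamma * h_bar
        w = m ** (-kappa)
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar
    return np.exp(log_eps_bar)    # step size frozen for the sampling phase
```

The adaptation runs only during burn-in; afterward the averaged step size is held fixed so that the chain's stationary distribution is left intact.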

Empirical Evaluation

The authors perform extensive empirical evaluations using four high-dimensional target distributions:

  1. Multivariate Normal Distribution (MVN);
  2. Bayesian Logistic Regression (LR);
  3. Hierarchical Bayesian Logistic Regression (HLR);
  4. Stochastic Volatility Model (SV).

The results demonstrate that NUTS matches or outperforms traditional HMC in terms of effective sample size (ESS) normalized by computational cost. NUTS consistently achieves superior or comparable efficiency across all tested distributions without any tuning of L, whereas HMC required careful tuning to perform well.

Practical and Theoretical Implications

Practically, NUTS offers a robust and user-friendly alternative to HMC, enabling more efficient Bayesian inference without deep expertise in tuning MCMC algorithms. This is particularly beneficial for inclusion in automatic inference engines, making sophisticated MCMC methods more accessible to a broader audience.

Theoretically, the adaptive step size mechanism and recursive trajectory-building in NUTS introduce new avenues for further algorithmic enhancements. Future research could explore extending NUTS with Riemannian Manifold Hamiltonian Monte Carlo (RMHMC) to adapt mass matrices dynamically, potentially increasing sampling efficiency in highly structured parameter spaces.

Conclusion

The NUTS algorithm represents a significant advancement in MCMC methodologies by dynamically adjusting the path length parameter and adaptively tuning the step size ϵ. This innovation makes HMC more accessible and efficient, empowering practitioners to perform sophisticated Bayesian inference with minimal manual tuning. Given its successful empirical performance and robust theoretical foundation, NUTS is likely to become a mainstay in the toolkit for high-dimensional posterior sampling.