Riemannian-Manifold HMC
- RMHMC is a Markov Chain Monte Carlo method that utilizes a position-dependent metric, derived from the Fisher information or Hessian, to adapt to local curvature.
- It integrates geometric insights into Hamiltonian dynamics by adjusting proposal trajectories, thereby reducing random-walk behavior in complex parameter spaces.
- The approach employs a semi-implicit symplectic integrator to accurately simulate nonseparable Hamiltonians, offering superior effective sample sizes despite higher computational costs.
Riemannian-Manifold Hamiltonian Monte Carlo (RMHMC) is a Markov Chain Monte Carlo (MCMC) methodology that generalizes the Hybrid/Hamiltonian Monte Carlo (HMC) paradigm to exploit the local Riemannian geometry of the target distribution’s parameter space. RMHMC equips the parameter manifold with a smoothly varying metric tensor, typically derived from the Fisher information or the negative Hessian of the log-posterior, enabling proposal trajectories that are automatically adapted to local curvature. This approach enhances sampling efficiency, particularly in high-dimensional and highly correlated posteriors, by circumventing the need for costly pilot runs to calibrate proposal scales and by reducing random-walk behavior.
1. Mathematical Formulation on a Riemannian Manifold
Let $\theta \in \mathbb{R}^D$ denote the parameter vector of interest, with target posterior density $\pi(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$, where $p(y \mid \theta)$ is the likelihood and $p(\theta)$ the prior. The parameter space is endowed with a position-dependent, symmetric positive-definite metric tensor $G(\theta)$. A momentum variable $p$ (conditionally distributed as $p \mid \theta \sim \mathcal{N}(0, G(\theta))$) augments the parameter space, yielding an extended "Hamiltonian"
$$H(\theta, p) = -\mathcal{L}(\theta) + \tfrac{1}{2}\log\!\left\{(2\pi)^D |G(\theta)|\right\} + \tfrac{1}{2}\, p^\top G(\theta)^{-1} p,$$
with $\mathcal{L}(\theta) = \log p(y \mid \theta) + \log p(\theta)$ the log-posterior up to an additive constant.
This Hamiltonian structure ensures that the marginal distribution over $\theta$ remains $\pi(\theta \mid y)$ under the dynamics, due to the volume correction $\tfrac{1}{2}\log|G(\theta)|$.
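As a minimal numerical sketch (Python/NumPy; the function names and the `log_post`/`metric` callables are illustrative assumptions rather than part of the original reference implementation), the extended Hamiltonian can be evaluated as:

```python
import numpy as np

def rmhmc_hamiltonian(theta, p, log_post, metric):
    """Extended RMHMC Hamiltonian H(theta, p).

    log_post(theta) -> log pi(theta | y) up to an additive constant
    metric(theta)   -> D x D symmetric positive-definite matrix G(theta)
    """
    G = metric(theta)
    L = np.linalg.cholesky(G)                      # G = L L^T
    log_det_G = 2.0 * np.sum(np.log(np.diag(L)))   # log|G| from the factor
    w = np.linalg.solve(L, p)                      # w @ w = p^T G^{-1} p
    D = theta.size
    return (-log_post(theta)
            + 0.5 * (log_det_G + D * np.log(2.0 * np.pi))
            + 0.5 * w @ w)
```

A single Cholesky factorization yields both the log-determinant and the quadratic form, so the kinetic-energy terms cost one $O(D^3)$ factorization per evaluation.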
2. Riemannian Manifold Hamiltonian Dynamics
The dynamics evolve according to the Riemannian generalization of Hamilton's equations:
$$\frac{d\theta_i}{dt} = \frac{\partial H}{\partial p_i} = \left[G(\theta)^{-1} p\right]_i,$$
$$\frac{dp_i}{dt} = -\frac{\partial H}{\partial \theta_i} = \frac{\partial \mathcal{L}(\theta)}{\partial \theta_i} - \frac{1}{2}\operatorname{tr}\!\left\{G(\theta)^{-1}\frac{\partial G(\theta)}{\partial \theta_i}\right\} + \frac{1}{2}\, p^\top G(\theta)^{-1}\,\frac{\partial G(\theta)}{\partial \theta_i}\, G(\theta)^{-1} p.$$
The first two terms of the momentum equation (the log-posterior gradient, which the $G(\theta)^{-1}$ in the position equation turns into a natural-gradient drift, and the trace term) supply the gradient drift and volume correction, while the final term (the quadratic form in $p$) accounts for the metric's local variation and, for the Fisher metric, is a contraction involving third-order derivatives of the log-likelihood. The coupled ODEs trace geodesic-like flows under the metric $G(\theta)$.
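The momentum drift above is what the integrator described later repeatedly evaluates. A hedged sketch of computing $\partial H/\partial\theta$ (assuming user-supplied `grad_log_post`, `metric`, and `metric_grad` callables, which are illustrative names, not from the paper):

```python
import numpy as np

def dH_dtheta(theta, p, grad_log_post, metric, metric_grad):
    """Gradient of the RMHMC Hamiltonian with respect to theta.

    grad_log_post(theta) -> gradient of log pi(theta | y)
    metric(theta)        -> G(theta), D x D SPD matrix
    metric_grad(theta)   -> array of shape (D, D, D); [i] holds dG/dtheta_i
    """
    G = metric(theta)
    dG = metric_grad(theta)
    G_inv = np.linalg.inv(G)
    G_inv_p = G_inv @ p
    grad = -grad_log_post(theta)                       # negative log-posterior gradient
    D = theta.size
    out = np.empty(D)
    for i in range(D):
        trace_term = 0.5 * np.trace(G_inv @ dG[i])     # volume-correction term
        quad_term = -0.5 * (G_inv_p @ dG[i] @ G_inv_p) # metric-variation term
        out[i] = grad[i] + trace_term + quad_term
    return out
```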
3. Metric Tensor Choices and Geometric Adaptivity
Typical metric choices:
- Expected Fisher Information: $G(\theta) = \mathbb{E}_{y \mid \theta}\!\left[-\nabla_\theta^2 \log p(y \mid \theta)\right] - \nabla_\theta^2 \log p(\theta)$, i.e., the Fisher information of the likelihood plus the negative Hessian of the log-prior.
- Observed Information (regularized): $G(\theta) = -\nabla_\theta^2 \log \pi(\theta \mid y) + \epsilon I$, with $\epsilon > 0$ small, for positive definiteness.
The metric tensor re-scales the local geometry: along directions with large eigenvalues of $G(\theta)$ (high curvature), the effective step is contracted, enhancing stability and efficiency in regions of strong anisotropy; along flat directions, larger effective steps promote rapid exploration. This removes the need for the global scaling heuristics and pilot adaptation runs required by traditional HMC.
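As a concrete instance, here is a hedged sketch of the expected-Fisher metric for the Bayesian logistic-regression benchmark discussed below, assuming a $\mathcal{N}(0, \alpha I)$ prior on the coefficients (the function name and default $\alpha$ are illustrative):

```python
import numpy as np

def logistic_regression_metric(theta, X, alpha=100.0):
    """Expected Fisher information plus negative log-prior Hessian for
    Bayesian logistic regression with a N(0, alpha * I) prior (illustrative).

    X     : (N, D) design matrix
    theta : (D,) regression coefficients
    alpha : prior variance (assumed value)
    """
    s = 1.0 / (1.0 + np.exp(-X @ theta))   # logistic mean sigma(X theta)
    lam = s * (1.0 - s)                    # Bernoulli variances
    return X.T @ (lam[:, None] * X) + np.eye(theta.size) / alpha
```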
4. Symplectic Integrator for Nonseparable Hamiltonians
The Hamiltonian is nonseparable due to the position dependence of $G(\theta)$, precluding direct use of the explicit leapfrog integrator. The method instead employs a semi-implicit, time-reversible, second-order symplectic integrator (the generalized leapfrog). For each integration step of size $\epsilon$:
- Momentum half-step (implicit): $p^{(\tau+\epsilon/2)} = p^{(\tau)} - \tfrac{\epsilon}{2}\,\nabla_\theta H\big(\theta^{(\tau)}, p^{(\tau+\epsilon/2)}\big)$; since $\nabla_\theta H$ depends on the updated momentum through the quadratic term, this is solved by fixed-point iteration.
- Position update (also implicit): $\theta^{(\tau+\epsilon)} = \theta^{(\tau)} + \tfrac{\epsilon}{2}\big[G\big(\theta^{(\tau)}\big)^{-1} + G\big(\theta^{(\tau+\epsilon)}\big)^{-1}\big]\, p^{(\tau+\epsilon/2)}$.
This may require a fixed-point or Newton solve due to the nonlinearity from $G\big(\theta^{(\tau+\epsilon)}\big)^{-1}$.
- Metric update: recompute $G(\theta)$ and its derivatives at the new position.
- Second momentum half-step (explicit): $p^{(\tau+\epsilon)} = p^{(\tau+\epsilon/2)} - \tfrac{\epsilon}{2}\,\nabla_\theta H\big(\theta^{(\tau+\epsilon)}, p^{(\tau+\epsilon/2)}\big)$.
This update is repeated $L$ times per trajectory. All metric derivatives $\partial G(\theta)/\partial \theta_i$ and the inverse $G(\theta)^{-1}$ must be available in closed form or computed via automatic differentiation.
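A hedged sketch of the generalized leapfrog follows, assuming `dH_dtheta(theta, p)` and `metric(theta)` have been wrapped as closures around the helpers sketched earlier; the fixed-point iteration count is an illustrative choice, not a prescribed value:

```python
import numpy as np

def generalized_leapfrog(theta, p, eps, n_steps, dH_dtheta, metric,
                         n_fixed_point=6):
    """Second-order, semi-implicit (generalized) leapfrog for RMHMC.

    dH_dtheta(theta, p) -> gradient of H with respect to theta
    metric(theta)       -> G(theta)
    """
    theta, p = theta.copy(), p.copy()
    for _ in range(n_steps):
        # 1. Implicit momentum half-step: the update depends on the new momentum.
        p_half = p.copy()
        for _ in range(n_fixed_point):
            p_half = p - 0.5 * eps * dH_dtheta(theta, p_half)
        # 2. Implicit position full step: dtheta/dt = G(theta)^{-1} p, averaged
        #    over the old and new positions.
        v_old = np.linalg.solve(metric(theta), p_half)
        theta_new = theta.copy()
        for _ in range(n_fixed_point):
            v_new = np.linalg.solve(metric(theta_new), p_half)
            theta_new = theta + 0.5 * eps * (v_old + v_new)
        theta = theta_new
        # 3. Explicit momentum half-step at the new position.
        p = p_half - 0.5 * eps * dH_dtheta(theta, p_half)
    return theta, p
```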
5. Full RMHMC Algorithm and Practical Implementation
The complete iteration is as follows:
- Compute $G(\theta)$, $G(\theta)^{-1}$, and the required derivatives at the current state.
- Sample the momentum $p \sim \mathcal{N}(0, G(\theta))$.
- Integrate $(\theta, p) \mapsto (\theta^*, p^*)$ via $L$ generalized leapfrog steps as above.
- Compute the Hamiltonian difference $\Delta H = H(\theta^*, p^*) - H(\theta, p)$.
- Accept the proposal with probability $\min\{1, \exp(-\Delta H)\}$; otherwise retain the previous state.
Key computational aspects:
- Computing $G(\theta)$ is model-dependent, while forming $G(\theta)^{-1}$ (or its Cholesky factor) at each integration step is $O(D^3)$.
- Calculating the metric derivatives $\partial G(\theta)/\partial \theta_i$, $i = 1, \dots, D$, and the associated trace terms adds a further $O(D^3)$ cost per step.
- Step 2 (position update) typically requires efficient linear algebra, including Cholesky factorization or parallelization, especially in moderate dimensions.
- For very large , use sparse, low-rank, or block-diagonal approximations to . Cholesky caching, partial metric updates, and structure-exploiting linear algebra are advised.
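Putting the pieces together, a single RMHMC transition might look like the following hedged sketch; it reuses `generalized_leapfrog` from the previous section and assumes `hamiltonian`, `dH_dtheta`, and `metric` closures, with step size and trajectory length as illustrative rather than tuned values:

```python
import numpy as np

def rmhmc_step(theta, hamiltonian, dH_dtheta, metric,
               eps=0.1, n_leapfrog=10, rng=None):
    """One RMHMC transition: sample momentum, integrate, Metropolis accept."""
    rng = np.random.default_rng() if rng is None else rng
    G = metric(theta)
    # Momentum resampled from N(0, G(theta)) via the Cholesky factor.
    p = np.linalg.cholesky(G) @ rng.standard_normal(theta.size)
    H_current = hamiltonian(theta, p)
    theta_prop, p_prop = generalized_leapfrog(theta, p, eps, n_leapfrog,
                                              dH_dtheta, metric)
    H_prop = hamiltonian(theta_prop, p_prop)
    # Accept with probability min{1, exp(-(H_prop - H_current))}.
    if np.log(rng.uniform()) < H_current - H_prop:
        return theta_prop, True
    return theta, False
```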
6. Practical Performance and Empirical Results
Empirical studies by Girolami & Calderhead demonstrate RMHMC’s superiority on a range of models:
- Logistic regression: faster mixing in moderate dimensions.
- Log-Gaussian Cox process: efficient high-dimensional GP latent-variable inference.
- Stochastic volatility models: improved performance on latent time-series models.
- Bayesian inference on ODE parameters: rapid convergence and exploration.
Reported time-normalized Effective Sample Size (ESS) improvements over standard HMC and Random Walk Metropolis range from a factor of 2 to 10, particularly in strongly anisotropic or highly correlated posteriors.
7. Advantages, Limitations, and Applicability
Advantages:
- Automatic adaptation to local curvature and anisotropy.
- Suppression of random-walk behavior in challenging geometries.
- Greater statistical efficiency (higher ESS per unit time) for targets with rapidly varying or strongly correlated scales.
Limitations:
- High per-step computational expense: each step is dominated by metric computation, inversion, and metric-derivative evaluation.
- Complex implementation: implicit integrator steps and metric derivatives make coding nontrivial.
- Scalability is restricted in very high dimensions unless conditional independence, sparsity, or low-rank structure in is exploited.
Applicability: RMHMC is most effective for moderate- to high-dimensional (tens to a few hundreds of dimensions) hierarchical Bayesian models exhibiting severe posterior anisotropy, strong dependence, or curved geometries that challenge conventional HMC or Metropolis approaches.
For algorithmic details, see the MATLAB code provided by the original authors, which is organized to replicate all main results and to serve as a reference for efficient RMHMC implementations (arXiv:0907.1100).