Microcanonical Langevin MC

Updated 3 July 2026

The paper introduces MCLMC, which achieves exact sampling of the target distribution by evolving dynamics on ℝᵈ × S^(d–1) with a fixed auxiliary kinetic energy.
It leverages a drift–diffusion splitting that preserves the invariant canonical configuration marginal while relaxing full phase-space fluctuation-dissipation requirements.
Empirical results show substantial efficiency gains over traditional methods in applications such as lattice simulations, cosmological inference, and Bayesian neural networks.

Microcanonical Langevin Monte Carlo (MCLMC) is a class of gradient-based Monte Carlo samplers that augment a target distribution $\pi(x)\propto e^{-S(x)}$ with a unit-norm auxiliary velocity and evolve a stochastic process on $\mathbb R^d\times S^{d-1}$ whose stationary phase-space law is microcanonical in the auxiliary variable but whose configuration-space marginal is exactly canonical. Its central conceptual claim is that exact sampling of $\pi(x)$ does not require the full phase-space distribution to satisfy the fluctuation-dissipation theorem; only the marginal over configuration space must match the target. In this formulation, stochastic fluctuation can be introduced without dissipative friction, while the auxiliary kinetic energy remains fixed (Robnik et al., 2023).

1. Conceptual basis

The defining distinction of MCLMC is between two invariance requirements that are often conflated. In standard underdamped Langevin dynamics, the joint law of position and momentum is constructed to be canonical in full phase space, typically of Gibbs form $\rho(x,p)\propto e^{-\beta H(x,p)}$, and the fluctuation-dissipation theorem enforces the balance between noise injection and friction. MCLMC relaxes that requirement: for Monte Carlo sampling, only the configuration-space marginal must equal the prescribed target $\pi(x)\propto e^{-S(x)}$. The auxiliary variable may therefore follow a non-Gibbs law, provided that integrating it out recovers the target density (Robnik et al., 2023).

In MCLMC, “microcanonical” refers to the auxiliary dynamics living on a constant-kinetic-energy shell. The auxiliary velocity $u$ is constrained by $u\cdot u=1$, so the process evolves on the manifold

$\mathcal M=\mathbb R^d\times S^{d-1}.$

The kinetic energy is fixed,

$K(u)=\frac12|u|^2=\frac12,$

and the stationary law in the auxiliary variable is uniform on the unit sphere rather than Gaussian. A common misconception is therefore that Langevin-type exact sampling necessarily requires dissipation; MCLMC was introduced precisely to show that this stronger phase-space requirement is unnecessary when only the $x$-marginal matters (Robnik et al., 2023).
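
As a small illustration of the fixed-kinetic-energy shell, the sketch below draws an auxiliary velocity uniformly on $S^{d-1}$ by normalizing a Gaussian vector and confirms that $K(u)=\frac12$. The function name `sample_unit_velocity` and the dimension are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def sample_unit_velocity(d, rng):
    """Draw u uniformly on the unit sphere S^{d-1} by normalizing a Gaussian vector."""
    z = rng.standard_normal(d)
    return z / np.linalg.norm(z)

rng = np.random.default_rng(0)
u = sample_unit_velocity(5, rng)   # illustrative dimension d = 5
kinetic = 0.5 * np.dot(u, u)       # equals 1/2 for every draw, up to rounding
```

Every draw has the same kinetic energy, in contrast to Gaussian momenta, whose kinetic energy fluctuates from draw to draw.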

2. Continuous-time dynamics and invariant measure

The deterministic core of MCLMC is the ODE

$\dot x = u,\qquad \dot u = P(u)\,f(x),$

with

$P(u)=I-uu^{\top},\qquad f(x)=-\frac{\nabla S(x)}{d-1}.$

Because $u\cdot P(u)f(x)=0$, the flow preserves $u\cdot u=1$, so trajectories remain on $\mathcal M$. The stochastic extension defining the continuous-time sampler is

$dx = u\,dt,\qquad du = P(u)\,f(x)\,dt+\eta\,P(u)\circ dW,$

where $\eta$ is the stochasticity strength and the Wiener noise, taken in the Stratonovich sense, acts only tangentially to the sphere (Robnik et al., 2023).
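
The tangency of the drift can be checked numerically. The sketch below assumes the drift has the form $\dot x=u$, $\dot u=P(u)f(x)$ with $P(u)=I-uu^{\top}$ and $f(x)=-\nabla S(x)/(d-1)$; the function name `drift` and the standard Gaussian test target are illustrative assumptions.

```python
import numpy as np

def drift(x, u, grad_S):
    """Return (dx/dt, du/dt) for the assumed microcanonical drift."""
    d = x.size
    f = -grad_S(x) / (d - 1)
    du = f - u * np.dot(u, f)      # P(u) f, with P(u) = I - u u^T
    return u, du

grad_S = lambda x: x               # illustrative target: S(x) = |x|^2 / 2
x = np.array([1.0, -2.0, 0.5])
u = np.array([0.0, 0.6, 0.8])      # a unit vector
dx, du = drift(x, u, grad_S)
radial = np.dot(u, du)             # zero up to rounding, so |u| is not changed
```

Since $\frac{d}{dt}(u\cdot u)=2\,u\cdot\dot u$, a vanishing `radial` component is exactly the statement that the flow stays on the unit sphere.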

The corresponding Fokker–Planck equation on $\mathcal M$ is

$\partial_t\rho=-\nabla\cdot(\rho B)+\frac{\eta^2}{2}\,\hat\nabla^2\rho,$

where $B=(u,\,P(u)f(x))$ is the drift vector field and $\hat\nabla^2$ is the Laplace–Beltrami operator on $S^{d-1}$. The stationary density is

$\rho_\infty(x,u)\propto e^{-S(x)}\,\delta(|u|-1),$

or, relative to the intrinsic measure $dx\,d\Omega(u)$ on $\mathcal M$,

$\rho_\infty(x,u)\propto e^{-S(x)}.$

Equivalently, in shell notation,

$\rho_\infty(x,u)\propto e^{-S(x)}\,\delta\!\left(K(u)-\tfrac12\right).$

Marginalizing over the sphere gives

$\int_{S^{d-1}}\rho_\infty(x,u)\,d\Omega(u)\propto e^{-S(x)}\propto\pi(x),$

so the target distribution on configuration space is exactly recovered (Robnik et al., 2023).
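
The stationarity claim can be verified in a few lines. The check below is a sketch that assumes the drift $B=(u,\,P(u)f(x))$ with $P(u)=I-uu^{\top}$ and $f(x)=-\nabla S(x)/(d-1)$, and a candidate density $\rho_\infty\propto e^{-S(x)}$ that does not depend on $u$; here $\nabla_u\cdot$ denotes the divergence on the sphere and $\hat\nabla^2$ the Laplace–Beltrami operator.

```latex
\begin{align*}
\nabla_x\cdot\big(\rho_\infty\,u\big) &= -\,(u\cdot\nabla S)\,\rho_\infty,\\
\nabla_u\cdot\big(P(u)f\big) &= -(d-1)\,u\cdot f \;=\; u\cdot\nabla S,\\
\nabla\cdot(\rho_\infty B) &= \rho_\infty\big[-\,u\cdot\nabla S + u\cdot\nabla S\big] \;=\; 0,\\
\hat\nabla^2\rho_\infty &= 0 \quad\text{(no dependence on } u\text{)}.
\end{align*}
```

The transport and diffusion contributions vanish independently, which is why the factor $1/(d-1)$ in $f$ is needed and why the result holds for any value of the noise strength.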

A notable structural property is that the deterministic drift and the spherical diffusion each preserve $\rho_\infty$ separately. This is unusual among gradient-based samplers and underlies much of the later numerical interest in MCLMC.

3. Ergodicity, convergence, and numerical integration

The continuous-time theory establishes that MCLMC admits a unique stationary distribution for any nonzero stochasticity $\eta>0$. The ergodicity proof proceeds through a Hörmander argument: diffusion vector fields span the tangent space of the sphere, Lie brackets with the transport field generate the missing configuration-space directions, and the resulting Lie algebra spans the full tangent space of $\mathcal M$. Under smoothness and accessibility conditions, this yields a smooth positive transition density and uniqueness of the invariant measure (Robnik et al., 2023).

For smooth convex targets, the analysis is sharpened to geometric ergodicity. Assuming $\rho(x,p)\propto e^{-\beta H(x,p)}$ 0 is $\rho(x,p)\propto e^{-\beta H(x,p)}$ 1-smooth and $\rho(x,p)\propto e^{-\beta H(x,p)}$ 2-convex, with periodic boundary conditions at large $\rho(x,p)\propto e^{-\beta H(x,p)}$ 3 and bounded gradient $\rho(x,p)\propto e^{-\beta H(x,p)}$ 4, the generator

$\rho(x,p)\propto e^{-\beta H(x,p)}$ 5

admits the Lyapunov function

$\rho(x,p)\propto e^{-\beta H(x,p)}$ 6

for which

$\rho(x,p)\propto e^{-\beta H(x,p)}$ 7

From this, the expectation values of observables dominated by $\rho(x,p)\propto e^{-\beta H(x,p)}$ 8 converge exponentially fast to stationarity (Robnik et al., 2023).

The original formulation emphasized drift–diffusion splitting. If $\varphi_\epsilon$ denotes the deterministic flow and $\psi_\epsilon$ the diffusion-only flow on the sphere, both acting on densities over a time step $\epsilon$, the split density update is

$\rho_{t+\epsilon}=\psi_\epsilon\,\varphi_\epsilon\,\rho_t.$

Because each substep preserves the invariant law individually, the split update preserves it as well. In practice, the diffusion step may be implemented by any spherical Markov kernel that preserves the uniform distribution, such as

$u\leftarrow\frac{u+\nu z}{|u+\nu z|},\qquad z\sim\mathcal N(0,I),\quad \nu>0.$

The deterministic flow is commonly approximated with the Minimal Norm integrator, and in the lattice $\pi(x)\propto e^{-S(x)}$ 3 experiments its step size was tuned to target $\pi(x)\propto e^{-S(x)}$ 4 average single-step squared energy error per dimension (Robnik et al., 2023).
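
A minimal sketch of one split step is given below. It is not the Minimal Norm integrator: for brevity it uses a plain symmetric velocity-position-velocity splitting, which is a swapped-in simplification. The velocity half-step is the closed-form solution of $\dot u=(I-uu^{\top})f$ for a frozen $f=-\nabla S/(d-1)$, and the refresh step adds Gaussian noise to $u$ and renormalizes. All function names, the step size `eps`, the noise scale `nu`, and the Gaussian test target are illustrative assumptions.

```python
import numpy as np

def velocity_update(u, g, eps, d):
    """Exact flow of du/dt = (I - u u^T) f over time eps, with f = -g/(d-1) held fixed."""
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return u
    e = -g / g_norm                      # unit vector along the force
    delta = eps * g_norm / (d - 1)
    a = np.dot(e, u)
    v = u + e * (np.sinh(delta) + a * (np.cosh(delta) - 1.0))
    return v / np.linalg.norm(v)         # the norm equals cosh(delta) + a*sinh(delta)

def partial_refresh(u, nu, rng):
    """Spherical kernel: perturb u with Gaussian noise, then project back to the sphere."""
    v = u + nu * rng.standard_normal(u.size)
    return v / np.linalg.norm(v)

def split_step(x, u, grad_S, eps, nu, rng):
    """Half velocity update, full position update, half velocity update, then refresh."""
    d = x.size
    u = velocity_update(u, grad_S(x), 0.5 * eps, d)
    x = x + eps * u
    u = velocity_update(u, grad_S(x), 0.5 * eps, d)
    return x, partial_refresh(u, nu, rng)

rng = np.random.default_rng(1)
d = 10
x = rng.standard_normal(d)
u = partial_refresh(np.ones(d) / np.sqrt(d), 1.0, rng)
for _ in range(1000):
    x, u = split_step(x, u, lambda y: y, eps=0.5, nu=0.3, rng=rng)  # S(x) = |x|^2 / 2
speed = np.linalg.norm(u)                # stays at 1 up to rounding
```

As written, the sketch evaluates the gradient twice per step; an implementation could reuse the second gradient as the first gradient of the next step. Very large values of `delta` would overflow `sinh` and `cosh`, so a production version would use a rescaled form of the same update.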

4. Historical development and relation to neighboring methods

MCLMC emerged from the broader microcanonical program initiated by Microcanonical Hamiltonian Monte Carlo (MCHMC). In that construction, the target density $\pi(x)\propto e^{-S(x)}$ is represented as the marginal of a fixed-energy microcanonical ensemble,

$\pi(x)\propto\int\delta\big(H(x,\Pi)-E\big)\,d\Pi,$

rather than as the marginal of a canonical phase-space Gibbs law. For the default $\pi(x)\propto e^{-S(x)}$ 7 model, the Hamiltonian is

$\pi(x)\propto e^{-S(x)}$ 8

and MCLMC appears there as the continuously randomized, energy-conserving analogue of MCHMC, implemented through partial randomization of the momentum direction,

$\pi(x)\propto e^{-S(x)}$ 9

with decorrelation scale $u$ 0 chosen so that

$u$ 1

That earlier formulation framed MCLMC as an energy-conserving underdamped Langevin-like dynamics with non-Gaussian noise (Robnik et al., 2022).

Relative to standard underdamped Langevin Monte Carlo, MCLMC has no friction term, uses projected tangent-space noise, and preserves a microcanonical auxiliary law rather than a canonical Gaussian law in phase space. Relative to HMC, it keeps the auxiliary kinetic energy fixed, explores through continuous stochastic reorientation rather than occasional Gaussian momentum resampling, and in its original form does not require a Metropolis correction. Relative to NUTS, later application papers emphasized its fixed per-step computational cost and simpler parallel load balancing. The 2023 continuous-time SDE treatment also placed MCLMC outside frameworks that classify only samplers with canonical equilibrium in full phase space, because its invariant law is canonical only after marginalizing the auxiliary variable (Robnik et al., 2023).

5. Empirical performance across application domains

The original large-scale benchmark for MCLMC was the lattice $u$ 2 model, where it was compared to HMC at matched accuracy using the relative error in Fourier-space second moments. On an $u$ 3 lattice, MCLMC was reported to converge $u$ 4 times faster than HMC; on a $u$ 5 lattice it was $u$ 6 times faster, and its effective sample size was described as “almost independent” of both $u$ 7 and $u$ 8. The trend was expected to persist to larger lattices of interest for lattice quantum chromodynamics (Robnik et al., 2023).

In field-level cosmological inference, MCLMC was applied to joint sampling of initial conditions together with $u$ 9 and $u\cdot u=1$ 0 in a problem of dimension $u\cdot u=1$ 1. There it was reported to be over an order of magnitude more efficient than HMC overall, to achieve two orders of magnitude gain in the 2LPT setting, and at $u\cdot u=1$ 2 resolution to improve ESS per gradient evaluation by factors of $u\cdot u=1$ 3 for field modes and $u\cdot u=1$ 4 for cosmological parameters (Bayer et al., 2023).

In Bayesian neural networks, the method was embedded in Microcanonical Langevin Ensembles (MILE), which combine deep-ensemble warm starts with short modified MCLMC chains. The practical attraction in that setting was deterministic per-step cost: with the Minimal Norm integrator, each MCLMC step requires two gradient evaluations, and the paper used a fixed budget of $u\cdot u=1$ 5 steps per chain. Reported wall-clock speedups against NUTS-based Bayesian deep ensembles reached about $u\cdot u=1$ 6 on Bikesharing and $u\cdot u=1$ 7 on Protein, with predictive performance and uncertainty quantification maintained or improved (Sommer et al., 10 Feb 2025).

In radiological image reconstruction, MCLMC was used through the BlackJAX implementation to sample Poisson inverse problems with positivity constraints. The paper reported convergence in about $u\cdot u=1$ 8 seconds for images with $u\cdot u=1$ 9– $\mathcal M=\mathbb R^d\times S^{d-1}.$ 0 pixels when run in parallel on a GPU, and on a synthetic case with $\mathcal M=\mathbb R^d\times S^{d-1}.$ 1 samples it reported $\mathcal M=\mathbb R^d\times S^{d-1}.$ 2 s runtime for MCLMC against $\mathcal M=\mathbb R^d\times S^{d-1}.$ 3 s for HMC at essentially identical image quality. The same study also found that real-data reconstructions agreed well with ground truth and that posterior sampling supplied $\mathcal M=\mathbb R^d\times S^{d-1}.$ 4 highest-density-interval uncertainty maps unavailable from ML-EM (Pan et al., 19 Feb 2026).

6. Exactness, stochastic-gradient extensions, and terminological issues

A central later development concerned exactness. The original unadjusted microcanonical samplers were empirically efficient, but subsequent work argued that practical discretized MCLMC is asymptotically biased because the deterministic microcanonical integrator is compressible and not volume-preserving. The Metropolis-Adjusted Microcanonical sampler (MAMS) retains the same microcanonical dynamics and derives an exact Metropolis–Hastings correction even though the flow is not symplectic and not Hamiltonian in the standard sense. For deterministic proposals, the negative log acceptance ratio is

$\mathcal M=\mathbb R^d\times S^{d-1}.$ 5

and for MAMS this reduces to the accumulated energy error. Empirically, MAMS and its Langevin variant were reported to outperform NUTS across benchmark problems, typically by factors of $\mathcal M=\mathbb R^d\times S^{d-1}.$ 6– $\mathcal M=\mathbb R^d\times S^{d-1}.$ 7 in gradient evaluations to reach the target error threshold (Robnik et al., 3 Mar 2025).
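
The acceptance test itself is simple once the accumulated energy error of a proposal is available. The sketch below shows only that final step, accepting with probability $\min(1,e^{-W})$ for an energy error $W$; how $W$ is accumulated along a trajectory is not reproduced here, and the function name `mh_accept` is a hypothetical choice rather than an API of any library.

```python
import numpy as np

def mh_accept(energy_error, rng):
    """Accept with probability min(1, exp(-energy_error))."""
    log_uniform = np.log1p(-rng.uniform())   # log of a draw from (0, 1]
    return bool(log_uniform < -energy_error)

rng = np.random.default_rng(0)
# Proposals that lower the energy are always accepted.
always = all(mh_accept(-1.0, rng) for _ in range(100))
# A proposal with W = log 2 is accepted about half of the time.
rate = np.mean([mh_accept(np.log(2.0), rng) for _ in range(20000)])
```

Working with the logarithm of the uniform draw avoids exponentiating large energy errors.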

A second extension asked whether MCLMC can exploit mini-batch gradient noise. The continuous-time analysis there identified two failure modes: anisotropic mini-batch noise induces a noise-induced drift that perturbs the invariant distribution, and naive fixed-step stochastic microcanonical integrators can become numerically unstable in large neural networks. The proposed remedy was a diagonal gradient-noise preconditioner, based on online estimates of gradient standard deviations, combined with an energy-variance-based adaptive tuner and guardrails driven by the distribution of $\mathcal M=\mathbb R^d\times S^{d-1}.$ 8. The resulting tuned pSMILE samplers were reported to be robust on challenging Bayesian neural network tasks and competitive with or better than SGHMC and cSGLD, while approaching the performance of full-batch MCLMC in equal epoch-budget comparisons (Sommer et al., 6 Feb 2026).
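
One ingredient of such a preconditioner, an online estimate of per-coordinate gradient standard deviations, can be sketched with Welford's running-variance update. This is an illustration of the generic estimator only: the class name `RunningGradStd`, the stabilizing constant `eps`, and the final elementwise rescaling are assumptions, and the way the cited work inserts the preconditioner into the dynamics is not reproduced here.

```python
import numpy as np

class RunningGradStd:
    """Welford-style running estimate of the per-coordinate gradient standard deviation."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)          # running sum of squared deviations
        self.eps = eps                   # keeps the scale strictly positive

    def update(self, g):
        g = np.asarray(g, dtype=float)
        self.n += 1
        delta = g - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (g - self.mean)

    def std(self):
        if self.n < 2:
            return np.ones_like(self.mean)   # no scale information yet
        return np.sqrt(self.m2 / (self.n - 1)) + self.eps

est = RunningGradStd(2)
for g in ([1.0, 10.0], [3.0, 30.0], [5.0, 50.0]):   # toy mini-batch gradients
    est.update(g)
scale = est.std()                                    # approximately [2, 20]
scaled_grad = np.array([5.0, 50.0]) / scale          # both coordinates on one scale
```

The estimator needs one pass and constant memory per coordinate, which is the property that makes it usable alongside mini-batch gradients.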

The terminology surrounding “microcanonical Monte Carlo” also requires care. The paper “General method to perform Microcanonical Monte Carlo Simulations” does address microcanonical sampling, but it does not define a microcanonical Langevin SDE, does not project dynamics onto an energy shell, and does not provide an invariant-measure derivation of a Langevin-type process. Instead, it performs canonical Monte Carlo at adaptively tuned temperatures and filters configurations into a thin energy shell. It is therefore relevant to microcanonical sampling in a broad sense but not to MCLMC in the precise dynamical sense established by the later microcanonical Hamiltonian and microcanonical Langevin literature (Palma et al., 2019).

Taken together, these developments place MCLMC at the intersection of noncanonical phase-space design, high-dimensional geometric sampling, and structure-preserving stochastic dynamics. Its distinctive contribution is the replacement of canonical phase-space equilibrium by a microcanonical auxiliary law with an exact canonical configuration marginal; its main methodological tensions concern discretization bias, exact correction, and the handling of anisotropic stochastic-gradient noise.