Information-Geometric Adaptive Sampling
- Information-geometric adaptive sampling is a set of algorithms that use the geometry of probability distributions to enable efficient exploration of complex targets.
- It leverages geometric quantities such as the Fisher–Rao metric and the Kullback–Leibler (KL) divergence to guide proposal adjustments in methods like MCMC, Langevin dynamics, and importance sampling.
- This adaptive framework enhances acceptance rates, accelerates convergence, and improves robustness when sampling high-dimensional or multimodal distributions.
Information-geometric adaptive sampling refers to a class of algorithms that exploit the geometry of probability distributions—specifically, information-theoretic or Riemannian structures—to enable adaptive, efficient exploration of complex or high-dimensional target distributions. These methods systematically adjust proposal distributions, drift/diffusion terms, or sampling schedules in response to information-geometric quantities such as the Kullback–Leibler (KL) divergence, Fisher–Rao metric, or weighted Wasserstein distances. The resulting framework unifies diverse approaches—ranging from Markov chain Monte Carlo (MCMC) to adaptive Langevin dynamics, importance sampling, and discovery-driven decision design—under a shared principle of geometric adaptation for accelerated, robust sampling.
1. Foundations in Information Geometry
Information geometry studies the structure of statistical manifolds where each point corresponds to a parametric probability distribution. Two pillars are central:
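- Fisher–Rao Metric: Equips the parametric family $\{p_\theta\}$ with the Riemannian metric given by the Fisher information matrix, enabling local measurements of statistical distance:
$$g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\big[\partial_{\theta_i} \log p_\theta(x)\, \partial_{\theta_j} \log p_\theta(x)\big], \qquad ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j.$$
This metric underpins natural-gradient updates and allows for geodesic flows that adapt to the intrinsic curvature of the target.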
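- KL Divergence and I-Projections: The KL divergence $\mathrm{KL}(q \,\|\, p) = \mathbb{E}_q[\log q - \log p]$ is an asymmetric measure of discrepancy between probability distributions, not a true distance. Minimizing the reverse KL $\mathrm{KL}(q \,\|\, \pi)$ over a proposal family (the information projection, or I-projection) is mode-seeking, so it focuses proposals on local modes and geometric features of the target. Locally, KL agrees to second order with the Fisher information matrix $G(\theta)$, $\mathrm{KL}(p_\theta \,\|\, p_{\theta + d\theta}) \approx \tfrac12\, d\theta^\top G(\theta)\, d\theta$. Minimizing KL divergence, or measuring step sizes in terms of KL, therefore yields movement along natural geodesics of the manifold (Barp et al., 2022; Dharamshi et al., 2021).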
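The next sketch makes these two ideas concrete for the univariate Gaussian family $N(\mu, \sigma^2)$ with coordinates $\theta = (\mu, \sigma)$. For this family the Fisher information is $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The code compares that closed form with a Monte Carlo estimate of $\mathbb{E}[\nabla \log p\, \nabla \log p^\top]$. It then checks that the KL divergence between two nearby members matches the quadratic form $\tfrac12\, d\theta^\top G\, d\theta$. The function names are illustrative and do not come from the cited works.

```python
import numpy as np

def fisher_gaussian(mu, sigma):
    """Closed-form Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def fisher_monte_carlo(mu, sigma, n=200_000, seed=0):
    """Estimate G(theta) = E[score score^T] using samples drawn from the model itself."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n)
    score = np.stack([(x - mu) / sigma**2,                      # d/dmu log p
                      -1.0 / sigma + (x - mu)**2 / sigma**3])   # d/dsigma log p
    return score @ score.T / n

def kl_gaussian(mu1, s1, mu2, s2):
    """KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

theta = np.array([0.5, 1.5])        # (mu, sigma)
dtheta = np.array([1e-3, -2e-3])    # small displacement on the manifold
G = fisher_gaussian(*theta)
kl = kl_gaussian(*theta, *(theta + dtheta))
quad = 0.5 * dtheta @ G @ dtheta    # second-order (Fisher) approximation of the KL
print(G)
print(fisher_monte_carlo(*theta))
print(kl, quad)
```

The relative gap between `kl` and `quad` shrinks linearly with the size of `dtheta`. This is the sense in which KL-measured steps follow the Fisher–Rao geometry.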
These geometric elements unify the design of samplers (e.g., information-geometric MCMC, natural-gradient-based optimization, Riemannian Langevin processes) and inform the construction of adaptive, information-efficient sampling strategies.
2. Divergence Minimization and Regional Adaptation in MCMC
A canonical example of information-geometric adaptive sampling is divergence-minimizing MCMC (Dharamshi et al., 2021). At each state $x$, the algorithm optimizes the proposal distribution $q(\cdot \mid x)$ within a parametric family (e.g., Gaussians with a tunable covariance). The goal is to locally minimize the information-projection divergence
$$\mathrm{KL}\big(q(\cdot \mid x) \,\|\, \pi\big),$$
while also accounting for the expected Metropolis acceptance rate. The two criteria are combined into a single score, and a user-controlled parameter sets the trade-off between them.
By taking a stochastic gradient step on this score with respect to the Cholesky factor $L$ of the proposal covariance $\Sigma = LL^\top$ at each iteration,
the algorithm achieves rapid, "regional" adaptation: every visited state has its own optimally tuned proposal, rather than a global estimate. This regionality enables accurate tracking of complex or non-Gaussian geometries, yielding high acceptance rates (e.g., 70% vs. 8% for adaptive random-walk proposals on "banana" distributions) and efficient multimodal exploration, especially when embedded within tempering or scout-based frameworks.
The information-geometric rationale is that each per-state update performs a local natural-gradient ascent in the space of Gaussian proposals, following the Fisher metric induced by the statistical manifold.
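The per-state adaptation of the Cholesky factor can be sketched in NumPy as follows. This is a simplified illustration, not the algorithm of Dharamshi et al. It keeps only the reverse-KL part of the objective; the acceptance-rate term and its trade-off parameter are omitted. Gradients are estimated with the reparameterization $y = x + Lz$, and the decaying step-size schedule is an assumption. The names `adapt_cholesky` and `grad_log_target` are hypothetical. For a Gaussian target $N(0, \Sigma)$, the minimizer of $\mathrm{KL}(N(x, LL^\top) \,\|\, \pi)$ is $LL^\top = \Sigma$ at every state $x$, which makes correctness easy to check.

```python
import numpy as np

def adapt_cholesky(x, grad_log_target, dim, n_iters=2000, batch=256, step=0.05, seed=0):
    """Stochastic-gradient minimisation of KL(N(x, L L^T) || pi) over lower-triangular L.

    Simplification (assumption): only the reverse-KL term is used; the acceptance-rate
    term of the full divergence-minimising score is omitted.
    """
    rng = np.random.default_rng(seed)
    L = np.eye(dim)
    for t in range(n_iters):
        z = rng.standard_normal((batch, dim))
        y = x + z @ L.T                          # reparameterised draws y = x + L z
        g = grad_log_target(y)                   # rows are grad log pi(y_b)
        # Gradient of -E[log pi(x + L z)] with respect to L is -E[grad log pi(y) z^T]
        grad_cross = -(g[:, :, None] * z[:, None, :]).mean(axis=0)
        # Gradient of -log det L = -sum_i log L_ii (negative entropy up to a constant)
        grad_entropy = -np.diag(1.0 / np.diag(L))
        lr = step / (1.0 + t / 200.0)            # decaying, Robbins-Monro style step size
        L = L - lr * np.tril(grad_cross + grad_entropy)
        np.fill_diagonal(L, np.maximum(np.diag(L), 1e-3))  # keep L a valid Cholesky factor
    return L

Sigma = np.array([[2.0, 0.9], [0.9, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
x_state = np.array([1.0, -0.5])
L = adapt_cholesky(x_state, lambda y: -y @ Sigma_inv, dim=2)
print(L @ L.T)  # close to Sigma for this Gaussian target
```

For a non-Gaussian target such as the banana distribution, the adapted covariance would differ from state to state. That is the regional behaviour described above.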
3. Weighted-Wasserstein Adaptive Diffusions
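For continuous-time sampling, information-geometric adaptive sampling manifests as state-dependent diffusions governed by weighted-Wasserstein gradient flows of the KL divergence (Engquist et al., 2024). Take a target $\pi \propto e^{-V}$ and a positive weight function $a(x)$. The weighted-Wasserstein gradient flow of $\mathrm{KL}(\rho \,\|\, \pi)$, namely $\partial_t \rho = \nabla \cdot \big(a\, \rho\, \nabla \log(\rho / \pi)\big)$, is the Fokker–Planck equation of the SDE family
$$dX_t = \big(-a(X_t)\, \nabla V(X_t) + \nabla a(X_t)\big)\, dt + \sqrt{2\, a(X_t)}\, dW_t.$$
Taking $a \equiv 1$ recovers standard overdamped Langevin dynamics, and suitable choices of $a$ interpolate between this and a derivative-free regime. The geometry is specified by the weighted-Wasserstein metric $W_a$, where $a(x)$ controls the local "length" of steps in distribution space.
Crucially, choosing $a \propto 1/\pi \propto e^{V}$ makes the drift $-a\nabla V + \nabla a$ vanish identically. The result is a pure diffusion $dX_t = \sqrt{2\, a(X_t)}\, dW_t$ with zero drift and state-dependent variance, which needs no gradient of $V$. The variance is small inside low-energy wells and large near energy barriers, so this adaptive-variance sampling accelerates mixing in multimodal or non-log-concave settings. Mean exit times between local minima go from exponential in the barrier height (for classical Langevin) to algebraic. Theoretical analysis proves uniform exponential convergence in both KL and $\chi^2$ divergence under mild functional inequalities. The analysis delineates precisely how the adaptive geometry "flattens" energy barriers and mitigates the curse of nonconvexity.
This geometric view generalizes to high-dimensional or non-Euclidean settings by replacing the scalar weight $a(x)$ with an adaptive, potentially anisotropic metric tensor $A(x)$.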
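The following is a minimal sketch of the derivative-free case. Several settings are chosen for illustration only: a one-dimensional double-well potential $V(x) = (x^2 - 1)^2$ restricted to $[-1.3, 1.3]$ with reflecting walls, the weight $a(x) = e^{V(x)}$, and an Euler–Maruyama discretization. The helper `drift` evaluates $-aV' + a'$ only to confirm that it vanishes for this weight; the simulation itself never evaluates $V'$. The function names, step size, and domain are assumptions, not details taken from Engquist et al.

```python
import numpy as np

LO, HI = -1.3, 1.3  # bounded domain with reflecting walls (illustrative choice)

def V(x):
    """Double-well potential; the target pi(x) is proportional to exp(-V(x)) on [LO, HI]."""
    return (x ** 2 - 1.0) ** 2

def dV(x):
    return 4.0 * x * (x ** 2 - 1.0)

def a(x):
    """Derivative-free weight a(x) = exp(V(x)), i.e. a proportional to 1/pi."""
    return np.exp(V(x))

def drift(x):
    """General drift -a V' + a' of the weighted-Wasserstein SDE; here a' = V' exp(V)."""
    return -a(x) * dV(x) + dV(x) * np.exp(V(x))

def reflect(x, lo=LO, hi=HI):
    """Fold points that stepped outside [lo, hi] back inside (one reflection suffices for small steps)."""
    x = np.where(x < lo, 2 * lo - x, x)
    return np.where(x > hi, 2 * hi - x, x)

def simulate(n_particles=2000, dt=1e-3, n_steps=10_000, x0=-1.0, seed=0):
    """Euler-Maruyama for the zero-drift SDE dX = sqrt(2 a(X)) dW; never evaluates V'."""
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, x0)   # all particles start in the left well
    for _ in range(n_steps):
        x = reflect(x + np.sqrt(2.0 * a(x) * dt) * rng.standard_normal(n_particles))
    return x

samples = simulate()
grid = np.linspace(LO, HI, 20_001)
weights = np.exp(-V(grid))
exact_second_moment = np.sum(grid ** 2 * weights) / np.sum(weights)
print(np.mean(samples ** 2), exact_second_moment)
```

For this weight, $a(x)\pi(x)$ is constant, so the reflected diffusion has stationary density proportional to $e^{-V}$. The empirical second moment should therefore agree with the quadrature value up to Monte Carlo and discretization error, even though every particle starts in one well.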
4. Adaptive Importance Sampling and Optimization
Information-geometric principles also underpin adaptive importance sampling and black-box optimization. In gradient-based adaptive importance samplers such as GRAMIS (Elvira et al., 2022), proposals are maintained as a mixture of parametrized distributions (e.g., Gaussians). Each proposal's mean is updated using a natural-gradient–style step on the log target, preconditioned by the local covariance (Laplace/Fisher approximation), and augmented with a repulsion term to foster coverage of distinct modes:
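$$\mu_n^{(t)} = \mu_n^{(t-1)} + \gamma_t\, \Sigma_n^{(t-1)}\, \nabla \log \pi\big(\mu_n^{(t-1)}\big) + r_n^{(t-1)},$$
where $\gamma_t$ is a step size, $\Sigma_n$ is the covariance of the $n$-th proposal, and $r_n$ is the repulsion term that pushes $\mu_n$ away from the other proposal means.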
When possible, the covariance is reset to the inverse of the negative Hessian of $\log \pi$ at the proposal mean (i.e., whenever that matrix is positive definite), further matching the local geometry. Empirical results indicate that this natural-gradient adaptation with repulsion accelerates convergence and outperforms traditional adaptive mixture samplers on high-dimensional, multimodal, or non-Gaussian targets.
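The mean update can be sketched as below for a two-dimensional, two-mode Gaussian mixture target. This is a simplification rather than GRAMIS itself, in two respects:
- The proposal covariances are held fixed at the identity instead of being reset from the Hessian. For this particular target, the identity matches the inverse negative Hessian near each mode.
- The repulsion is a hypothetical inverse-distance force with strength `eta`.

After adaptation, the proposals serve as a deterministic mixture in a self-normalized importance sampling estimate. All function names are illustrative.

```python
import numpy as np

# Bimodal 2-D target: equal mixture of N((-3, 0), I) and N((3, 0), I), known up to a constant.
MODES = np.array([[-3.0, 0.0], [3.0, 0.0]])

def log_target(x):
    """Unnormalised log-density for points x of shape (n, 2)."""
    sq = ((x[:, None, :] - MODES[None, :, :]) ** 2).sum(-1)      # (n, 2) squared distances
    return np.logaddexp(-0.5 * sq[:, 0], -0.5 * sq[:, 1])

def grad_log_target(x):
    """Gradient of log_target: responsibility-weighted pull towards each mode."""
    sq = ((x[:, None, :] - MODES[None, :, :]) ** 2).sum(-1)
    resp = np.exp(-0.5 * sq - log_target(x)[:, None])           # (n, 2), rows sum to 1
    return (resp[:, :, None] * (MODES[None, :, :] - x[:, None, :])).sum(1)

def adapt_means(mu, n_iters=200, step=0.2, eta=0.5, sigma=1.0):
    """Covariance-preconditioned gradient steps on proposal means plus a hypothetical repulsion."""
    mu = mu.copy()
    for _ in range(n_iters):
        diff = mu[:, None, :] - mu[None, :, :]                   # (N, N, 2) pairwise differences
        dist2 = (diff ** 2).sum(-1) + np.eye(len(mu))            # +I avoids 0/0 on the diagonal
        repulsion = eta * (diff / dist2[:, :, None]).sum(1)      # sum_j (mu_n - mu_j) / ||mu_n - mu_j||^2
        mu = mu + step * (sigma ** 2 * grad_log_target(mu) + repulsion)
    return mu

def dm_importance_estimate(mu, f, n_per=500, sigma=1.0, seed=0):
    """Self-normalised IS with deterministic-mixture weights pi(x) / ((1/N) sum_j q_j(x))."""
    rng = np.random.default_rng(seed)
    x = (mu[:, None, :] + sigma * rng.standard_normal((len(mu), n_per, 2))).reshape(-1, 2)
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)         # (n, N)
    log_q = (np.logaddexp.reduce(-0.5 * sq / sigma ** 2, axis=1)
             - np.log(len(mu)) - np.log(2.0 * np.pi * sigma ** 2))
    log_w = log_target(x) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return (w[:, None] * f(x)).sum(0)

mu0 = np.array([[-1.0, 1.0], [-0.5, -1.0], [0.5, 1.0], [1.0, -1.0]])
mu_final = adapt_means(mu0)
est = dm_importance_estimate(mu_final, lambda x: np.column_stack([x, x[:, 0] ** 2]))
print(mu_final.round(2))
print(est)  # estimates of [E x1, E x2, E x1^2]; exact values are [0, 0, 10]
```

Without the repulsion term, the two proposals that fall into the same basin would collapse onto the same mode. With it, they settle slightly apart, which keeps the mixture from degenerating.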
In information-geometric optimization (IGO), past samples can be reused through importance sampling without biasing the natural-gradient estimate. The reuse treats a mixture of past search distributions as the proposal, which lowers Monte Carlo variance while preserving the Fisher–Rao structure of the manifold (Shirakawa et al., 2018).
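The reuse mechanism can be illustrated with one-dimensional Gaussian search distributions. This sketch is an assumption, not the exact scheme of Shirakawa et al. Fresh samples from the current distribution are pooled with an equal number of stored samples from the previous one. Each pooled sample is weighted by $p_{\text{new}}(x) / \big(\tfrac12 p_{\text{new}}(x) + \tfrac12 p_{\text{old}}(x)\big)$, which keeps estimates of expectations under the current distribution unbiased. In IGO, the integrand $f$ would be the utility-weighted log-likelihood gradient. Here a simple moment is used so the result is easy to check.

```python
import numpy as np

def gauss_pdf(x, m, s):
    """Density of N(m, s^2) evaluated at x."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def reuse_estimate(f, m_new, m_old, s=1.0, n=20_000, seed=0):
    """Estimate E_{N(m_new, s^2)}[f] from n fresh draws plus n reused draws of N(m_old, s^2)."""
    rng = np.random.default_rng(seed)
    fresh = rng.normal(m_new, s, n)      # samples of the current search distribution
    reused = rng.normal(m_old, s, n)     # stands in for the stored previous generation
    x = np.concatenate([fresh, reused])
    # Equal-count pooling acts like sampling the 50/50 mixture, so weight by p_new / mixture
    mix = 0.5 * gauss_pdf(x, m_new, s) + 0.5 * gauss_pdf(x, m_old, s)
    w = gauss_pdf(x, m_new, s) / mix
    return np.mean(w * f(x))

print(reuse_estimate(lambda x: x ** 2, m_new=0.0, m_old=0.5))  # target value: E[x^2] = 1
```

The weights lie in $(0, 2)$, because the mixture always contains half of the current density. This is one reason mixture-based reuse tends to be better behaved than weighting old samples by $p_{\text{new}}/p_{\text{old}}$ alone.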
5. Adaptive Schedulers and Complexity-Efficient Sampling
Balancing geometric adaptation with computational cost is critical, particularly in high-dimensional settings. Geometric adaptive Monte Carlo (GAMC) (Papamarkou et al., 2016) addresses this by randomly alternating between expensive, geometry-aware proposals (e.g., manifold Langevin) and cost-effective adaptive proposals (adaptive Metropolis). At iteration $n$, the geometric proposal is selected with a probability $p_n$ that decays toward zero as $n$ grows. Local geometry is therefore exploited frequently in the transient phase, while the sampler transitions to an asymptotically adaptive regime for long-run efficiency. This strategy targets the effective sample size per unit of computational time, interpolating between the extremes of fully geometric and fully adaptive proposals. The transition kernel and empirical covariance updates preserve ergodicity and allow practical tuning of the "geometry versus cost" trade-off.
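The switching idea can be sketched as follows. Several choices here are assumptions made for illustration rather than parts of the published GAMC algorithm:
- Plain MALA stands in for the manifold Langevin kernel.
- The adaptive Metropolis kernel scales a running covariance by the common $2.38^2/d$ factor.
- The schedule $p_n = p_0 e^{-rn}$ is a hypothetical decaying probability.

The sketch does not attempt to reproduce the conditions under which Papamarkou et al. establish ergodicity.

```python
import numpy as np

def gamc_sketch(log_pi, grad_log_pi, x0, n_iters=20_000, h=0.5, p0=1.0, rate=1e-3, seed=0):
    """Mix a gradient-based kernel (MALA) with adaptive Metropolis.

    MALA is chosen with hypothetical probability p_n = p0 * exp(-rate * n).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    mean, cov = x.copy(), np.eye(d)  # running moments used by adaptive Metropolis
    samples = np.empty((n_iters, d))
    used_mala = np.zeros(n_iters, dtype=bool)
    for n in range(n_iters):
        if rng.random() < p0 * np.exp(-rate * n):
            # Geometry-aware (here: gradient-based) proposal with an asymmetric density
            fwd = x + 0.5 * h * grad_log_pi(x)
            y = fwd + np.sqrt(h) * rng.standard_normal(d)
            bwd = y + 0.5 * h * grad_log_pi(y)
            log_q_ratio = (np.sum((y - fwd) ** 2) - np.sum((x - bwd) ** 2)) / (2.0 * h)
            used_mala[n] = True
        else:
            # Cheap adaptive random-walk proposal scaled by the running covariance
            prop_cov = (2.38 ** 2 / d) * 0.5 * (cov + cov.T) + 1e-6 * np.eye(d)
            y = x + np.linalg.cholesky(prop_cov) @ rng.standard_normal(d)
            log_q_ratio = 0.0  # symmetric proposal
        if np.log(rng.random()) < log_pi(y) - log_pi(x) + log_q_ratio:
            x = y
        samples[n] = x
        # Recursive update of the running mean and covariance (x0 counts as the first point)
        k = n + 2
        delta = x - mean
        mean = mean + delta / k
        cov = cov + (np.outer(delta, x - mean) - cov) / k
    return samples, used_mala

# Target: N(0, diag(1, 4)).
prec = np.array([1.0, 0.25])
samples, used_mala = gamc_sketch(lambda x: -0.5 * np.sum(prec * x ** 2),
                                 lambda x: -prec * x,
                                 x0=[3.0, -3.0])
print(samples[2000:].var(axis=0), used_mala[:1000].mean(), used_mala[-1000:].mean())
```

Early iterations are dominated by the gradient-based kernel. Later iterations almost exclusively use the cheap adaptive kernel, whose covariance has by then been learned from the chain.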
6. Information-Geometric Adaptive Time-Stepping
In diffusion models for generative tasks such as graph and molecule generation, information-geometric adaptive sampling realizes adaptive time-stepping by enforcing constant informational speed along the sampling trajectory (Lu et al., 30 Apr 2026). The evolution is parametrized as a curve $t \mapsto p_t$ on a statistical manifold endowed with the Fisher–Rao metric. The key quantity is the Drift Variation Score (DVS), which is built from the instantaneous change in the drift field. Step sizes are chosen so that each discretization step covers equal Fisher–Rao arc length. Writing $v(t)$ for the Fisher–Rao speed of the curve, the grid $t_0 < t_1 < \dots < t_N$ on $[0, T]$ satisfies
$$\int_{t_k}^{t_{k+1}} v(t)\, dt = \frac{1}{N} \int_0^T v(t)\, dt, \qquad k = 0, \dots, N-1.$$
This approach dynamically refines the time grid in regions of high geometric stiffness or curvature (high DVS), while coarsening in flat regions. Experimental results demonstrate substantial gains in sample quality and efficiency compared to fixed or heuristic step-size schedules, confirming the operational relevance of information-geometric time adaptation in practical machine learning tasks.
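Given estimates of the speed $v(t)$ on a fine grid, the equal-arc-length rule reduces to inverting the cumulative arc length. The sketch below assumes such a speed profile is available. The profile used here is made up for illustration and is not the DVS of Lu et al. The code integrates the speed with the trapezoid rule and interpolates to place $N$ steps of equal arc length. `equal_arclength_grid` is a hypothetical name.

```python
import numpy as np

def equal_arclength_grid(t_fine, speed, n_steps):
    """Return n_steps + 1 times such that each step covers equal arc length int v(t) dt.

    `speed` holds (assumed, strictly positive) estimates of v(t) on the increasing grid `t_fine`.
    """
    seg = 0.5 * (speed[1:] + speed[:-1]) * np.diff(t_fine)   # trapezoid rule per fine interval
    s = np.concatenate([[0.0], np.cumsum(seg)])               # cumulative arc length s(t)
    targets = np.linspace(0.0, s[-1], n_steps + 1)            # equally spaced arc-length levels
    return np.interp(targets, s, t_fine)                      # invert s(t) by linear interpolation

t_fine = np.linspace(0.0, 1.0, 10_001)
speed = 1.0 + 9.0 * np.exp(-20.0 * t_fine)   # hypothetical profile: fast change near t = 0
grid = equal_arclength_grid(t_fine, speed, n_steps=20)
print(np.round(np.diff(grid), 4))  # steps are smallest where the speed is largest
```

With a constant speed, the rule reduces to a uniform grid. With a speed that is high near $t = 0$, the steps there shrink roughly in proportion to $1/v(t)$.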
7. Information-Directed Sampling and Active Inference Connections
Information-geometric adaptive sampling extends beyond classical MCMC or importance sampling to sequential experimental design and discovery. In adaptive discovery frameworks using Information-Directed Sampling (IDS) (Xu et al., 2022), the sampling policy at each step minimizes an information-ratio objective:
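$$\pi_t \in \arg\min_{\pi \in \mathcal{D}(\mathcal{A})} \frac{\big(\mathbb{E}_{a \sim \pi}[\Delta_t(a)]\big)^2}{\mathbb{E}_{a \sim \pi}[g_t(a)]}.$$
This objective balances the immediate expected regret $\Delta_t(a)$ against the expected information gain $g_t(a)$. The latter is quantified as the mutual information about the optimal action, i.e., an expected KL divergence between updated and current posterior distributions. As a result, each sampling action moves efficiently on the posterior manifold, acquiring as much information as possible per unit of regret incurred, where information is measured by a KL (hence Bregman) divergence. The approach inherits information-geometric properties by implicitly adapting to the local Fisher information of the Bayesian posterior. It yields near-optimal regret rates in structured models such as linear, graph-structured, and low-rank reward domains.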
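A small illustration of the information-ratio rule follows, for a finite action set and a posterior represented by samples of the mean rewards. Two simplifications are assumptions of this sketch, not features of Xu et al.'s method:
- The policy is restricted to deterministic actions rather than distributions over actions.
- The mutual-information term is replaced by the variance-based proxy $g(a) = \sum_{a^*} P(A^* = a^*)\,\big(\mathbb{E}[\theta_a \mid A^* = a^*] - \mathbb{E}[\theta_a]\big)^2$ used in variance-based IDS.

`ids_action` is a hypothetical name.

```python
import numpy as np

def ids_action(theta_samples, eps=1e-12):
    """Deterministic-action, variance-based IDS from posterior samples.

    theta_samples: array (S, K) of posterior draws of the K mean rewards.
    Returns (chosen action, expected regret per action, information proxy per action).
    """
    S, K = theta_samples.shape
    best = theta_samples.argmax(axis=1)                       # optimal action A* in each draw
    mean_reward = theta_samples.mean(axis=0)                  # E[theta_a]
    regret = theta_samples.max(axis=1).mean() - mean_reward   # Delta(a) = E[theta_{A*}] - E[theta_a]
    gain = np.zeros(K)
    for a_star in np.unique(best):
        mask = best == a_star
        p = mask.mean()                                       # P(A* = a_star)
        cond_mean = theta_samples[mask].mean(axis=0)          # E[theta_a | A* = a_star]
        gain += p * (cond_mean - mean_reward) ** 2            # variance-based information proxy
    ratio = regret ** 2 / (gain + eps)                        # information ratio per action
    return int(np.argmin(ratio)), regret, gain

theta = np.array([[1.0, 0.0],
                  [1.0, 2.0]])  # two equally likely posterior draws of (theta_0, theta_1)
action, regret, gain = ids_action(theta)
print(action, regret, gain)     # action 1 is selected
```

In this example both actions have the same expected regret. Only action 1 reveals which action is optimal, so the information ratio favours it.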
References
- "Sampling by Divergence Minimization" (Dharamshi et al., 2021)
- "Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents" (Barp et al., 2022)
- "Information-geometric adaptive sampling for graph diffusion" (Lu et al., 30 Apr 2026)
- "Sampling with Adaptive Variance for Multimodal Distributions" (Engquist et al., 2024)
- "Sample Reuse via Importance Sampling in Information Geometric Optimization" (Shirakawa et al., 2018)
- "Adaptive Sampling for Discovery" (Xu et al., 2022)
- "Geometric adaptive Monte Carlo in random environment" (Papamarkou et al., 2016)
- "Gradient-based Adaptive Importance Samplers" (Elvira et al., 2022)