
Information-Geometric Adaptive Sampling

Updated 5 May 2026
  • Information-geometric adaptive sampling is a set of algorithms that use the geometry of probability distributions to enable efficient exploration of complex targets.
  • It leverages metrics such as the Fisher–Rao metric and KL divergence to inform proposal adjustments in methods like MCMC, Langevin dynamics, and importance sampling.
  • This adaptive framework enhances acceptance rates, accelerates convergence, and improves robustness when sampling high-dimensional or multimodal distributions.

Information-geometric adaptive sampling refers to a class of algorithms that exploit the geometry of probability distributions—specifically, information-theoretic or Riemannian structures—to enable adaptive, efficient exploration of complex or high-dimensional target distributions. These methods systematically adjust proposal distributions, drift/diffusion terms, or sampling schedules in response to information-geometric quantities such as the Kullback–Leibler (KL) divergence, Fisher–Rao metric, or weighted Wasserstein distances. The resulting framework unifies diverse approaches—ranging from Markov chain Monte Carlo (MCMC) to adaptive Langevin dynamics, importance sampling, and discovery-driven decision design—under a shared principle of geometric adaptation for accelerated, robust sampling.

1. Foundations in Information Geometry

Information geometry studies the structure of statistical manifolds, where each point corresponds to one member of a parametric family of probability distributions. Two pillars are central:

  • Fisher–Rao Metric: Equips the parametric family with a Riemannian metric defined by the Fisher information matrix, enabling local measurements of statistical distance:

$$g_{ij}(\theta) = \int \frac{\partial \log p(x|\theta)}{\partial \theta^i} \frac{\partial \log p(x|\theta)}{\partial \theta^j}\, p(x|\theta)\, dx$$

This metric underpins natural-gradient updates and allows for geodesic flows that adapt to the intrinsic curvature of the target.

  • KL Divergence and I-Projections: KL divergence acts as an asymmetric measure of discrepancy between probability distributions. The reverse KL, $D_{\text{KL}}(q \| p)$ minimized over the proposal $q$ (the I-projection), focuses proposals on local modes and geometric features of the target. Minimizing KL divergence, or measuring step sizes in terms of KL, yields movement along natural geodesics of the manifold (Barp et al., 2022; Dharamshi et al., 2021).

These geometric elements unify the design of samplers (e.g., information-geometric MCMC, natural-gradient-based optimization, Riemannian Langevin processes) and inform the construction of adaptive, information-efficient sampling strategies.
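As a small illustration of the Fisher–Rao metric and the natural gradient it induces, the sketch below estimates the Fisher information matrix of a univariate Gaussian with parameters $(\mu, \sigma)$ by Monte Carlo and compares it with the known closed form $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The function names and the choice of family are assumptions made for illustration; they do not come from the cited papers.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Gradient of log N(x | mu, sigma^2) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3
    return np.stack([d_mu, d_sigma], axis=-1)

def fisher_monte_carlo(mu, sigma, n_samples=200_000, seed=0):
    """Estimate g_ij = E[score_i * score_j] from samples of the model itself."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)
    s = gaussian_score(x, mu, sigma)
    return s.T @ s / n_samples

def natural_gradient(grad, fisher):
    """Precondition a Euclidean gradient by the inverse Fisher matrix."""
    return np.linalg.solve(fisher, grad)

mu, sigma = 1.0, 2.0
G_mc = fisher_monte_carlo(mu, sigma)
G_exact = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
print(np.round(G_mc, 3))
print(G_exact)
```

Preconditioning with the inverse of this matrix rescales each parameter direction by its local statistical sensitivity, which is what makes natural-gradient steps invariant to smooth reparametrizations of the family.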

2. Divergence Minimization and Regional Adaptation in MCMC

A canonical example of information-geometric adaptive sampling is divergence-minimizing MCMC (Dharamshi et al., 2021). Here, at each state $x$, the algorithm optimizes the proposal distribution $q(\cdot|x)$ within a parametric family (e.g., Gaussians with adjustable covariance) to locally minimize

$$D_{\text{KL}}(q \| p) = \int q(y) \log\left[\frac{q(y)}{p(y)}\right] dy,$$

while simultaneously accounting for the expected Metropolis acceptance rate. This results in a score

$$s(x) = \exp[-\beta D_{\text{KL}}(q\|p)] \cdot \int \alpha(x, y)\, q(y|x)\, dy,$$

with $\alpha(x,y) = \min\{1, p(y)/p(x)\}$ and a user-controlled trade-off parameter $\beta$.

Following a stochastic gradient step in the Cholesky factor of the proposal covariance at each iteration,

$$L_{t+1} = L_t + \gamma \nabla_L \mathcal{J}(x),$$

the algorithm achieves rapid, "regional" adaptation: every visited state has its own optimally tuned proposal, rather than a global estimate. This regionality enables accurate tracking of complex or non-Gaussian geometries, yielding high acceptance rates (e.g., 70% vs. 8% for adaptive random-walk proposals on "banana" distributions) and efficient multimodal exploration, especially when embedded within tempering or scout-based frameworks.

The information-geometric rationale is that the surrogate $\mathcal{J}(x)$ performs a local natural-gradient ascent in the space of Gaussian proposals, following the Fisher metric induced by the statistical manifold.
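The score $s(x)$ can be estimated by plain Monte Carlo for a Gaussian proposal $q(\cdot|x) = \mathcal{N}(x, LL^\top)$. The sketch below is only an illustration under stated assumptions: the helper names, the toy standard-normal target, and the sample sizes are hypothetical, and the gradient step on $L$ used by the cited method is not reproduced.

```python
import numpy as np

def log_gauss(y, mean, L):
    """Log density of N(mean, L L^T) at the rows of y (L lower-triangular)."""
    d = mean.shape[0]
    z = np.linalg.solve(L, (y - mean).T).T
    log_det = np.sum(np.log(np.diag(L)))
    return -0.5 * np.sum(z**2, axis=1) - log_det - 0.5 * d * np.log(2 * np.pi)

def proposal_score(x, L, log_p, beta=1.0, n_samples=4000, seed=0):
    """Monte Carlo estimate of s(x) = exp(-beta * KL(q||p)) * E_q[alpha(x, y)]."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    y = x + rng.standard_normal((n_samples, d)) @ L.T   # y ~ q(.|x)
    log_py = log_p(y)
    kl = np.mean(log_gauss(y, x, L) - log_py)           # KL(q || p) estimate
    log_ratio = log_py - log_p(x[None, :])[0]
    accept = np.mean(np.exp(np.minimum(0.0, log_ratio)))  # E_q[min(1, p(y)/p(x))]
    return np.exp(-beta * kl) * accept, kl, accept

# Toy target (assumption): standard normal in two dimensions.
target_mean, target_L = np.zeros(2), np.eye(2)
log_p = lambda y: log_gauss(y, target_mean, target_L)
x = np.zeros(2)
for scale in (0.1, 1.0, 3.0):
    score, kl, accept = proposal_score(x, scale * np.eye(2), log_p)
    print(f"scale={scale}: score={score:.4f}, KL={kl:.3f}, accept={accept:.3f}")
```

In this toy setting the score penalizes both a proposal that is too narrow (large KL) and one that is too wide (large KL and low acceptance). If `log_p` is known only up to a constant, the KL estimate shifts by that constant, which rescales the score by a factor that does not depend on $L$.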

3. Weighted-Wasserstein Adaptive Diffusions

For continuous-time sampling, information-geometric adaptive sampling manifests as state-dependent diffusions governed by weighted-Wasserstein gradient flows of the KL divergence (Engquist et al., 2024). The family of SDEs,

$$dX_t = -\nabla H(X_t)\,dt + \sqrt{2 D(X_t)}\,dW_t,$$

with [formula missing in source], and appropriate [formula missing in source], interpolates between standard overdamped Langevin dynamics and a derivative-free regime. The geometry is specified by the weighted-Wasserstein metric [formula missing in source], where [symbol missing in source] controls the local "length" of steps in distribution space.

Crucially, by choosing [formula missing in source], one obtains a pure diffusion with zero drift and state-dependent variance. This adaptive-variance sampling accelerates mixing in multimodal or non-log-concave settings: mean exit times between local minima transition from exponential [formula missing in source] (for classical Langevin) to algebraic [formula missing in source]. Theoretical analysis proves uniform exponential convergence in both KL and [symbol missing in source] divergence under mild functional inequalities, delineating precisely how the adaptive geometry "flattens" energy barriers and ameliorates the curse of nonconvexity.

This geometric view generalizes directly to high-dimensional or non-Euclidean settings by endowing the domain with an adaptive, potentially anisotropic metric tensor [symbol missing in source].
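A minimal Euler–Maruyama discretization of the SDE $dX_t = -\nabla H(X_t)\,dt + \sqrt{2 D(X_t)}\,dW_t$ might look as follows. The callables `grad_H` and `D` are hypothetical placeholders supplied by the user; the specific choices analyzed by Engquist et al. are not reproduced here. The usage example takes the constant-diffusion special case $H(x) = x^2/2$, $D \equiv 1$, which reduces to overdamped Langevin dynamics targeting a standard normal.

```python
import numpy as np

def euler_maruyama(x0, grad_H, D, dt=1e-2, n_steps=2000, seed=0):
    """Simulate independent 1-D particles under dX = -grad_H(X) dt + sqrt(2 D(X)) dW.

    x0     : array of initial particle positions
    grad_H : callable returning the drift potential's gradient at each particle
    D      : callable returning a nonnegative scalar diffusivity at each particle
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Ito step: drift scaled by dt, noise scaled by sqrt(2 * D * dt)
        x = x - grad_H(x) * dt + np.sqrt(2.0 * D(x) * dt) * noise
    return x

# Special case (assumption): quadratic H and constant D, i.e. an Ornstein-Uhlenbeck process.
x_end = euler_maruyama(
    x0=np.zeros(5000),
    grad_H=lambda x: x,
    D=lambda x: np.ones_like(x),
)
print(x_end.mean(), x_end.var())
```

With a state-dependent `D`, the stationary law of this Itô SDE depends jointly on `H` and `D`, so the two must be chosen together; the sketch makes no claim about which pairs target a given density.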

4. Adaptive Importance Sampling and Optimization

Information-geometric principles also underpin adaptive importance sampling and black-box optimization. In gradient-based adaptive importance samplers such as GRAMIS (Elvira et al., 2022), proposals are maintained as a mixture of parametrized distributions (e.g., Gaussians). Each proposal's mean is updated using a natural-gradient–style step on the log target, preconditioned by the local covariance (Laplace/Fisher approximation), and augmented with a repulsion term to foster coverage of distinct modes:

[equation missing in source]

The covariance is reset to the negative inverse Hessian of the log target at the current mean when possible, further matching the local geometry. Empirical results indicate that this natural-gradient adaptation with repulsion both accelerates convergence and outperforms traditional adaptive mixture samplers on high-dimensional, multimodal, or non-Gaussian targets.

In the context of information geometric optimization (IGO), adaptive sampling is leveraged through reusing past samples via importance sampling (without biasing the natural gradient). This is achieved by forming a mixture proposal across prior search distributions, lowering Monte Carlo variance, and directly preserving the Fisher–Rao structure of the manifold (Shirakawa et al., 2018).
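The sample-reuse idea can be sketched with a mixture (balance-heuristic) importance weight: samples drawn from several earlier search distributions are pooled and each is weighted by the current density divided by the equal-weight mixture of all the densities that generated the pool. The one-dimensional Gaussians, parameter values, and function names below are assumptions chosen for illustration, not the setup of Shirakawa et al.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log density of a univariate normal."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def mixture_weights(samples, past_params, current_params):
    """Weights p_current(x) / mean_k q_k(x) for samples pooled from all q_k.

    Assumes the same number of samples was drawn from every q_k.
    """
    log_q = np.stack([normal_logpdf(samples, m, s) for m, s in past_params])
    log_mix = np.logaddexp.reduce(log_q, axis=0) - np.log(len(past_params))
    return np.exp(normal_logpdf(samples, *current_params) - log_mix)

rng = np.random.default_rng(0)
# Earlier search distributions as (mean, std); the last one is the current one.
past_params = [(-1.0, 1.0), (0.0, 1.0), (0.5, 1.0)]
n_per = 20_000
pooled = np.concatenate([rng.normal(m, s, n_per) for m, s in past_params])

weights = mixture_weights(pooled, past_params, current_params=(0.5, 1.0))
estimate = np.mean(weights * pooled)   # unbiased estimate of E_current[x] = 0.5
print(weights.mean(), estimate)
```

Because the current distribution is one of the mixture components, every weight is bounded by the number of components, which is one reason mixture weighting tends to reduce variance relative to weighting each sample by its own proposal alone.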

5. Adaptive Schedulers and Complexity-Efficient Sampling

Balancing geometric adaptation with computational cost is critical, particularly in high-dimensional settings. Geometric adaptive Monte Carlo (GAMC) (Papamarkou et al., 2016) addresses this by randomly alternating between expensive, geometry-aware proposals (e.g., manifold Langevin) and cost-effective adaptive proposals (adaptive Metropolis). The switch schedule is governed by a decaying probability,

[equation missing in source]

ensuring frequent exploitation of local geometry in the transient phase, but transitioning to an asymptotically adaptive regime for long-run efficiency. This strategy optimizes the effective sample size per unit computational time, interpolating gracefully between the extremes of full geometric versus fully adaptive proposals. The transition kernel and empirical covariance updates preserve ergodicity and allow practical tuning of the "geometry versus cost" trade-off.

6. Information-Geometric Adaptive Time-Stepping

In the context of diffusion models for generative tasks such as graph and molecule generation, information-geometric adaptive sampling realizes adaptive time-stepping by enforcing constant informational speed along the sampling trajectory (Lu et al., 30 Apr 2026). The evolution is parametrized as a curve [formula missing in source] on a statistical manifold endowed with the Fisher–Rao metric, with the key quantity being the Drift Variation Score (DVS):

[equation missing in source]

where [formula missing in source] measures the instantaneous change in the drift field. Step sizes are chosen so that each discretization covers equal Fisher–Rao arc-length:

[equation missing in source]

This approach dynamically refines the time grid in regions of high geometric stiffness or curvature (high DVS), while coarsening in flat regions. Experimental results demonstrate substantial gains in sample quality and efficiency compared to fixed or heuristic step-size schedules, confirming the operational relevance of information-geometric time adaptation in practical machine learning tasks.
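The equal-arc-length rule can be sketched independently of how the speed is measured: given any positive speed function $v(t)$, integrate it to obtain cumulative arc length and invert that map at equally spaced targets. The `stiff_speed` profile below is a hypothetical stand-in for a DVS-derived speed, which is not reproduced here, and the function name is likewise an assumption.

```python
import numpy as np

def equal_arclength_grid(speed, t0, t1, n_steps, n_fine=2001):
    """Time grid on [t0, t1] whose steps each cover equal integral of speed(t).

    speed must be strictly positive so that arc length increases with t.
    """
    t_fine = np.linspace(t0, t1, n_fine)
    v = speed(t_fine)
    # Cumulative arc length by the trapezoid rule on the fine grid.
    segments = 0.5 * (v[1:] + v[:-1]) * np.diff(t_fine)
    arc = np.concatenate([[0.0], np.cumsum(segments)])
    # Invert arc(t) at equally spaced arc-length targets.
    targets = np.linspace(0.0, arc[-1], n_steps + 1)
    return np.interp(targets, arc, t_fine)

# Hypothetical speed profile: fast change near t = 0, nearly flat afterwards.
stiff_speed = lambda t: 1.0 + 50.0 * np.exp(-10.0 * t)

grid = equal_arclength_grid(stiff_speed, 0.0, 1.0, n_steps=10)
print(np.round(grid, 4))
print(np.round(np.diff(grid), 4))   # small steps where the speed is high
```

A constant speed recovers a uniform grid, so a fixed-step schedule is the special case in which the trajectory is assumed to move at the same informational speed everywhere.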

7. Information-Directed Sampling and Active Inference Connections

Information-geometric adaptive sampling extends beyond classical MCMC or importance sampling to sequential experimental design and discovery. In adaptive discovery frameworks using Information-Directed Sampling (IDS) (Xu et al., 2022), the sampling policy at each step minimizes an information-ratio objective:

[equation missing in source]

balancing immediate expected regret with expected information gain, the latter quantified as mutual information about optimal actions (an expected KL divergence between posterior distributions). This design ensures that each sampling action moves efficiently in the posterior manifold, minimizing a Bregman divergence per unit cumulative loss. The approach inherits information-geometric properties by implicitly adapting to the local Fisher information in the Bayesian posterior, and yields near-optimal regret rates in structured models such as linear, graph-structured, and low-rank reward domains.
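The trade-off between regret and information gain can be sketched with the information ratio in the form commonly used in the IDS literature, squared expected regret divided by expected information gain. That form is an assumption here, since the objective used by Xu et al. is not recoverable from this page. The sketch searches over distributions supported on at most two actions using a grid of mixing probabilities; the per-action regret and information-gain values are made-up inputs.

```python
import numpy as np

def ids_distribution(regret, info_gain, n_grid=101):
    """Action distribution minimizing (expected regret)^2 / (expected info gain).

    Searches mixtures of at most two actions on a grid of mixing weights.
    Returns (None, inf) if no action has positive information gain.
    """
    regret = np.asarray(regret, dtype=float)
    info_gain = np.asarray(info_gain, dtype=float)
    q = np.linspace(0.0, 1.0, n_grid)
    k = len(regret)
    best_ratio, best_probs = np.inf, None
    for i in range(k):
        for j in range(i, k):
            delta = q * regret[i] + (1 - q) * regret[j]
            gain = q * info_gain[i] + (1 - q) * info_gain[j]
            ratio = np.where(gain > 0, delta**2 / np.maximum(gain, 1e-300), np.inf)
            idx = int(np.argmin(ratio))
            if ratio[idx] < best_ratio:
                probs = np.zeros(k)
                probs[i] += q[idx]
                probs[j] += 1 - q[idx]
                best_ratio, best_probs = float(ratio[idx]), probs
    return best_probs, best_ratio

# Made-up inputs: action 0 is nearly greedy, actions 1 and 2 are more informative.
regret = [0.2, 0.5, 1.0]
info_gain = [0.01, 0.5, 0.6]
probs, ratio = ids_distribution(regret, info_gain)
print(probs, ratio)
```

Randomizing between a low-regret action and an informative one can achieve a smaller ratio than any single action, which is why the search is over distributions rather than over individual actions.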

