Anchored Posterior Sampling (APS)

Updated 22 May 2026

Anchored Posterior Sampling (APS) is a class of Bayesian methodologies that anchor samples to draws from a prior or pseudo-prior in order to construct efficient, weighted posterior approximations.
APS incorporates diverse paradigms such as importance sampling, stochastic optimization, and neural ensemble methods to tackle challenges like slow mixing and tuning in MCMC.
Practical implementations, including LIPS, RML, and Anchor-TS, demonstrate APS's versatility in applications from high-dimensional inverse problems to safe bandit learning under distribution shifts.

Anchored Posterior Sampling (APS) is a class of methodologies for Bayesian posterior inference that replaces traditional Markov chain Monte Carlo (MCMC) with strategies where samples are “anchored” to draws from the prior or pseudo-priors, and then either reweighted or coupled to optimization. APS encompasses multiple algorithmic paradigms spanning importance sampling, stochastic optimization, discrete generative modeling, neural network posterior approximation, empirical Bayes inversion, and bandit algorithms. These approaches exploit the tractable structure of prior samples while circumventing challenges inherent to MCMC, such as difficult tuning, slow mixing, and limited parallelizability.

1. Foundational Algorithms and Theoretical Guarantees

A foundational APS scheme is the Likelihood-Importance Posterior Sampling (LIPS) approach, which samples $N$ i.i.d. parameter vectors $\theta_i \sim \pi(\theta)$ from the prior and computes their unnormalized likelihoods $\ell_i = f(\theta_i)$. The approximation to the posterior $\Pi$ is then a discrete measure

$\Pi_N = \sum_{i=1}^N \hat{w}_i \, \delta_{\theta_i}$

where the normalized weights are $\hat{w}_i = \ell_i / \sum_j \ell_j$. This yields a weighted empirical posterior, and as $N \to \infty$, the empirical measure converges in distribution, and uniformly over Glivenko–Cantelli classes of test functions, to the true posterior $\Pi$ (Shalizi, 2022). The approach is simple (one line of code per step), trivially parallelizable, requires no tuning, burn-in, or convergence diagnostics, and permits either weighted or unweighted posterior approximations, the latter via breeding (discrete replication) or stratified resampling.
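The weighting scheme above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`lips`, `posterior_mean`, `resample`) are hypothetical, the likelihood is passed in log form for numerical stability, and plain multinomial resampling stands in for the breeding or stratified resampling mentioned in the text. The toy model (standard normal prior, one observation with unit noise) is an assumption chosen because its exact posterior, $\mathcal{N}(0.5, 0.5)$, is known.

```python
import math
import random


def lips(prior_sample, log_likelihood, n, rng):
    """Draw n prior samples and return them with normalized likelihood weights."""
    thetas = [prior_sample(rng) for _ in range(n)]
    log_l = [log_likelihood(t) for t in thetas]
    shift = max(log_l)  # subtract the max before exponentiating to avoid underflow
    unnormalized = [math.exp(v - shift) for v in log_l]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    return thetas, weights


def posterior_mean(thetas, weights):
    """Weighted average of the draws, i.e. the mean of the discrete measure."""
    return sum(w * t for t, w in zip(thetas, weights))


def resample(thetas, weights, k, rng):
    """Multinomial resampling: turns the weighted sample into an unweighted one."""
    return rng.choices(thetas, weights=weights, k=k)


# Toy check (assumed model): prior N(0, 1), one observation y = 1 with unit noise.
rng = random.Random(0)
y = 1.0
thetas, weights = lips(
    prior_sample=lambda r: r.gauss(0.0, 1.0),
    log_likelihood=lambda t: -0.5 * (y - t) ** 2,
    n=20000,
    rng=rng,
)
est = posterior_mean(thetas, weights)  # should be close to the exact value 0.5
unweighted = resample(thetas, weights, 100, rng)
```

Because each draw and each likelihood evaluation is independent, the two list comprehensions in `lips` are the parts that would be distributed across workers in a parallel setting.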

Finite-$N$ error analysis yields $O(1/\sqrt{N})$ convergence in metrics of interest, and the asymptotic variance of $\Pi_N(A)$ for any measurable set $A$ is controlled by the variability of the likelihood $f(\theta)$ under the prior. If the prior and posterior are poorly matched (most probability mass for $\pi$ falls where $f(\theta) \approx 0$), efficiency degrades. Extensions such as population Monte Carlo, particle filters, and “go with the winners” schemes address these inefficiencies by adapting proposals or pruning low-weight samples early (Shalizi, 2022).
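One common way to quantify this loss of efficiency is the Kish effective sample size, $1 / \sum_i \hat{w}_i^2$, which lies between 1 and $N$. The sketch below is illustrative only: the diagnostic is a standard importance-sampling tool rather than something the cited source is known to prescribe, and the helper names and the toy Gaussian setup (standard normal prior, observation at $y = 3$) are assumptions.

```python
import math
import random


def normalized_weights(log_l):
    """Normalize likelihood weights given log-likelihood values."""
    shift = max(log_l)
    unnormalized = [math.exp(v - shift) for v in log_l]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def effective_sample_size(weights):
    """Kish effective sample size: 1 / sum of squared normalized weights."""
    return 1.0 / sum(w * w for w in weights)


rng = random.Random(1)
draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # prior draws, N(0, 1)


def ess_for_noise(sigma, y=3.0):
    """ESS when the observation y is seen with Gaussian noise of scale sigma."""
    log_l = [-0.5 * ((y - t) / sigma) ** 2 for t in draws]
    return effective_sample_size(normalized_weights(log_l))


ess_matched = ess_for_noise(sigma=5.0)     # broad likelihood: weights nearly uniform
ess_mismatched = ess_for_noise(sigma=0.1)  # sharp likelihood in the prior's tail
```

With a sharp likelihood centered far from the bulk of the prior, only a handful of draws carry appreciable weight, which is the regime where adaptive proposals become worthwhile.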

2. Optimization-based APS: Randomized Maximum Likelihood and Neural Ensembles

Randomized maximum likelihood (RML), also known as APS in high-dimensional inverse problems, generates posterior samples by randomly anchoring optimization problems. Draws from the prior ($\theta' \sim \mathcal{N}(\mu, C)$) and the noise-perturbed data ($y' \sim \mathcal{N}(y, \Gamma)$) define a stochastic objective

$J(\theta) = \tfrac{1}{2} \|\theta - \theta'\|_{C^{-1}}^2 + \tfrac{1}{2} \|y' - G(\theta)\|_{\Gamma^{-1}}^2$

where $\|v\|_{A}^2 = v^\top A v$ and $G$ is a possibly nonlinear forward map (Ba et al., 2021). The minimizer $\theta^\ast = \arg\min_\theta J(\theta)$ serves as a posterior proposal. For linear $G$, these samples are exact draws from the posterior $\Pi$; for nonlinear $G$, the induced proposal law must be corrected by importance weights involving the Jacobian determinant of the critical-point map, typically approximated via a low-rank Gauss–Newton expansion.

This methodology is scalable to high-dimensional and multimodal posteriors, provided efficient optimization (e.g., Newton-CG) and adjoint computations for the forward map $G$ are feasible. Performance is best when all critical points and accurate weights are used, and substantial empirical success has been reported in geoscientific inverse problems and nonlinear toy models. The main limitations are computational expense in multimodal or nonlinear regimes (many local minima with low weight) and the need to approximate high-dimensional determinants and Hessians.
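The linear case can be checked numerically. The following sketch assumes a scalar parameter with a Gaussian prior, a linear forward map (multiplication by a constant `g`), and Gaussian observation noise; all names and numerical values are illustrative. In this setting the randomized objective is quadratic, so its minimizer is available in closed form and no iterative optimizer or importance weight is needed. A nonlinear forward map would require both.

```python
import random


def rml_sample_linear(m0, c, g, y, gamma, rng):
    """One RML draw for prior N(m0, c), data y = g * theta + N(0, gamma) noise.

    Minimizes (theta - anchor)^2 / (2 c) + (y_pert - g * theta)^2 / (2 gamma),
    whose minimizer has the closed form below.
    """
    anchor = rng.gauss(m0, c ** 0.5)      # random anchor drawn from the prior
    y_pert = rng.gauss(y, gamma ** 0.5)   # data perturbed by observation noise
    precision = 1.0 / c + g * g / gamma
    return (anchor / c + g * y_pert / gamma) / precision


rng = random.Random(0)
m0, c, g, y, gamma = 0.0, 4.0, 2.0, 3.0, 1.0
samples = [rml_sample_linear(m0, c, g, y, gamma, rng) for _ in range(20000)]

# Exact conjugate posterior for comparison.
post_var = 1.0 / (1.0 / c + g * g / gamma)
post_mean = post_var * (m0 / c + g * y / gamma)

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
```

Comparing `sample_mean` and `sample_var` with `post_mean` and `post_var` shows the agreement expected in the linear-Gaussian case.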

Anchored ensembles and Sequential Anchored Ensembles (SAE) extend the APS paradigm to neural network Bayesian inference. Each ensemble member is trained with a distinct “anchor” $\theta_{\text{anc}}$ drawn from the prior, modifying the loss to favor parameters near $\theta_{\text{anc}}$:

$\mathcal{L}(\theta) = -\log p(\mathcal{D} \mid \theta) + \tfrac{1}{2} (\theta - \theta_{\text{anc}})^\top \Sigma_{\text{prior}}^{-1} (\theta - \theta_{\text{anc}})$

where $\mathcal{D}$ is the training data and $\Sigma_{\text{prior}}$ is the prior covariance (Delaunoy et al., 2021). SAE exploits the autocorrelation between sequential anchors to warm-start optimization and significantly increase the number of usable posterior samples under a fixed computational budget, maintaining the ergodic distribution of anchors while accelerating convergence.
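The benefit of warm-starting from correlated anchors can be illustrated on a toy problem. The sketch below makes several assumptions that are not taken from the cited work: a one-parameter linear model stands in for a neural network, the anchors follow an AR(1) chain whose stationary distribution is the Gaussian prior (a simple stand-in for whatever sampler SAE uses), and plain gradient descent is the optimizer. All names and constants are hypothetical.

```python
import random

X = [1.0, 2.0, 3.0]
Y = [2.1, 3.9, 6.2]
NOISE_VAR, PRIOR_MEAN, PRIOR_VAR = 1.0, 0.0, 1.0


def grad(theta, anchor):
    """Gradient of the anchored loss: data misfit plus pull toward the anchor."""
    data_term = sum(x * (theta * x - y) for x, y in zip(X, Y)) / NOISE_VAR
    return data_term + (theta - anchor) / PRIOR_VAR


def fit(anchor, init, lr=0.05, tol=1e-8, max_iter=10_000):
    """Gradient descent until the gradient is small; returns (theta, steps used)."""
    theta = init
    for step in range(max_iter):
        g = grad(theta, anchor)
        if abs(g) < tol:
            return theta, step
        theta -= lr * g
    return theta, max_iter


def anchor_chain(n, rho, rng):
    """AR(1) anchors: correlated in sequence, marginally N(PRIOR_MEAN, PRIOR_VAR)."""
    a = rng.gauss(PRIOR_MEAN, PRIOR_VAR ** 0.5)
    for _ in range(n):
        yield a
        noise = rng.gauss(0.0, PRIOR_VAR ** 0.5)
        a = PRIOR_MEAN + rho * (a - PRIOR_MEAN) + (1.0 - rho ** 2) ** 0.5 * noise


def run(warm_start, n=50, rho=0.95, seed=0):
    """Train one member per anchor; optionally start from the previous solution."""
    rng = random.Random(seed)
    theta, total_steps, members = 0.0, 0, []
    for anchor in anchor_chain(n, rho, rng):
        init = theta if warm_start else 0.0
        theta, steps = fit(anchor, init)
        total_steps += steps
        members.append(theta)
    return members, total_steps


cold_members, cold_steps = run(warm_start=False)
warm_members, warm_steps = run(warm_start=True)
```

Both runs see the same anchors and reach the same minimizers, so the ensemble is unchanged; only the optimization cost differs, which is the trade SAE is described as exploiting.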

3. APS in Structured and Discrete Generative Models

Anchored Posterior Sampling has been adapted to discrete diffusion generative models for inverse problems involving categorical data (e.g., image reconstruction, inpainting) (Rout et al., 2 Oct 2025). Here, APS incorporates two innovations: quantized expectation guidance and anchored remasking. Quantized expectation constructs a differentiable surrogate of the posterior likelihood score in discrete token embedding space, leveraging straight-through estimators and lookup-free quantizers. Anchored remasking adaptively selects “anchor” tokens with high posterior confidence to remain fixed while re-masking uncertain tokens, balancing exploration and exploitation across reverse diffusion steps.

The APS procedure iteratively updates token logits via inner-loop Adam optimization, quantizes the expected embeddings, and propagates measurement/perceptual loss gradients. Anchoring is dynamic: tokens are fixed based on confidence exceeding a stepwise threshold (cosine schedule). The approach achieves state-of-the-art metrics on classical inverse tasks versus both discrete and continuous diffusion baselines, with marked gains in LPIPS and PSNR, and scales effectively to high resolutions in few steps. Notably, APS is entirely training-free—no finetuning of the diffusion model is required.
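The anchoring step can be sketched independently of any particular diffusion model. The code below is a hypothetical illustration: the mask token id, the endpoints of the threshold, and the direction of the cosine schedule (strict early, permissive late) are all assumptions, since the text only states that a stepwise cosine-scheduled confidence threshold is used. Token probabilities are supplied directly rather than produced by a model.

```python
import math

MASK = -1  # hypothetical id for the mask token


def cosine_threshold(step, num_steps, hi=0.95, lo=0.5):
    """Confidence threshold that decays from hi to lo along a half cosine.

    The direction and endpoints are assumptions made for illustration.
    """
    frac = step / max(num_steps - 1, 1)
    return lo + (hi - lo) * 0.5 * (1.0 + math.cos(math.pi * frac))


def anchored_remask(tokens, probs, step, num_steps):
    """Keep tokens whose probability meets the threshold; re-mask the others."""
    tau = cosine_threshold(step, num_steps)
    kept = []
    for token, p in zip(tokens, probs):
        confidence = p[token]  # probability assigned to the current token
        kept.append(token if confidence >= tau else MASK)
    return kept


tokens = [2, 0, 1]
probs = [
    [0.01, 0.01, 0.98],  # very confident about token 2
    [0.60, 0.30, 0.10],  # moderately confident about token 0
    [0.30, 0.40, 0.30],  # uncertain about token 1
]
early = anchored_remask(tokens, probs, step=0, num_steps=4)
late = anchored_remask(tokens, probs, step=3, num_steps=4)
```

Under these assumed settings only the most confident token is anchored at the first step, while the moderately confident token is also retained by the last step.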

Limitations are primarily determined by the quality of discrete tokenization and prior model (e.g., VQ codebook), with possible degradation under extreme distribution shifts or highly nonlinear forward operators.

4. APS for Empirical Bayes Inverse Problems via Anchor Parameterization

In the context of Gaussian process-driven random fields and ill-posed inverse problems, APS introduces the anchor parameterization to provide a tractable posterior over lower-dimensional summary statistics (“anchors”) rather than the intractable full state space (Zhang, 2011). The field $Y$ (potentially in thousands of dimensions) is parameterized as the output of a Gaussian process with structural hyperparameters $\theta$ and anchor variables $\vartheta = HY$, where $H$ is typically a low-rank projection or selection matrix.

APS distinguishes between type-A (measured, linear) and type-B (observed, nonlinear) data, which inform the anchors directly or via the outcome of a forward model $\mathcal{M}$. The posterior over the anchors $\vartheta$ is constructed via marginalization and, crucially, approximated by a kernel-Gaussian mixture obtained from samples under the type-A likelihood, combined with conditional Gaussian field realizations and forward evaluations for the type-B targets. Analytic conditioning of the fitted mixture components on type-B data delivers the high-dimensional posterior approximation. This two-stage sampling-plus-conditioning strategy offers substantial dimensionality reduction and computational savings over state-space or pilot point inversion: for fixed anchors, field draws are trivial; for each posterior sample, only a single forward simulation is needed.

The accuracy and convergence can be assessed empirically via marginal posteriors and comparison of kernel-mixture fits. The approach is sensitive to anchor selection; anchor dimension must remain modest, and high-dimensional fits require careful bandwidth/covariance choices.
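The claim that field draws are cheap once the anchors are fixed can be illustrated with a standard conditional Gaussian simulation. The sketch below assumes a zero-mean field on a one-dimensional grid, an exponential covariance, and anchors that are point values of the field (the selection-matrix case). It uses the well-known "draw unconditionally, then correct by kriging" construction, which is not necessarily the procedure used in the cited work, and every function name and parameter value is illustrative.

```python
import math
import random


def exp_cov(xs, length=0.3, var=1.0, jitter=1e-9):
    """Exponential covariance matrix on grid xs, with a small diagonal jitter."""
    return [[var * math.exp(-abs(a - b) / length) + (jitter if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_spd(a, b):
    """Solve a x = b via Cholesky: forward then backward substitution."""
    low, n = cholesky(a), len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def draw_field_given_anchors(xs, anchor_idx, anchor_vals, rng):
    """Draw a field realization that honors the anchor values exactly."""
    cov = exp_cov(xs)
    low = cholesky(cov)
    eps = [rng.gauss(0.0, 1.0) for _ in xs]
    # Unconditional draw: y = L @ eps.
    y = [sum(low[i][j] * eps[j] for j in range(i + 1)) for i in range(len(xs))]
    # Kriging correction toward the anchor values.
    cov_aa = [[cov[i][j] for j in anchor_idx] for i in anchor_idx]
    resid = [v - y[i] for i, v in zip(anchor_idx, anchor_vals)]
    alpha = solve_spd(cov_aa, resid)
    return [y[i] + sum(cov[i][a] * alpha[m] for m, a in enumerate(anchor_idx))
            for i in range(len(xs))]


xs = [i / 20 for i in range(21)]
anchor_idx, anchor_vals = [0, 10, 20], [1.0, -0.5, 0.3]
field = draw_field_given_anchors(xs, anchor_idx, anchor_vals, random.Random(0))
```

The only linear solve involves the small anchor-by-anchor covariance block, which is why keeping the anchor dimension modest matters for cost.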

5. APS in Sequential Decision Under Distribution Shift

Anchored Posterior Sampling is instantiated as Sample-Mean Anchored Thompson Sampling (Anchor-TS) for safe offline-to-online bandit learning under distribution shift (Li et al., 11 May 2026). Here, offline logs with known or bounded bias supplement online samples. Anchor-TS constructs the index for each arm as the median of (i) the online sample mean, (ii) an online posterior sample, and (iii) a hybrid posterior sample (shifted to account for the worst-case bias):

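$I_k(t) = \operatorname{median}\{\hat{\mu}_k^{\text{on}}(t),\ \tilde{\theta}_k^{\text{on}}(t),\ \tilde{\theta}_k^{\text{hyb}}(t)\}$

where $\hat{\mu}_k^{\text{on}}(t)$ is the online sample mean of arm $k$, and $\tilde{\theta}_k^{\text{on}}(t)$ and $\tilde{\theta}_k^{\text{hyb}}(t)$ are drawn from conjugate posteriors, the latter incorporating the offline data and a bias shift.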
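The median-of-three index can be sketched for a single arm as follows. This is a hedged illustration rather than the published algorithm: it assumes Gaussian rewards with a conjugate normal prior, pools offline and online rewards into one hybrid posterior, and applies the bias bound as an additive optimistic shift. The exact form of the shift and all function names are assumptions.

```python
import random
import statistics


def gaussian_posterior(sum_x, n, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Conjugate normal posterior (mean, variance) for an arm's mean reward."""
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum_x / noise_var) / precision
    return mean, 1.0 / precision


def index_components(online, offline, bias_bound, rng):
    """Return (online mean, online posterior draw, shifted hybrid posterior draw)."""
    n_on, s_on = len(online), sum(online)
    online_mean = s_on / n_on if n_on else 0.0
    m_on, v_on = gaussian_posterior(s_on, n_on)
    # Hybrid posterior pools offline and online rewards (an assumed construction).
    m_hy, v_hy = gaussian_posterior(s_on + sum(offline), n_on + len(offline))
    online_draw = rng.gauss(m_on, v_on ** 0.5)
    hybrid_draw = rng.gauss(m_hy, v_hy ** 0.5) + bias_bound  # assumed additive shift
    return online_mean, online_draw, hybrid_draw


def anchor_ts_index(components):
    """The arm's index is the median of the three components."""
    return statistics.median(components)


rng = random.Random(0)
components = index_components(
    online=[1.0, 1.2, 0.8], offline=[1.1] * 20, bias_bound=0.1, rng=rng
)
index = anchor_ts_index(components)
```

In a full bandit loop this index would be computed for every arm at each round and the arm with the largest index pulled. Taking the median means a single extreme component, such as a badly biased hybrid draw, cannot by itself determine the index.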

Anchor-TS achieves strong theoretical regret guarantees. If the offline data are unbiased, cumulative regret is reduced proportionally to the amount of offline data, and when bias is present, regret matches that of pure online Thompson sampling up to constants. Unlike UCB-type approaches, the median anchoring ensures no harm from offline data and confers additional benefit for the arm with maximal offline support. Empirically, Anchor-TS outperforms both pure online and other hybrid TS/UCB schemes, especially when offline data covers the optimal arm.

6. Comparative Table of Major APS Methodologies

APS Variant	Core Mechanism	Notable Applications
LIPS (Shalizi, 2022)	Prior samples weighted by likelihood	General Bayesian inference, diagnostics
RML (Ba et al., 2021)	Optimization over random anchor pairs	High-dimensional equivariant inverse problems
SAE (Delaunoy et al., 2021)	Neural optimization with prior anchor loss	Neural net posteriors, ensembling
Discrete Diffusion APS (Rout et al., 2 Oct 2025)	Quantized guidance + anchored remasking	Bayesian inverse problems with diffusion models
Anchor-TS (Li et al., 11 May 2026)	Median of online mean, online sample, and hybrid sample	Bandits under offline-to-online shift
Anchor GP inversion (Zhang, 2011)	Anchor parameterization + Gaussian conditioning	Random field inversion, geostatistics

Each approach targets a distinct inference or decision setting, but all share the archetype: anchor to prior or synthetic samples, then reweight, push-forward, or condition to approximate the true Bayesian posterior.

7. Advantages, Limitations, and Research Directions

APS methods offer conceptual and computational advantages, including zero-tuning operation, embarrassingly parallel implementation, clear convergence properties (in both the classic sense and for specific structured approximations), and flexible applicability to high-dimensional and nonlinear models. Their performance hinges on alignment between the prior and the true posterior; strong prior-posterior mismatch, high variance of the likelihood, or large anchor dimension reduces sample efficiency. Remedies include adaptive proposals, population Monte Carlo, reweighting, and hybridization with MCMC (“particle MCMC”).

Limitations specific to each variant, such as the need to evaluate Jacobian determinants (RML), the quality of the codebook for diffusion models, or anchor design in inversion, are the focus of ongoing research (Shalizi, 2022; Rout et al., 2 Oct 2025; Zhang, 2011). The APS paradigm continues to expand, encompassing extensions to video, multimodal editing, improved tokenization, and novel hybridization under distribution shift.

Anchored Posterior Sampling provides a versatile, robustly justified methodology for Bayesian inference in regimes where classic MCMC and filtering are computationally prohibitive or poorly suited to model structure or hardware constraints.