- The paper introduces a novel gradient-informed grid technique that significantly reduces computational cost for Bayesian inference in MRFs.
- It employs adaptive step-sizing and Hermite interpolation to accurately approximate intractable normalizing constants.
- Experimental results on hidden Potts and autologistic models demonstrate orders-of-magnitude runtime improvements over traditional methods.
Introduction and Motivation
Bayesian inference in Markov random fields (MRFs) is severely impeded by doubly intractable likelihoods: the normalizing constant depends on the model parameters and cannot be computed explicitly for realistic data sizes. The primary inferential challenge is that one must either perform computationally prohibitive perfect sampling (as in the exchange algorithm) or resort to model-specific approximations that can introduce bias. Existing approaches (e.g., the approximate exchange algorithm, path sampling via thermodynamic integration, and surrogate-based methods such as PFAB and warped GP surrogates) each strike different trade-offs between computational complexity, generality, and inferential accuracy, but none fully resolves the problem of scalable, amortized inference for general MRFs over higher-dimensional parameter spaces.
This paper proposes a gradient-informed, grid-based approach for amortized MCMC inference in MRFs. The core innovation is the efficient placement of grid points using gradient and curvature (Hessian) information, combined with Hermite interpolation, to yield smooth, accurate surrogates for the expected sufficient statistics (and hence for the posterior). This drastically reduces computational cost while remaining faithful to the posterior obtained by exchange-type algorithms.
Methodological Contributions
The framework focuses on MRFs with exponential family structure, including the hidden Potts and autologistic models. The model likelihood is
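$$p(z \mid \beta) = C(\beta)^{-1} \exp\big(\beta^\top S(z)\big), \qquad C(\beta) = \sum_{z} \exp\big(\beta^\top S(z)\big),$$
where S(z) are the sufficient statistics and C(β) is the intractable normalizing constant: the sum runs over every configuration of the field, i.e. k^n terms for n sites with k labels. Direct evaluation of the likelihood inside MCMC is therefore infeasible.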
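To make the intractability concrete, the following minimal Python sketch enumerates every configuration of a tiny 3×3 Potts lattice with k = 3 labels. It assumes a scalar β, takes S(z) to be the number of like-labeled neighbor pairs under a first-order (4-neighbor) neighborhood with free boundaries, and uses the convention C(β) = Σ_z exp(β S(z)), so that p(z∣β) = exp(β S(z))/C(β); all function names are illustrative. It also checks numerically that d log C(β)/dβ = E_{z∣β}[S(z)], the identity that grid-based methods exploit.

```python
import itertools
import math

import numpy as np


def potts_stat(z: np.ndarray) -> int:
    """S(z): number of horizontally or vertically adjacent pairs sharing a label."""
    return int(np.sum(z[:, 1:] == z[:, :-1]) + np.sum(z[1:, :] == z[:-1, :]))


def enumerate_stats(rows: int, cols: int, k: int) -> np.ndarray:
    """S(z) for every one of the k**(rows*cols) label configurations (brute force)."""
    stats = [
        potts_stat(np.array(labels).reshape(rows, cols))
        for labels in itertools.product(range(k), repeat=rows * cols)
    ]
    return np.array(stats, dtype=float)


def log_C(beta: float, stats: np.ndarray) -> float:
    """log C(beta) = log sum_z exp(beta * S(z)), computed with log-sum-exp for stability."""
    a = beta * stats
    m = a.max()
    return float(m + np.log(np.exp(a - m).sum()))


def expected_stat(beta: float, stats: np.ndarray) -> float:
    """E_{z|beta}[S(z)] under p(z|beta) = exp(beta * S(z)) / C(beta)."""
    w = np.exp(beta * stats - log_C(beta, stats))
    return float(w @ stats)


stats = enumerate_stats(3, 3, k=3)  # 3**9 = 19683 configurations
beta, h = 0.7, 1e-5
finite_diff = (log_C(beta + h, stats) - log_C(beta - h, stats)) / (2 * h)
print(len(stats), finite_diff, expected_stat(beta, stats))

# Size of the sum for a 100x100 field with k = 6: 6**10000 has this many decimal digits.
print(math.floor(10000 * math.log10(6)) + 1)
```

Even this toy lattice needs 19,683 terms, and 6^10000 has 7,782 decimal digits, which is why C(β) is never computed directly for the 100×100 experiments.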
Existing Approaches and Limitations
- Exchange Algorithm: Requires perfect sampling from the intractable model at every MCMC step, yielding unbiased posteriors but with untenable runtime for realistic instances.
- Approximate Exchange Algorithm (AEA): Replaces perfect sampling with a fixed number of sweeps of a Markov chain (e.g., Gibbs sampling), reducing cost but introducing bias.
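- Path Sampling/Thermodynamic Integration: Precomputes Monte Carlo estimates of the expected sufficient statistics on a grid of parameter values and interpolates between them to approximate the path sampling identity $\log C(\beta_1) - \log C(\beta_0) = \int_{\beta_0}^{\beta_1} \mathbb{E}_{z\mid\beta}[S(z)]\,d\beta$ (written here for scalar β). Quality depends on grid coverage and interpolation fidelity.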
- Surrogate-Based Methods (PFAB, Warped GP): Use parametric or GP models to approximate the log normalizing constant or the sufficient statistics, but limited generalization to higher-dimensional parameter spaces and dependence on specific assumptions (e.g., independence of sufficient statistics) restrict their applicability.
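Grid-based path sampling can be illustrated with a minimal sketch. To keep it checkable, it uses a deliberately trivial exponential family in place of an MRF: N_SITES independent binary sites with S(z) = Σ z_i, so that C(β) = (1 + e^β)^n and E[S] = n/(1 + e^(−β)). N_SITES, the grid, and the function names are hypothetical; in a real application each grid value would be a Monte Carlo estimate of E_{z∣β}[S(z)].

```python
import numpy as np

N_SITES = 1600  # hypothetical toy size


def expected_stat(beta):
    """E[S] for a toy model of N_SITES independent binary sites with S(z) = sum(z).

    This stand-in has a closed form; for a real MRF each grid value would be a
    Monte Carlo estimate from simulations of the field at that beta.
    """
    return N_SITES / (1.0 + np.exp(-beta))


def log_C_exact(beta):
    """Exact log C(beta) = N_SITES * log(1 + e^beta) for the toy model (checking only)."""
    return N_SITES * np.log1p(np.exp(beta))


def ti_log_ratio(beta0, beta1, grid, grid_vals, n_quad=401):
    """Estimate log C(beta1) - log C(beta0) by integrating interpolated E[S] (trapezoid rule)."""
    b = np.linspace(beta0, beta1, n_quad)
    vals = np.interp(b, grid, grid_vals)  # linear interpolation of the precomputed grid
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(b)))


grid = np.linspace(-2.0, 2.0, 9)  # equidistant grid, precomputed once
grid_vals = expected_stat(grid)
approx = ti_log_ratio(-0.5, 1.0, grid, grid_vals)
exact = log_C_exact(1.0) - log_C_exact(-0.5)
print(approx, exact)
```

Refining the equidistant grid shrinks the interpolation error, which is exactly the coverage and fidelity trade-off described in the list; gradient-informed grids aim for the same accuracy with fewer points.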
Proposed Framework
The proposed method introduces several technical novelties:
- Gradient-Informed Grid Construction: Grid points are not selected uniformly; instead, directions and step sizes are determined from the local gradient and Hessian of the expectation map $\mathbb{E}_{z\mid\beta}[S(z)]$ at an initialization near the posterior mode (obtained via stochastic approximation).
- Adaptive Step Sizing: Step sizes along the grid directions shrink as the gradient norm grows (at a rate controlled by a parameter κ), ensuring denser coverage near high-posterior-density regions while keeping the grid sparse elsewhere.
- Hermite Interpolation: Hermite coordinate interpolation (using both function values and gradients) is employed to approximate $\mathbb{E}_{z\mid\beta}[S(z)]$. This ensures smoothness and high accuracy, which is particularly beneficial for nonlinear mappings.
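A one-dimensional sketch of the grid construction and interpolation steps (scalar β, as in the hidden Potts model), under stated assumptions. The step-size rule h = h0 / (1 + κ·|g(β)|/|g(β̂)|) is a hypothetical stand-in for the paper's κ-controlled rule, the Hessian-based choice of directions is omitted (in 1-D there are only two directions), and the expectation map again comes from a toy independent-sites model so the result can be checked. The slopes for the Hermite interpolant use the exponential-family identity dE[S]/dβ = Var[S], so in an MRF they can come from the same simulations as the values.

```python
import numpy as np

N_SITES = 1600  # hypothetical toy size


def stat_mean(beta):
    """E[S] for the toy independent-sites model (a Monte Carlo estimate in a real MRF)."""
    return N_SITES / (1.0 + np.exp(-beta))


def stat_grad(beta):
    """dE[S]/dbeta = Var[S] in an exponential family; closed form for the toy model."""
    p = 1.0 / (1.0 + np.exp(-beta))
    return N_SITES * p * (1.0 - p)


def adaptive_grid(beta_hat, lower, upper, h0=1.0, kappa=4.0):
    """Hypothetical rule: walk outward from beta_hat with step h0 / (1 + kappa*|g|/|g(beta_hat)|)."""
    g_ref = abs(stat_grad(beta_hat))
    points = [beta_hat]
    for direction, bound in ((1.0, upper), (-1.0, lower)):
        b = beta_hat
        while True:
            b += direction * h0 / (1.0 + kappa * abs(stat_grad(b)) / g_ref)
            if direction * (b - bound) >= 0:  # reached or passed the boundary: clamp and stop
                points.append(bound)
                break
            points.append(b)
    return np.array(sorted(points))


def hermite_interp(x, nodes, f, df):
    """Piecewise cubic Hermite interpolation from values f and slopes df at sorted nodes."""
    x = np.asarray(x, dtype=float)
    i = np.clip(np.searchsorted(nodes, x) - 1, 0, len(nodes) - 2)
    h = nodes[i + 1] - nodes[i]
    t = (x - nodes[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1  # cubic Hermite basis functions on [0, 1]
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * f[i] + h10 * h * df[i] + h01 * f[i + 1] + h11 * h * df[i + 1]


grid = adaptive_grid(beta_hat=0.3, lower=-3.0, upper=3.0)
vals, slopes = stat_mean(grid), stat_grad(grid)
x = np.linspace(-3.0, 3.0, 1001)
err_hermite = np.max(np.abs(hermite_interp(x, grid, vals, slopes) - stat_mean(x)))
err_linear = np.max(np.abs(np.interp(x, grid, vals) - stat_mean(x)))
print(len(grid), err_hermite, err_linear)
```

On this toy map the grid spacing is tightest around β̂, and the Hermite interpolant's maximum error is well below that of linear interpolation on the same nodes.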
Together, these elements yield a surrogate for the expectation map and, via the path sampling identity, for ratios of normalizing constants across the parameter space. This surrogate is used to evaluate the Metropolis-Hastings acceptance ratio, amortizing the expensive simulation effort.
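A sketch of how such a surrogate can enter the Metropolis-Hastings acceptance ratio. For p(z∣β) = exp(β S(z))/C(β) and a symmetric random-walk proposal β → β′, log α = (β′ − β)·S(z_obs) − [log C(β′) − log C(β)] + log π(β′) − log π(β), and the bracketed term is replaced by the integral of the interpolated E[S]. The integral is tabulated once (the amortized step), so each MCMC iteration costs only two table look-ups. The toy independent-sites model, the N(0, 10²) prior, the grid, and the tuning constants are assumptions; for brevity the surrogate is linear interpolation on an equidistant grid, whereas the method described above would use Hermite interpolation on a gradient-informed grid.

```python
import numpy as np

N_SITES = 1600  # hypothetical toy size
rng = np.random.default_rng(1)


def stat_mean(beta):
    """Toy closed form for E[S]; in an MRF these grid values come from precomputed simulations."""
    return N_SITES / (1.0 + np.exp(-beta))


# --- Amortized precomputation (done once) ---
grid = np.linspace(-3.0, 3.0, 25)
fine = np.linspace(-3.0, 3.0, 6001)
surrogate = np.interp(fine, grid, stat_mean(grid))  # linear stand-in for a Hermite surrogate
# Cumulative trapezoid integral: cum[j] approximates log C(fine[j]) - log C(fine[0]).
cum = np.concatenate(([0.0], np.cumsum(0.5 * (surrogate[1:] + surrogate[:-1]) * np.diff(fine))))


def delta_log_C(b_old, b_new):
    """Surrogate for log C(b_new) - log C(b_old) via the path sampling identity."""
    return np.interp(b_new, fine, cum) - np.interp(b_old, fine, cum)


def log_prior(beta):
    return -0.5 * (beta / 10.0) ** 2  # N(0, 10^2) prior, up to an additive constant


# --- Observed data simulated from the toy model ---
beta_true = 0.8
s_obs = rng.binomial(N_SITES, 1.0 / (1.0 + np.exp(-beta_true)))

# --- Random-walk Metropolis-Hastings using only the surrogate ---
beta, samples = 0.0, []
for it in range(6000):
    prop = beta + 0.1 * rng.standard_normal()
    # Proposals outside the precomputed range are rejected (a prior truncated to the grid).
    if -3.0 <= prop <= 3.0:
        log_alpha = ((prop - beta) * s_obs - delta_log_C(beta, prop)
                     + log_prior(prop) - log_prior(beta))
        if np.log(rng.uniform()) < log_alpha:
            beta = prop
    if it >= 1000:  # discard burn-in
        samples.append(beta)

print(np.mean(samples), np.log(s_obs / (N_SITES - s_obs)))
```

For this toy model the posterior concentrates near the maximum-likelihood estimate logit(S_obs/n), which the second printed value gives for comparison.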



Figure 1: Linear interpolation on an equidistant grid for approximating the expectation map in the hidden Potts model.
Experimental Results
Simulation Study: Hidden Potts Model
Simulation experiments were conducted on a 100×100 hidden Potts model with six labels. Coverage and accuracy of the interpolated expected sufficient statistics and of the resulting posteriors were assessed as a function of grid size and grid type. Hermite interpolation on gradient-based grids yielded the lowest root mean square error, outperforming both linear interpolation and equidistant grids, particularly when the number of grid points is small. This suggests that gradient-based placement adapts efficiently to the geometry of the parameter space.
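Each grid point in such a study requires simulating the label field. The sketch below shows one way to obtain Monte Carlo estimates of E_{z∣β}[S(z)] and Var_{z∣β}[S(z)] for a k-label Potts field using checkerboard Gibbs sweeps; the variance doubles as the slope dE[S]/dβ needed by Hermite interpolation. Assumptions: S(z) counts like-labeled 4-neighbor pairs with free boundaries, the lattice is 20×20 rather than 100×100 to keep it fast, the burn-in and sample sizes are arbitrary, and the paper's actual sampler may differ. Only the latent label field matters for E[S], so the observation layer of the hidden model is omitted.

```python
import numpy as np


def potts_stat(z):
    """S(z): number of 4-neighbor pairs sharing a label (free boundary)."""
    return int(np.sum(z[:, 1:] == z[:, :-1]) + np.sum(z[1:, :] == z[:-1, :]))


def gibbs_sweep(z, beta, k, rng):
    """One checkerboard Gibbs sweep for a k-label Potts field with inverse temperature beta."""
    ii, jj = np.indices(z.shape)
    for parity in (0, 1):
        onehot = np.eye(k)[z]  # shape (rows, cols, k)
        counts = np.zeros_like(onehot)  # number of neighbors carrying each label
        counts[1:, :] += onehot[:-1, :]
        counts[:-1, :] += onehot[1:, :]
        counts[:, 1:] += onehot[:, :-1]
        counts[:, :-1] += onehot[:, 1:]
        logits = beta * counts
        probs = np.exp(logits - logits.max(axis=2, keepdims=True))
        probs /= probs.sum(axis=2, keepdims=True)
        u = rng.uniform(size=z.shape + (1,))
        new = np.minimum((probs.cumsum(axis=2) < u).sum(axis=2), k - 1)  # inverse-CDF draw
        z = np.where((ii + jj) % 2 == parity, new, z)  # update one color class at a time
    return z


def estimate_moments(beta, k=6, size=20, burn_in=100, n_samples=200, seed=0):
    """Monte Carlo estimates of E[S] and Var[S] (= dE[S]/dbeta) at one grid point."""
    rng = np.random.default_rng(seed)
    z = rng.integers(k, size=(size, size))
    draws = []
    for sweep in range(burn_in + n_samples):
        z = gibbs_sweep(z, beta, k, rng)
        if sweep >= burn_in:
            draws.append(potts_stat(z))
    draws = np.asarray(draws, dtype=float)
    return draws.mean(), draws.var(ddof=1)


m0, v0 = estimate_moments(0.0)
m1, v1 = estimate_moments(1.0)
print(m0, v0, m1, v1)
```

At β = 0 the labels are independent, so E[S] equals the number of neighbor pairs divided by k (760/6 for a 20×20 lattice), which gives a quick sanity check.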

Figure 2: Equidistant grid with 169 points compared to gradient-based grid with dramatically fewer points for the two-parameter autologistic model.
Figure 3: Log-log plot of RMSE for different grid sizes in the hidden Potts model, demonstrating rapid error decay as grid density increases with gradient-guided placement.
Table 1 (in the original paper) quantifies RMSE reduction for Hermite (versus linear) interpolation and for gradient-based (versus equidistant) grid construction. Posterior mean and KL-divergence to AEA runs further confirm the benefits: with only 10 grid points, Hermite interpolation with a gradient-based grid achieves negligible KL divergence to AEA.
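The KL divergences above compare posteriors for a scalar β. The estimator used in the paper is not described here; one simple option, assumed in this sketch, is to moment-match each set of MCMC draws to a normal distribution and use the closed-form normal KL, KL(P‖Q) = log(s_Q/s_P) + (s_P² + (m_P − m_Q)²)/(2 s_Q²) − 1/2. The draws below are synthetic and are not results from the paper.

```python
import numpy as np


def gaussian_kl(samples_p, samples_q):
    """KL(P || Q) between moment-matched normal approximations of two 1-D sample sets."""
    m_p, s_p = np.mean(samples_p), np.std(samples_p, ddof=1)
    m_q, s_q = np.mean(samples_q), np.std(samples_q, ddof=1)
    return float(np.log(s_q / s_p) + (s_p**2 + (m_p - m_q) ** 2) / (2 * s_q**2) - 0.5)


rng = np.random.default_rng(0)
reference_draws = rng.normal(1.2, 0.02, size=20000)  # synthetic stand-in for reference draws
close_draws = rng.normal(1.201, 0.02, size=20000)    # nearly identical posterior
shifted_draws = rng.normal(1.25, 0.02, size=20000)   # posterior shifted by 2.5 sd
print(gaussian_kl(reference_draws, close_draws), gaussian_kl(reference_draws, shifted_draws))
```

The normal approximation is only reasonable when the posterior is close to Gaussian; histogram or nearest-neighbor KL estimators avoid that assumption at the price of more noise.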
Application: Lake Menteith Satellite Data
Posterior inference for the inverse temperature parameter of the Potts model was conducted for the Lake Menteith satellite image (100×100, k=6), comparing AEA, traditional thermodynamic integration (TI), equidistant-grid-based interpolation, and the proposed gradient-based Hermite grid. Posterior estimates using the gradient-based Hermite grid were indistinguishable from those of AEA, whereas the equidistant/linear grid approaches and classical TI exhibited noticeable bias or larger KL divergence. Notably, amortized inference with grid precomputation reduced runtime by roughly two orders of magnitude (6.17 hours for AEA vs. 0.03 hours for the proposed approach).
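The reported Lake Menteith runtimes imply the following speed-up factor (a simple ratio of the two figures quoted, not an additional result):

```latex
\frac{6.17\ \text{h}}{0.03\ \text{h}} \approx 205.7, \qquad \log_{10}(205.7) \approx 2.31
```

That is, the gradient-based grid runs roughly two orders of magnitude faster than AEA on this image.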

Figure 4: Lake Menteith data, the benchmark dataset for the hidden Potts model experiments.
Application: Autologistic (Ice Floe) Model
A binarized satellite image of ice floes (40×40 pixels) was analyzed using the autologistic model. Grid-based inference was compared with AEA and with the approach of Boland et al. The gradient-based and equidistant grids yielded posteriors nearly identical to AEA, as visualized by volcano plots and marginal posterior sections. Computation times show dramatic reductions, from hours (AEA) to seconds (proposed method), highlighting the method’s scalability and practical value.
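For the two-parameter autologistic model the expectation map is vector-valued, and its Jacobian is the covariance matrix Cov_{z∣β}[S(z)], which supplies the gradient information the grid construction relies on. A sketch under assumptions: z ∈ {0, 1} coding, S(z) = (Σ_i z_i, Σ_{i∼j} z_i z_j) over 4-neighbor pairs with free boundaries, a 40×40 lattice to match the ice floe image size, arbitrary burn-in and sample sizes, and illustrative function names. The Gibbs sampler uses the conditional P(z_i = 1 ∣ rest) = 1/(1 + exp(−β1 − β2·Σ_{j∼i} z_j)); the Hessian of the expectation map is not estimated here.

```python
import numpy as np


def autologistic_stats(z):
    """S(z) = (number of ones, number of 4-neighbor pairs that are both one), z in {0, 1}."""
    pairs = np.sum(z[:, 1:] * z[:, :-1]) + np.sum(z[1:, :] * z[:-1, :])
    return np.array([z.sum(), pairs], dtype=float)


def neighbor_sum(z):
    """Sum of the 4 nearest neighbors of each site (free boundary)."""
    s = np.zeros(z.shape)
    s[1:, :] += z[:-1, :]
    s[:-1, :] += z[1:, :]
    s[:, 1:] += z[:, :-1]
    s[:, :-1] += z[:, 1:]
    return s


def gibbs_sweep(z, beta, rng):
    """One checkerboard Gibbs sweep: P(z_i = 1 | rest) = sigmoid(beta[0] + beta[1] * neighbor sum)."""
    ii, jj = np.indices(z.shape)
    for parity in (0, 1):
        p_one = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * neighbor_sum(z))))
        draw = (rng.uniform(size=z.shape) < p_one).astype(z.dtype)
        z = np.where((ii + jj) % 2 == parity, draw, z)  # update one color class at a time
    return z


def expectation_and_jacobian(beta, size=40, burn_in=100, n_samples=400, seed=0):
    """Monte Carlo estimates of E[S] and its Jacobian Cov[S] at one grid point."""
    rng = np.random.default_rng(seed)
    z = rng.integers(2, size=(size, size))
    draws = []
    for sweep in range(burn_in + n_samples):
        z = gibbs_sweep(z, beta, rng)
        if sweep >= burn_in:
            draws.append(autologistic_stats(z))
    draws = np.array(draws)
    return draws.mean(axis=0), np.cov(draws, rowvar=False)


mean0, jac0 = expectation_and_jacobian(np.array([0.0, 0.0]))
print(mean0, jac0)
```

At β = (0, 0) the sites are independent fair coins, so E[S] = (800, 780) on a 40×40 lattice (1,600 sites, 3,120 neighbor pairs), a convenient check before moving to informative parameter values.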
Figure 5: Volcano plot of the AEA posterior estimate, with contour overlays for posteriors obtained with equidistant and gradient-based grids in the autologistic example.
Figure 6: Ice floe data used for benchmarking autologistic posterior inference and marginal posterior recovery with multiple grid strategies.
Implications and Future Directions
This research advances amortized Bayesian inference for MRFs by providing a generic, model-agnostic mechanism for constructing sparse grids that focus computational effort where it matters for posterior accuracy. The gradient-informed approach is not tailored to specific exponential family models, making it applicable to a wide class of MRFs, including Ising models for public opinion survey analysis or exponential random graph models.
Practical implications include the ability to perform accurate Bayesian inference for spatial or network models with discrete variables and intractable normalizing constants, scaling to higher-dimensional parameter spaces that would be infeasible with existing MCMC-based approaches. Theoretical implications concern the convergence and bias control obtained by grid adaptation and Hermite interpolation—future research should systematize the trade-offs between grid sparsity, computational resources, and posterior error in much higher-dimensional settings.
Avenues for extending this work include (1) leveraging automated selection of grid initialization points using closed-form biased posterior estimators, (2) combining with sparse grid or adaptive mesh refinement routines, or (3) generalizing grid construction to non-Euclidean parameter spaces or latent graphical structures.
Conclusion
Efficient amortized Bayesian inference in MRFs with intractable normalizing constants can be achieved by gradient-informed grid selection combined with Hermite interpolation. The resulting approach achieves accuracy comparable to computationally demanding methods such as AEA at a fraction of the computational cost, and it applies to a wide variety of models in spatial statistics, network analysis, and beyond.