Differentiating Metropolis-Hastings to Optimize Intractable Densities (2306.07961v3)

Published 13 Jun 2023 in stat.ML, cs.LG, stat.CO, and stat.ME

Abstract: We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.


Summary

  • The paper presents a novel unbiased gradient estimator for Metropolis-Hastings samplers, enabling differentiation through discrete accept/reject steps.
  • It leverages Monte Carlo coupling schemes to control variance and maintain an O(1) computational overhead across both discrete and continuous distributions.
  • Empirical results on Gaussian mixture and Ising models validate its potential to enhance probabilistic model training and optimization.

Differentiating Metropolis-Hastings to Optimize Intractable Densities

This paper presents a novel approach to differentiating through the Metropolis-Hastings (MH) algorithm, a staple in probabilistic inference, particularly when dealing with probability distributions that possess intractable normalizing constants. The work leverages recent advancements in stochastic automatic differentiation to overcome the traditional barriers posed by the discrete accept/reject steps inherent in MH samplers, thereby providing a mechanism to apply gradient-based optimization to objectives expressed as expectations over intractable target densities.
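
To make the barrier concrete, here is a minimal sketch of a single random-walk MH step in Python (the function names, Gaussian proposal, and one-dimensional state are illustrative assumptions, not the paper's implementation). The Bernoulli accept/reject draw in the final lines is the discrete randomness that defeats naive pathwise differentiation with respect to parameters of the target.

```python
import numpy as np

def mh_step(x, log_target, step_size=0.5, rng=None):
    """One random-walk Metropolis-Hastings step for an unnormalized
    log-density `log_target`. All names are illustrative placeholders."""
    rng = np.random.default_rng() if rng is None else rng
    x_prop = x + rng.normal(0.0, step_size)          # symmetric Gaussian proposal
    log_alpha = log_target(x_prop) - log_target(x)   # log acceptance ratio
    accept = np.log(rng.uniform()) < log_alpha       # discrete accept/reject draw
    return x_prop if accept else x                   # non-differentiable branch
```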

Methodological Contributions

The authors propose an unbiased gradient estimator for MH samplers, challenging the common assumption that MH is inherently non-differentiable due to its discontinuous nature. Their method couples two MH chains with perturbed targets, utilizing a stochastic derivative-based estimation approach. The key contributions of their work are:

  1. Unbiased Algorithm with Low Computational Overhead: The algorithm provides an unbiased estimate of the gradient of expectations over the target density, while incurring only an O(1) multiplicative computational overhead. This is achieved through a clever application of smoothed perturbation analysis and stochastic automatic differentiation, making it applicable to both discrete and continuous target distributions.
  2. Use of Monte Carlo Coupling Schemes: Coupling schemes are incorporated to control the variance of the gradient estimates, and they further enable an efficient, low-variance single-chain MH gradient estimator (a simplified coupling sketch follows this list).
  3. Empirical Validation and Applications: The authors demonstrate the utility of their approach through optimization problems involving Gaussian mixture models and Ising models. For example, they identify scenarios with ambiguous observations in Gaussian mixtures and maximize the specific heat in Ising models, illustrating the practical implications of their method.
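
The paper's unbiased estimator rests on smoothed perturbation analysis and stochastic automatic differentiation, which is more machinery than fits in a short excerpt. The sketch below instead conveys the coupling intuition with a simplified, biased finite-difference surrogate: two MH chains whose targets differ only in a parameter share their proposal noise and accept/reject uniforms (a common-random-numbers coupling), so the trajectories stay highly correlated and the difference of their averages is far less noisy than with independent chains. The function names, one-dimensional random-walk proposal, and finite-difference step `eps` are assumptions for illustration, not the authors' method.

```python
import numpy as np

def coupled_mh_chains(logp, theta_a, theta_b, x0=0.0, n_steps=20_000,
                      step_size=0.5, seed=0):
    """Two random-walk MH chains at parameters theta_a and theta_b that share
    proposal noise and accept/reject uniforms (common random numbers).
    `logp(x, theta)` is a user-supplied unnormalized log-density."""
    rng = np.random.default_rng(seed)
    xa, xb = x0, x0
    trace_a, trace_b = np.empty(n_steps), np.empty(n_steps)
    for t in range(n_steps):
        noise = rng.normal(0.0, step_size)   # shared proposal increment
        log_u = np.log(rng.uniform())        # shared acceptance threshold
        if log_u < logp(xa + noise, theta_a) - logp(xa, theta_a):
            xa = xa + noise
        if log_u < logp(xb + noise, theta_b) - logp(xb, theta_b):
            xb = xb + noise
        trace_a[t], trace_b[t] = xa, xb
    return trace_a, trace_b

def coupled_fd_gradient(logp, f, theta, eps=1e-2, burn=2_000, **chain_kwargs):
    """Finite-difference surrogate for d/dtheta E_{p_theta}[f(x)]: biased in
    eps (unlike the paper's estimator), but variance-reduced by the coupling."""
    trace_a, trace_b = coupled_mh_chains(logp, theta, theta + eps, **chain_kwargs)
    return (np.mean(f(trace_b[burn:])) - np.mean(f(trace_a[burn:]))) / eps
```

For instance, with `logp = lambda x, th: -0.5 * (x - th) ** 2` and `f = lambda x: x`, the true derivative of the mean with respect to the location parameter is 1, and the sketch recovers it up to Monte Carlo noise and finite-difference bias.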

Implications and Observations

This work significantly impacts the landscape of probabilistic modeling and inference. By differentiating the MH algorithm, the authors unlock the potential for gradient-based optimization in settings previously limited by computational barriers. This advancement permits enhanced fine-tuning of model parameters directly through stochastic sampling procedures, a feature previously underutilized in models with discrete components.

  1. Enhanced Model Training: This method opens the door to improving the training of probabilistic models by optimizing hyperparameters and model structures directly, without the need for surrogate or approximate models (a minimal training-loop sketch follows this list).
  2. Improved Estimation for Scientific Modeling: In scientific domains such as physics, biology, and cognitive science, the ability to differentiate through MH samplers can refine computational models of phenomena, leading to more nuanced and accurate predictions.
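
As a hedged illustration of such direct tuning (reusing the simplified coupled finite-difference estimator sketched earlier, not the paper's unbiased estimator), the outer loop can be plain stochastic gradient ascent on an expectation objective; the learning rate, iteration count, and seeding below are arbitrary choices.

```python
def optimize_expectation(logp, f, theta0, lr=0.1, n_iters=50):
    """Stochastic gradient ascent on theta -> E_{p_theta}[f(x)], with each
    gradient estimated from a fresh pair of coupled MH chains. Sketch only."""
    theta = theta0
    for i in range(n_iters):
        theta += lr * coupled_fd_gradient(logp, f, theta, seed=i)
    return theta
```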

Future Directions

Future work could explore the integration of reverse-mode automatic differentiation, facilitating applications in high-dimensional parameter spaces typical in deep learning contexts. An engaging avenue would be the application of this approach to energy-based models and nested models, which often present intricate inference landscapes. Furthermore, there is potential for unbiased differentiation of samplers combining discrete and continuous dynamics, expanding the applicability of the method to a wider array of probabilistic models. Extending this work to support broader optimization objectives, such as derivative-based hyperparameter tuning or enhancing the autocorrelation properties of MH chains, could also yield significant advancements.

Overall, this paper delivers key insights into overcoming the historically perceived limitations in differentiating through sampling-based inference algorithms, furnishing new methodologies and understanding in automatic stochastic differentiation and probabilistic optimization.
