Differentiating Metropolis-Hastings to Optimize Intractable Densities (2306.07961v3)
Published 13 Jun 2023 in stat.ML, cs.LG, stat.CO, and stat.ME
Abstract: We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.
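To make "expectations over intractable target densities" concrete, here is a minimal sketch of the forward computation only: a random-walk Metropolis-Hastings sampler that estimates an expectation using nothing but an unnormalized log-density. It is not the paper's gradient estimator and does not implement its coupling scheme. The toy Gaussian target and all names (`log_unnormalized_density`, `metropolis_hastings`, `estimate_expectation`, `theta`) are assumptions chosen for illustration.

```python
import math
import random


def log_unnormalized_density(x, theta):
    """Log of an unnormalized toy target: a Gaussian bump centred at theta."""
    return -0.5 * (x - theta) ** 2


def metropolis_hastings(theta, n_steps=20000, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings; returns the list of visited states."""
    rng = random.Random(seed)
    x = 0.0  # arbitrary starting state
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = (log_unnormalized_density(proposal, theta)
                     - log_unnormalized_density(x, theta))
        # Accept with probability min(1, ratio); the normalizing constant
        # cancels in the ratio, so it is never needed.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


def estimate_expectation(f, theta, **kwargs):
    """Monte Carlo estimate of E[f(X)] under the target with parameter theta."""
    samples = metropolis_hastings(theta, **kwargs)
    return sum(f(x) for x in samples) / len(samples)


if __name__ == "__main__":
    # For this toy target the true mean equals theta.
    print(estimate_expectation(lambda x: x, theta=1.5))
```

The accept/reject step is a discrete decision that depends on `theta`, which is why naive automatic differentiation of such a sampler does not directly yield a gradient of the estimated expectation; the abstract's approach targets exactly that dependence.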