Delta-AI: Local objectives for amortized inference in sparse graphical models (2310.02423v2)
Abstract: We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call $\Delta$-amortized inference ($\Delta$-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local constraint that can be turned into a local loss in the style of generative flow networks (GFlowNets) that enables off-policy training but avoids the need to instantiate all the random variables for each parameter update, thus speeding up training considerably. The $\Delta$-AI objective matches the conditional distribution of a variable given its Markov blanket in a tractable learned sampler, which has the structure of a Bayesian network, with the same conditional distribution under the target PGM. As such, the trained sampler recovers marginals and conditional distributions of interest and enables inference of partial subsets of variables. We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
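To make the local objective concrete, the following worked equation sketches one plausible form of the constraint and the resulting squared-log-ratio loss; the notation ($q_\theta$ for the learned sampler, $\mathrm{MB}(i)$ for the Markov blanket of variable $i$, and the perturbed value $x_i'$) is assumed here for illustration and is not taken from the paper itself. The constraint asks the learned Bayesian-network sampler to reproduce the target PGM's conditional of each variable given its Markov blanket, $q_\theta(x_i \mid x_{\mathrm{MB}(i)}) = p(x_i \mid x_{\mathrm{MB}(i)})$. A GFlowNet-style local loss can then compare two values $x_i$ and $x_i'$ of the same variable while holding the Markov blanket fixed, so that the intractable normalizing constant of $p$ cancels:

$$\mathcal{L}_{\Delta}(x, x_i') = \left( \log \frac{q_\theta(x_i' \mid x_{\mathrm{MB}(i)})}{q_\theta(x_i \mid x_{\mathrm{MB}(i)})} - \log \frac{\tilde{p}(x_i', x_{\mathrm{MB}(i)})}{\tilde{p}(x_i, x_{\mathrm{MB}(i)})} \right)^2,$$

where $\tilde{p}$ denotes the unnormalized product of the factors involving $x_i$. Because only $x_i$ and its Markov blanket appear, each parameter update touches a small subset of variables, which is the locality the abstract refers to.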