Forward $\chi^2$ Divergence Based Variational Importance Sampling (2311.02516v2)
Abstract: Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) is the most commonly adopted method. However, VI can struggle to achieve a high log-likelihood when the posterior distribution is complicated. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, obtained by minimizing the forward $\chi^2$ divergence, to enhance log-likelihood estimation. We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines in both log-likelihood and model parameter estimation.
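To make the two ingredients of the abstract concrete, the sketch below estimates $\log p(x)$ by importance sampling and trains the proposal by minimizing a Monte Carlo surrogate of the forward $\chi^2$ divergence, on a toy Gaussian latent-variable model. This is a minimal illustration under our own assumptions; the names (`log_joint`, `Proposal`, the training loop) are hypothetical and do not reproduce the paper's reference implementation.

```python
# Minimal sketch of the VIS idea: importance-sampled log-likelihood plus a
# forward chi^2 proposal loss, on a toy model z ~ N(0,1), x | z ~ N(z,1).
# All interfaces here are illustrative assumptions, not the authors' code.
import torch

def log_joint(x, z):
    """log p(x, z) for the toy model."""
    log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z)
    log_px_given_z = torch.distributions.Normal(z, 1.0).log_prob(x)
    return log_pz + log_px_given_z

class Proposal(torch.nn.Module):
    """Amortized Gaussian proposal q(z | x) with a learnable mean and scale."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(1, 2)  # maps x -> (mean, log-std)

    def dist(self, x):
        mean, log_std = self.net(x.unsqueeze(-1)).unbind(-1)
        return torch.distributions.Normal(mean, log_std.exp())

def iw_log_likelihood(x, proposal, num_samples=64):
    """Importance-sampling estimate:
    log p(x) ~= logsumexp_s[log p(x, z_s) - log q(z_s | x)] - log S."""
    q = proposal.dist(x)
    z = q.rsample((num_samples,))
    log_w = log_joint(x, z) - q.log_prob(z)  # log importance weights
    return torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(num_samples)))

def chi2_proposal_loss(x, proposal, num_samples=64):
    """Surrogate for the forward chi^2 divergence. With w = p(x,z)/q(z|x),
    E_q[w^2] = p(x)^2 * (chi^2(p || q) + 1), so minimizing E_q[w^2] over q
    minimizes chi^2(p || q). Computed in log space for numerical stability."""
    q = proposal.dist(x)
    z = q.rsample((num_samples,))  # reparameterized, so gradients flow to q
    log_w = log_joint(x, z) - q.log_prob(z)
    return torch.logsumexp(2.0 * log_w, dim=0)  # log of E_q[w^2] up to a constant

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.tensor(1.5)
    proposal = Proposal()
    opt = torch.optim.Adam(proposal.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        chi2_proposal_loss(x, proposal).backward()
        opt.step()
    print("log p(x) estimate:", iw_log_likelihood(x, proposal).item())
```

Since $\log$ is monotone, minimizing the log-space estimate of $\mathbb{E}_q[w^2]$ is equivalent to minimizing the estimate itself; in a full VIS setup the same importance weights would also drive the model-parameter updates, which this toy sketch omits.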