Accelerating Convergence of Stein Variational Gradient Descent via Deep Unfolding (2402.15125v1)
Abstract: Stein variational gradient descent (SVGD) is a prominent particle-based variational inference method for sampling from a target distribution. SVGD has attracted interest for applications in machine learning, such as Bayesian inference. In this paper, we propose novel trainable algorithms that incorporate a deep-learning technique called deep unfolding into SVGD. This approach enables the internal parameters of SVGD to be learned, thereby accelerating its convergence. To evaluate the proposed trainable SVGD algorithms, we conducted numerical simulations on three tasks: sampling a one-dimensional Gaussian mixture, performing Bayesian logistic regression, and learning Bayesian neural networks. The results show that our proposed algorithms exhibit faster convergence than conventional variants of SVGD.
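For context, the sketch below illustrates one plausible way the idea could be realized; it is not taken from the paper. In standard SVGD (Liu and Wang, 2016), each particle is moved along a direction that combines a kernel-weighted score term (attraction toward high-density regions) and a kernel-gradient term (repulsion between particles), scaled by a step size. Deep unfolding treats a fixed number of SVGD iterations as the layers of a network; here we assume, for illustration only, that the trainable internal parameters are the per-layer step sizes and that training backpropagates through the unfolded iterations in PyTorch. Class and parameter names are hypothetical.

```python
# Minimal sketch of deep-unfolded SVGD (assumptions: RBF kernel with a fixed
# bandwidth, and per-layer step sizes as the trainable internal parameters).
import torch


class UnfoldedSVGD(torch.nn.Module):
    """T SVGD iterations unrolled into T layers with trainable step sizes."""

    def __init__(self, num_layers: int, init_step: float = 0.1, bandwidth: float = 1.0):
        super().__init__()
        # One trainable step size per unfolded iteration (layer).
        self.steps = torch.nn.Parameter(init_step * torch.ones(num_layers))
        self.bandwidth = bandwidth

    def forward(self, x: torch.Tensor, log_prob) -> torch.Tensor:
        # x: (n, d) initial particles; log_prob: callable returning per-particle log p(x).
        x = x.detach().clone().requires_grad_(True)
        n = x.shape[0]
        for step in self.steps:
            # Score grad_x log p(x); create_graph keeps it differentiable w.r.t. the step sizes.
            score = torch.autograd.grad(log_prob(x).sum(), x, create_graph=True)[0]
            diff = x.unsqueeze(1) - x.unsqueeze(0)                     # diff[j, i] = x_j - x_i
            sq_dist = (diff ** 2).sum(dim=-1)
            K = torch.exp(-sq_dist / (2.0 * self.bandwidth ** 2))      # RBF kernel matrix
            grad_K = -(diff * K.unsqueeze(-1)) / self.bandwidth ** 2   # grad_{x_j} K(x_j, x_i)
            # SVGD direction: kernel-weighted score (attractive) + kernel gradient (repulsive).
            phi = (K @ score + grad_K.sum(dim=0)) / n
            x = x + step * phi                                         # one unfolded SVGD layer
        return x
```

A training loop would then feed initial particles through the unfolded layers, measure a discrepancy between the output particles and the target (for example, a kernel two-sample statistic such as MMD), and update the step sizes with an optimizer such as Adam, so that the learned step-size schedule speeds up convergence within the fixed number of layers.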