Preferential Subsampling for Stochastic Gradient Langevin Dynamics (2210.16189v3)
Abstract: Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC by constructing an unbiased estimate of the gradient of the log-posterior from a small, uniformly-weighted subsample of the data. While cheap to compute, the resulting gradient estimator can exhibit high variance, which degrades sampler performance. Variance control has traditionally been addressed by constructing a better stochastic gradient estimator, often using control variates. We instead propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method for adaptively adjusting the subsample size at each iteration of the algorithm, increasing it in regions of the sample space where the gradient is harder to estimate. We demonstrate that this approach can maintain the same level of accuracy while substantially reducing the average subsample size used.
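To make the core idea concrete, below is a minimal sketch in Python of SGLD with preferential (non-uniform) subsampling, not the paper's implementation: minibatch indices are drawn with probabilities `probs`, and each per-point gradient is reweighted by `1/probs[i]`, so the minibatch average remains an unbiased estimate of the full-data gradient sum. The toy Gaussian model and the pilot-based weighting scheme are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sgld_preferential(data, grad_log_prior, grad_log_lik, probs,
                      step_size, n_iters, batch_size, theta0, seed=0):
    """SGLD where minibatch indices are drawn from a non-uniform
    distribution `probs`; importance weights 1/probs[i] keep the
    gradient estimator unbiased."""
    rng = np.random.default_rng(seed)
    N = len(data)
    theta = float(theta0)
    samples = np.empty(n_iters)
    for t in range(n_iters):
        idx = rng.choice(N, size=batch_size, p=probs)  # preferential draw
        # Unbiased estimate of sum_i grad log p(x_i | theta):
        # E[g_i / p_i] under i ~ probs equals the full-data sum.
        g = grad_log_lik(theta, data[idx])
        grad_sum_hat = np.mean(g / probs[idx])
        drift = grad_log_prior(theta) + grad_sum_hat
        theta += (0.5 * step_size * drift
                  + np.sqrt(step_size) * rng.standard_normal())
        samples[t] = theta
    return samples

# Toy usage: posterior for the mean of N(theta, 1) data, N(0, 10) prior.
data = np.random.default_rng(1).normal(2.0, 1.0, size=1000)
glp = lambda th: -th / 10.0    # grad log prior
gll = lambda th, x: x - th     # per-point grad log-likelihood
# Illustrative preferential weights: proportional to a rough per-point
# gradient magnitude at a pilot estimate, smoothed so all probs > 0.
pilot = data.mean()
w = np.abs(data - pilot) + 1e-2
probs = w / w.sum()
chain = sgld_preferential(data, glp, gll, probs, step_size=1e-4,
                          n_iters=5000, batch_size=10, theta0=0.0)
```

Setting `probs` to the uniform distribution `1/N` recovers standard SGLD, since `g / probs` reduces to `N * g`. The pilot weights above are a crude stand-in for a variance-reducing choice in which sampling probabilities scale with per-point gradient norms; the paper's adaptive subsample-size step could, in principle, be layered on by varying `batch_size` across iterations.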