Multiple importance sampling for stochastic gradient estimation (2407.15525v1)
Abstract: We introduce a theoretical and practical framework for efficient importance sampling of mini-batch samples for gradient estimation, using single or multiple probability distributions. To handle noisy gradients, our framework dynamically evolves the importance distribution during training using a self-adaptive metric. It combines multiple, diverse sampling distributions, each tailored to the gradients of specific parameters, which enables importance sampling for vector-valued gradient estimation. Rather than naively combining the distributions, our framework optimally weights the contribution of each data point across them. This adaptive combination of multiple importance distributions yields superior gradient estimates, leading to faster training convergence. We demonstrate the effectiveness of our approach through empirical evaluations on a range of optimization tasks, including classification and regression, on both image and point-cloud datasets.
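The abstract condenses two ideas that are easier to see in code: (i) an unbiased importance-sampled mini-batch gradient, where each sampled data point is reweighted by 1/(N p_i), and (ii) a multiple-importance-sampling (MIS) combination of several sampling distributions via balance-heuristic weights. The sketch below illustrates both on a toy regression problem; the two distributions (loss-proportional and gradient-norm-proportional), the function names, and the fixed balance-heuristic weighting are illustrative assumptions, not the paper's self-adaptive metric or optimal weighting scheme.

```python
import numpy as np

# ----- Toy problem: linear regression with squared loss ---------------------
rng = np.random.default_rng(0)
N, D = 1000, 5
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = X @ true_w + 0.1 * rng.normal(size=N)

def per_sample_grads(w):
    """Gradient of 0.5*(x.w - y)^2 for every data point, shape (N, D)."""
    residual = X @ w - y                       # (N,)
    return residual[:, None] * X               # (N, D)

def full_gradient(w):
    """Exact mean gradient over the dataset (reference value)."""
    return per_sample_grads(w).mean(axis=0)

# ----- Two illustrative importance distributions over data points -----------
# Stand-ins for the per-parameter-tailored distributions the paper describes.
def make_distributions(w):
    g = per_sample_grads(w)
    p1 = np.abs(X @ w - y) + 1e-8              # ~ per-sample loss magnitude
    p2 = np.linalg.norm(g, axis=1) + 1e-8      # ~ per-sample gradient norm
    return p1 / p1.sum(), p2 / p2.sum()

# ----- MIS gradient estimator with the balance heuristic --------------------
def mis_gradient_estimate(w, n1=16, n2=16):
    p1, p2 = make_distributions(w)
    g = per_sample_grads(w)
    idx1 = rng.choice(N, size=n1, p=p1)        # mini-batch half from dist. 1
    idx2 = rng.choice(N, size=n2, p=p2)        # mini-batch half from dist. 2

    est = np.zeros(D)
    for idx, p_own, n_own in ((idx1, p1, n1), (idx2, p2, n2)):
        # Balance-heuristic weight: n_k p_k(i) / (n_1 p_1(i) + n_2 p_2(i)).
        w_mis = n_own * p_own[idx] / (n1 * p1[idx] + n2 * p2[idx])
        # Unbiased contribution: f(i) / p_k(i) with f(i) = g_i / N.
        contrib = w_mis[:, None] * g[idx] / (N * p_own[idx][:, None])
        est += contrib.sum(axis=0) / n_own
    return est

# Sanity check: averaged over many draws, the MIS estimate matches the exact
# mean gradient, since the balance-heuristic weights sum to one per data point.
w0 = np.zeros(D)
print("exact:", full_gradient(w0))
print("MIS  :", np.mean([mis_gradient_estimate(w0) for _ in range(200)], axis=0))
```

In this sketch both distributions are recomputed from the current parameters, loosely mirroring how the paper's importance distribution evolves during training; a practical implementation would update cached per-sample statistics rather than recompute all per-sample gradients each step.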
Authors: Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh