Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling (2311.14468v2)

Published 24 Nov 2023 in cs.LG

Abstract: Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.
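To make the abstract's idea concrete, below is a minimal sketch (PyTorch, not the authors' released code) of importance sampling for mini-batch SGD where each data point's sampling probability is proportional to the norm of the loss gradient at the output layer. For softmax cross-entropy that gradient is softmax(logits) minus the one-hot target, so the score is available from a forward pass alone. The function names (`output_layer_grad_norms`, `importance_sampled_batch`, `weighted_step`) and the inverse-probability reweighting are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch: importance sampling for SGD with scores from the output-layer loss gradient.
# Assumes a classification model and softmax cross-entropy loss.
import torch
import torch.nn.functional as F

def output_layer_grad_norms(logits, targets):
    """Per-sample L2 norm of dLoss/dlogits for softmax cross-entropy,
    i.e. ||softmax(logits) - one_hot(target)||."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.shape[1]).float()
    return (probs - one_hot).norm(dim=1)

def importance_sampled_batch(model, data, targets, batch_size):
    """Draw a mini-batch with probabilities proportional to the importance
    scores, and return inverse-probability weights that keep the gradient
    estimate unbiased. For simplicity the scores are recomputed over the
    whole candidate set each call; a practical version would cache them."""
    with torch.no_grad():
        scores = output_layer_grad_norms(model(data), targets) + 1e-12
    probs = scores / scores.sum()
    idx = torch.multinomial(probs, batch_size, replacement=True)
    # Reweight each sampled point by 1 / (N * p_i) so that the weighted
    # mini-batch gradient matches the full-batch gradient in expectation.
    weights = 1.0 / (len(data) * probs[idx])
    return idx, weights

def weighted_step(model, optimizer, data, targets, idx, weights):
    """One SGD step on the importance-sampled, reweighted mini-batch."""
    optimizer.zero_grad()
    logits = model(data[idx])
    per_sample_loss = F.cross_entropy(logits, targets[idx], reduction="none")
    loss = (weights * per_sample_loss).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling with replacement and reweighting each draw by 1/(N·p_i) keeps the estimator unbiased while concentrating computation on points with large output-layer gradients, which is the source of the variance reduction the abstract describes.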

Authors (5)
  1. Corentin Salaün (6 papers)
  2. Xingchang Huang (7 papers)
  3. Iliyan Georgiev (21 papers)
  4. Niloy J. Mitra (83 papers)
  5. Gurprit Singh (18 papers)
