On the Convergence of Loss and Uncertainty-based Active Learning Algorithms (2312.13927v4)

Published 21 Dec 2023 in cs.LG and cs.AI

Abstract: We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using the squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size achieving the stochastic Polyak step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.
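The sketch below is a minimal illustration of the two ingredients described in the abstract: sampling points in proportion to their loss value, and an SGD update with a Polyak-style adaptive step size, shown here for a linear classifier with the squared hinge loss. It is not the paper's AWS algorithm; the function names (loss_based_sgd, squared_hinge_loss), the loss-proportional sampling rule, and the step-size formula loss / ||grad||^2 (which takes the per-point optimal loss to be zero, as for separable data) are assumptions made for this example.

```python
import numpy as np

def squared_hinge_loss(w, x, y):
    """Squared hinge loss for a linear classifier: max(0, 1 - y * <w, x>)^2."""
    return max(0.0, 1.0 - y * np.dot(w, x)) ** 2

def squared_hinge_grad(w, x, y):
    """Gradient of the squared hinge loss with respect to w."""
    margin = 1.0 - y * np.dot(w, x)
    if margin <= 0.0:
        return np.zeros_like(w)
    return -2.0 * margin * y * x

def loss_based_sgd(X, Y, n_steps, rng=None):
    """SGD where points are sampled proportionally to their current loss and
    the update uses a Polyak-style step size loss / ||grad||^2."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        losses = np.array([squared_hinge_loss(w, X[i], Y[i]) for i in range(n)])
        total = losses.sum()
        if total == 0.0:  # every point already classified with margin >= 1
            break
        i = rng.choice(n, p=losses / total)        # loss-proportional sampling
        g = squared_hinge_grad(w, X[i], Y[i])
        step = losses[i] / (np.dot(g, g) + 1e-12)  # Polyak-style step size
        w -= step * g
    return w

# Tiny linearly separable example (illustrative only).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]])
Y = np.array([1.0, 1.0, -1.0, -1.0])
w = loss_based_sgd(X, Y, n_steps=200)
print("weights:", w, "margins:", Y * (X @ w))
```

In practice, the losses would be estimated on a candidate pool rather than recomputed over the full dataset at every step; the full recomputation above only keeps the sketch short.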

Authors (5)
  1. Daniel Haimovich (6 papers)
  2. Dima Karamshuk (5 papers)
  3. Fridolin Linder (3 papers)
  4. Niek Tax (27 papers)
  5. Milan Vojnovic (25 papers)
