An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets (2306.08954v3)

Published 15 Jun 2023 in cs.LG

Abstract: Active Learning (AL) addresses the crucial challenge of enabling machines to efficiently gather labeled examples through strategic queries. Among the many AL strategies, Uncertainty Sampling (US) stands out as one of the most widely adopted. US queries the example(s) that the current model finds most uncertain, proving to be both straightforward and effective. Despite claims in the literature suggesting superior alternatives to US, community-wide acceptance remains elusive. In fact, existing benchmarks for tabular datasets present conflicting conclusions on the continued competitiveness of US. In this study, we review the literature on AL strategies from the last decade and build the most comprehensive open-source AL benchmark to date to understand the relative merits of different AL strategies. The benchmark surpasses existing ones by encompassing a broader coverage of strategies, models, and data. By investigating the conflicting conclusions in existing tabular AL benchmarks under broad AL experimental settings, we uncover fresh insights into an often-overlooked issue: model compatibility in the context of US. Specifically, we observe that using different models for querying unlabeled examples and for the downstream learning task degrades US's effectiveness. Notably, our findings affirm that US maintains a competitive edge over other strategies when paired with compatible models. These findings have practical implications and provide a concrete recipe for AL practitioners, empowering them to make informed decisions when working on tabular classification tasks with limited labeled data. The code for this project is available at https://github.com/ariapoy/active-learning-benchmark.
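The uncertainty sampling loop the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration (not the paper's benchmark code): it uses margin-based uncertainty on synthetic tabular data, with the same scikit-learn `LogisticRegression` serving as both the querying model and the task model, i.e. the "compatible model" setting the paper highlights. All dataset sizes and parameters here are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic tabular data: a small labeled seed set plus a large unlabeled pool.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = [int(i) for i in np.where(y == 0)[0][:5]] + \
          [int(i) for i in np.where(y == 1)[0][:5]]  # seed with both classes
unlabeled = [i for i in range(500) if i not in labeled]

# Compatible-model setting: one model does both querying and learning.
model = LogisticRegression(max_iter=1000)

for _ in range(20):  # 20 query rounds, one example per round
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[unlabeled])
    # Margin-based uncertainty: smallest gap between the top-two class
    # probabilities marks the example the model is least sure about.
    sorted_p = np.sort(proba, axis=1)
    margins = sorted_p[:, -1] - sorted_p[:, -2]
    query = unlabeled[int(np.argmin(margins))]
    labeled.append(query)      # the oracle reveals y[query]
    unlabeled.remove(query)

model.fit(X[labeled], y[labeled])
acc = model.score(X, y)
```

The incompatible setting the paper warns about would correspond to ranking `margins` with one model (say, an SVM) while fitting and evaluating a different model on the queried labels.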

