ASPEST: Bridging the Gap Between Active Learning and Selective Prediction (2304.03870v3)

Published 7 Apr 2023 in cs.LG

Abstract: Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios the distribution of test data is different from that of the training data. This results in more inaccurate predictions and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, which utilizes ensembles of model snapshots and self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text, and structured datasets that suffer from domain shifts demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g., on the MNIST$\to$SVHN benchmark with a labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves better utilization of humans in the loop.
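
The abstract names the main ingredients of ASPEST: an ensemble of model snapshots whose aggregated outputs serve both as a confidence score for selective prediction and as pseudo labels for self-training, plus an active-learning query step under a labeling budget, evaluated with the AUACC metric. The sketch below is a minimal numpy illustration of those pieces, not the paper's implementation: the simple averaging rule, the margin-based query heuristic, the 0.9 pseudo-label threshold, and the trapezoidal accuracy-coverage construction of AUACC are all assumptions, and the function names are invented for illustration.

```python
import numpy as np

def ensemble_probs(snapshot_probs):
    """Average per-snapshot softmax outputs into ensemble probabilities.

    snapshot_probs: shape (n_snapshots, n_samples, n_classes).
    """
    return snapshot_probs.mean(axis=0)

def select_queries(probs, budget):
    """Pick the `budget` most uncertain target samples for human labeling,
    using the top-1 vs top-2 probability margin as the (assumed) heuristic."""
    part = np.sort(probs, axis=1)
    margin = part[:, -1] - part[:, -2]      # small margin = high uncertainty
    return np.argsort(margin)[:budget]

def pseudo_label(probs, threshold=0.9):
    """Keep ensemble predictions whose confidence clears `threshold` as
    pseudo labels for the self-training step; returns (indices, labels)."""
    conf = probs.max(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, probs[keep].argmax(axis=1)

def auacc(confidence, correct):
    """Area under the accuracy-coverage curve (one common construction):
    sort by confidence descending; at coverage k/n the selective accuracy
    is the mean correctness of the k most confident predictions."""
    order = np.argsort(-np.asarray(confidence))
    hits = np.asarray(correct, dtype=float)[order]
    k = np.arange(1, len(hits) + 1)
    acc = np.cumsum(hits) / k               # selective accuracy at each coverage
    cov = k / len(hits)
    # Trapezoidal integration of accuracy over coverage (curve starts at 1/n).
    return float(np.sum((cov[1:] - cov[:-1]) * (acc[1:] + acc[:-1]) / 2.0))

# Toy usage: 5 snapshots, 1000 unlabeled target samples, 10 classes.
rng = np.random.default_rng(0)
snaps = rng.dirichlet(np.ones(10), size=(5, 1000))
probs = ensemble_probs(snaps)
queried = select_queries(probs, budget=100)     # defer these to humans
idx, labels = pseudo_label(probs)               # self-train on confident ones
correct = rng.random(1000) < probs.max(axis=1)  # toy correctness signal
print(f"AUACC: {auacc(probs.max(axis=1), correct):.4f}")
```

Per the abstract, the full method would run this as a loop: query humans for the selected samples, retrain the snapshot ensemble on the labeled and pseudo-labeled data, and repeat until the labeling budget (e.g., 100 labels on MNIST$\to$SVHN) is spent.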
