Benchmarking Multi-Domain Active Learning on Image Classification (2312.00364v1)
Abstract: Active learning aims to improve model performance by strategically selecting the most informative data points for labeling. Although it has been studied extensively, its effectiveness on large-scale, real-world datasets remains underexplored. Existing research focuses primarily on single-source data and ignores the multi-domain nature of real-world data. We introduce a multi-domain active learning benchmark to bridge this gap. Our benchmark demonstrates that traditional single-domain active learning strategies are often less effective than random selection in multi-domain scenarios. We also introduce CLIP-GeoYFCC, a novel large-scale image dataset built around geographical domains, in contrast to existing genre-based domain datasets. Analysis on our benchmark shows that all multi-domain strategies exhibit significant tradeoffs, with no strategy outperforming the others across all datasets or all metrics, underscoring the need for further research.
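To make the selection step concrete, the following is a minimal sketch of one traditional single-domain strategy, margin-based uncertainty sampling, next to random selection. It is not the benchmark's implementation: the function names, the toy probabilities, and the domain labels "A" and "B" are all hypothetical, and the pool is constructed by hand so that the uncertain examples happen to sit in one domain.

```python
import random


def margin(probs):
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_by_margin(pool_probs, budget):
    """Indices of the `budget` pool examples with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:budget]


def select_random(pool_size, budget, seed=0):
    """Baseline: indices drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), budget)


def domain_counts(selected, domains):
    """How many selected examples fall in each domain."""
    counts = {}
    for i in selected:
        counts[domains[i]] = counts.get(domains[i], 0) + 1
    return counts


# Hypothetical unlabeled pool: predicted class probabilities and a domain tag.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.34, 0.33, 0.33],
    [0.80, 0.15, 0.05],
    [0.50, 0.44, 0.06],
    [0.70, 0.20, 0.10],
]
domains = ["A", "B", "B", "A", "B", "A"]

picked = select_by_margin(pool_probs, budget=3)
print(picked, domain_counts(picked, domains))

baseline = select_random(len(pool_probs), budget=3)
print(baseline, domain_counts(baseline, domains))
```

On this hand-built pool, margin sampling spends its whole labeling budget on domain "B". That is a property of the toy data, not a finding of the paper, but it shows why per-domain metrics are worth tracking alongside overall accuracy when a pool mixes several domains.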