Inconsistency-Based Data-Centric Active Open-Set Annotation (2401.04923v1)
Abstract: Active learning is a widely used approach for reducing the labeling effort required to train deep neural networks. However, the effectiveness of current active learning methods is limited by their closed-world assumption: that all data in the unlabeled pool come from a predefined set of known classes. This assumption often fails in practice, as the unlabeled data may contain unknown classes, giving rise to the active open-set annotation problem. The presence of unknown classes can significantly degrade the performance of existing active learning methods because of the uncertainty they introduce. To address this issue, we propose NEAT, a novel data-centric active learning method that actively annotates open-set data. NEAT is designed to label data from known classes drawn from an unlabeled pool containing both known and unknown classes. It exploits the clusterability of labels to identify the known classes in the unlabeled pool, and it selects informative samples from those classes using an inconsistency criterion that measures the disagreement between model predictions and the local feature distribution. Unlike the recently proposed learning-centric method for the same problem, NEAT is data-centric and far more computationally efficient. Our experiments demonstrate that NEAT significantly outperforms state-of-the-art active learning methods on active open-set annotation.
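To make the selection criterion concrete, here is a minimal sketch (not the authors' implementation) of measuring inconsistency between a model's predicted class for a sample and the predicted classes of its nearest neighbors in feature space. The function name `knn_inconsistency`, the use of Euclidean distance, and the value of `k` are all illustrative assumptions; samples whose predictions disagree with their local neighborhood score higher and would be prioritized for annotation.

```python
import numpy as np

def knn_inconsistency(features, pred_probs, k=3):
    """Score each sample by the disagreement between its predicted class
    and the predicted classes of its k nearest neighbors in feature space.

    features:   (n, d) array of feature embeddings
    pred_probs: (n, c) array of model class probabilities
    Returns an (n,) array of scores in [0, 1]; higher = more inconsistent.
    """
    preds = pred_probs.argmax(axis=1)
    # Pairwise Euclidean distances between all samples.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    scores = np.empty(len(features))
    for i in range(len(features)):
        neighbors = np.argsort(dists[i])[:k]
        # Fraction of nearest neighbors whose predicted class differs.
        scores[i] = np.mean(preds[neighbors] != preds[i])
    return scores
```

In an active learning loop, one would rank the unlabeled samples identified as belonging to known classes by this score and send the top-ranked ones to the annotator. The O(n^2) distance computation here is for clarity only; a k-d tree or approximate nearest-neighbor index would be used at scale.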
Authors: Ruiyu Mao, Ouyang Xu, Yunhui Guo