TiDAL: Learning Training Dynamics for Active Learning (2210.06788v3)
Abstract: Active learning (AL) aims to select the most useful data samples from an unlabeled data pool and annotate them to expand the labeled dataset under a limited budget. In particular, uncertainty-based methods choose the most uncertain samples, a strategy known to be effective at improving model performance. However, the AL literature often overlooks training dynamics (TD), defined as the ever-changing model behavior during optimization via stochastic gradient descent, even though other areas of the literature have empirically shown that TD provides important clues for measuring sample uncertainty. In this paper, we propose a novel AL method, Training Dynamics for Active Learning (TiDAL), which leverages TD to quantify the uncertainty of unlabeled data. Since tracking the TD of every sample in a large-scale unlabeled pool is impractical, TiDAL utilizes an additional prediction module that learns the TD of labeled data. To further justify the design of TiDAL, we provide theoretical and empirical evidence to argue for the usefulness of leveraging TD for AL. Experimental results show that our TiDAL achieves better or comparable performance on both balanced and imbalanced benchmark datasets compared to state-of-the-art AL methods, which estimate data uncertainty using only static information after model training.
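The abstract describes TiDAL's core mechanism only at a high level: an auxiliary prediction module is fit to the training dynamics recorded for labeled samples, and its predictions are then used to score the uncertainty of unlabeled samples. The PyTorch sketch below illustrates one way such a setup could look; the names (`TDPredictionModule`, `td_training_loss`, `td_uncertainty`), the choice of a running softmax average as the TD summary, and the entropy-based acquisition score are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TDPredictionModule(nn.Module):
    """Hypothetical auxiliary head that predicts a sample's training
    dynamics (summarized here as the running average of its softmax
    outputs over past epochs) from the target model's features."""

    def __init__(self, feature_dim: int, num_classes: int, hidden_dim: int = 128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Output a probability vector: the predicted TD summary.
        return F.softmax(self.head(features), dim=-1)


def td_training_loss(pred_td: torch.Tensor, target_td: torch.Tensor) -> torch.Tensor:
    """Fit the module on labeled data. `target_td` stands for the softmax
    average actually recorded per labeled sample while training the main
    model (assumed to be logged each epoch); KL divergence is one
    plausible matching objective."""
    return F.kl_div(pred_td.clamp_min(1e-12).log(), target_td, reduction="batchmean")


def td_uncertainty(pred_td: torch.Tensor) -> torch.Tensor:
    """Entropy of the predicted TD: an unlabeled sample whose predicted
    dynamics are spread over many classes scores as more uncertain."""
    return -(pred_td * pred_td.clamp_min(1e-12).log()).sum(dim=-1)


if __name__ == "__main__":
    module = TDPredictionModule(feature_dim=512, num_classes=10)
    feats = torch.randn(4, 512)             # stand-in for backbone features
    scores = td_uncertainty(module(feats))  # higher score = stronger query candidate
    print(scores)
```

Under these assumptions, a query round would rank the unlabeled pool by `td_uncertainty` and send the top-scoring samples for annotation, matching the abstract's description of selecting the most uncertain samples without tracking TD for the unlabeled pool itself.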
Authors: Seong Min Kye, Kwanghee Choi, Hyeongmin Byun, Buru Chang