Towards Efficient Active Learning in NLP via Pretrained Representations (2402.15613v1)
Abstract: Fine-tuning LLMs is now a common approach to text classification in a wide range of applications. When labeled documents are scarce, active learning helps save annotation effort but requires retraining massive models at each acquisition iteration. We drastically expedite this process by using pretrained representations of LLMs within the active learning loop and, once the desired amount of labeled data is acquired, fine-tuning the same or even a different pretrained LLM on this labeled data to achieve the best performance. As verified on common text classification benchmarks with pretrained BERT and RoBERTa as the backbone, our strategy yields performance similar to fine-tuning throughout the active learning loop while being orders of magnitude less computationally expensive. The data acquired with our procedure generalizes across pretrained networks, allowing flexibility in choosing the final model or updating it as newer versions are released.
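The following is a minimal sketch of the strategy described in the abstract, not the authors' exact pipeline: embeddings from a frozen pretrained encoder are computed once, a lightweight classifier is retrained on them at each acquisition round, and only after the labeled set is assembled would a full LLM be fine-tuned on it. The entropy-based acquisition function, the logistic-regression head, the `bert-base-uncased` checkpoint, and the `oracle_label` callback are illustrative assumptions, not details taken from the paper.

```python
# Sketch: active learning on frozen pretrained representations.
# Assumptions (not from the paper): entropy acquisition, logistic-regression head,
# "bert-base-uncased" encoder, placeholder round/query sizes.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def embed(texts, batch_size=32):
    """Frozen [CLS] embeddings; computed once and reused across all rounds."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        chunks.append(encoder(**batch).last_hidden_state[:, 0].cpu().numpy())
    return np.concatenate(chunks)

def active_learning_loop(texts, oracle_label, seed_idx, rounds=10, query_size=100):
    """Cheap AL loop: only the small head is retrained each round, never the LLM."""
    X = embed(texts)
    labels = {i: oracle_label(i) for i in seed_idx}          # initial labeled seed
    for _ in range(rounds):
        labeled = sorted(labels)
        head = LogisticRegression(max_iter=1000)
        head.fit(X[labeled], [labels[i] for i in labeled])
        pool = np.array([i for i in range(len(texts)) if i not in labels])
        probs = head.predict_proba(X[pool])
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # uncertainty score
        for i in pool[np.argsort(-entropy)[:query_size]]:
            labels[int(i)] = oracle_label(int(i))             # query the annotator
    # The acquired labeled set can then be used to fine-tune this or any other
    # pretrained LLM end-to-end for final performance.
    return labels
```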