
On Dataset Transferability in Active Learning for Transformers

Published 16 May 2023 in cs.LG and cs.CL (arXiv:2305.09807v2)

Abstract: Active learning (AL) aims to reduce labeling costs by querying the examples most beneficial for model learning. While the effectiveness of AL for fine-tuning transformer-based pre-trained language models (PLMs) has been demonstrated, it is less clear to what extent the AL gains obtained with one model transfer to others. We consider the problem of transferability of actively acquired datasets in text classification and investigate whether AL gains persist when a dataset built using AL coupled with a specific PLM is used to train a different PLM. We link the AL dataset transferability to the similarity of instances queried by the different PLMs and show that AL methods with similar acquisition sequences produce highly transferable datasets regardless of the models used. Additionally, we show that the similarity of acquisition sequences is influenced more by the choice of the AL method than the choice of the model.
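The acquisition-transfer setup the abstract describes can be illustrated with a small sketch: one model drives uncertainty-based acquisition, and a second model is then trained on the acquired dataset to check whether the AL gain carries over. This is a minimal, assumed illustration only; it uses scikit-learn logistic regression classifiers as stand-ins for PLMs, and the names (acquire_with, n_rounds, batch_size) are hypothetical, not the paper's implementation.

```python
# Illustrative sketch of dataset acquisition with one model and reuse by another.
# Classifiers here are stand-ins for PLMs; names and parameters are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def acquire_with(model, X_pool, y_pool, X_seed, y_seed, n_rounds=5, batch_size=10):
    """Entropy-based uncertainty sampling: repeatedly query the most uncertain
    pool examples and add them (with their labels) to the acquired dataset."""
    X_acq, y_acq = X_seed.copy(), y_seed.copy()
    pool_idx = np.arange(len(X_pool))
    for _ in range(n_rounds):
        model.fit(X_acq, y_acq)
        probs = model.predict_proba(X_pool[pool_idx])
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        query = pool_idx[np.argsort(-entropy)[:batch_size]]  # most uncertain examples
        X_acq = np.vstack([X_acq, X_pool[query]])
        y_acq = np.concatenate([y_acq, y_pool[query]])
        pool_idx = np.setdiff1d(pool_idx, query)  # remove queried examples from pool
    return X_acq, y_acq

# Transferability question: acquire with model A, then train model B on that data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_seed, y_seed, X_pool, y_pool = X[:20], y[:20], X[20:], y[20:]

model_a = LogisticRegression(max_iter=1000)         # "acquisition" model
model_b = LogisticRegression(C=0.1, max_iter=1000)  # different "consumer" model
X_acq, y_acq = acquire_with(model_a, X_pool, y_pool, X_seed, y_seed)
model_b.fit(X_acq, y_acq)  # does the AL gain transfer to a different model?
```

The paper's question maps onto the last two lines: the dataset is acquired under model A's uncertainty estimates, and transferability asks how well model B performs when trained on that same acquired dataset rather than on data it selected itself.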
