ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios (2405.10808v1)
Abstract: Active learning is designed to minimize annotation effort by prioritizing the instances that most enhance learning. However, many active learning strategies struggle with a 'cold start' problem, requiring substantial initial data before they become effective. This limitation often reduces their utility for pre-trained models, which already perform well in few-shot scenarios. To address this, we introduce ActiveLLM, a novel active learning approach that leverages LLMs such as GPT-4, Llama 3, and Mistral Large to select instances for annotation. We demonstrate that ActiveLLM significantly enhances the classification performance of BERT classifiers in few-shot scenarios, outperforming both traditional active learning methods and the few-shot learning method SetFit. Additionally, ActiveLLM can be extended to non-few-shot scenarios, allowing for iterative selection. In this way, ActiveLLM can even help other active learning strategies overcome their cold start problem. Our results suggest that ActiveLLM offers a promising solution for improving model performance across various learning setups.
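To make the selection step the abstract describes more concrete, here is a minimal sketch of LLM-based instance selection: an LLM is prompted to pick the few unlabeled texts whose annotation would most help a downstream few-shot classifier. The prompt wording, the `select_instances` helper, and the use of the OpenAI chat API are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of ActiveLLM-style instance selection.
# The paper's actual prompts and parsing may differ. Requires `pip install openai`.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def select_instances(pool: list[str], k: int, model: str = "gpt-4") -> list[int]:
    """Ask an LLM which k unlabeled texts would be most informative to
    annotate for few-shot training (the core idea of the approach)."""
    numbered = "\n".join(f"{i}: {text}" for i, text in enumerate(pool))
    prompt = (
        "You are an active learning selector for text classification.\n"
        f"From the {len(pool)} unlabeled instances below, choose the {k} most "
        "informative and diverse ones to annotate for few-shot training.\n"
        "Reason step by step, then finish with a JSON list of indices only.\n\n"
        f"{numbered}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    reply = response.choices[0].message.content
    # Extract the final JSON list of indices, ignoring any reasoning text.
    start, end = reply.rfind("["), reply.rfind("]")
    return json.loads(reply[start : end + 1])[:k]

# Usage (hypothetical): indices = select_instances(unlabeled_texts, k=16)
# The selected texts are then labeled by an annotator and used to fine-tune
# a BERT classifier in the usual way.
```

In a non-few-shot setup, this selection could be repeated over the remaining pool after each labeling round, which is how the abstract suggests ActiveLLM can also bootstrap other active learning strategies past their cold start.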
- Markus Bayer
- Christian Reuter