Fine-tuning Strategies for Domain Specific Question Answering under Low Annotation Budget Constraints (2401.09168v1)

Published 17 Jan 2024 in cs.CL

Abstract: The progress introduced by pre-trained LLMs and their fine-tuning has led to significant improvements on most downstream NLP tasks. Unsupervised training of an LLM followed by target-task fine-tuning has become the standard QA fine-tuning procedure. In this work, we demonstrate that this strategy is sub-optimal for fine-tuning QA models, especially under a low QA annotation budget, which is a common setting in practice due to the cost of extractive QA labeling. We draw our conclusions from an exhaustive analysis of the performance of alternatives to the sequential fine-tuning strategy on different QA datasets. Based on the experiments performed, we observe that the best strategy for fine-tuning a QA model in low-budget settings is to take a pre-trained LLM (PLM) and fine-tune it on a dataset composed of the target dataset and SQuAD. With zero extra annotation effort, this strategy outperforms the standard one by 2.28% to 6.48%. Our experiments provide one of the first investigations into how best to fine-tune a QA system under a low budget and are therefore of great practical interest to QA practitioners.
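The strategy the abstract identifies as best (mixing the small target training set with SQuAD and fine-tuning the pre-trained model once on the combined data) is straightforward to reproduce. The sketch below is a minimal illustration, not the authors' code: it assumes the Hugging Face transformers and datasets libraries, a BERT-style checkpoint, and a target dataset in SQuAD format; the covid_qa_deepset name, hyperparameters, and column handling are placeholder assumptions.

```python
# Minimal sketch (not the authors' released code) of the mixed-data strategy:
# concatenate the small target training set with SQuAD and fine-tune the PLM once.
# "covid_qa_deepset" is only a placeholder for a SQuAD-format target dataset.
from datasets import load_dataset, concatenate_datasets
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

MODEL_NAME = "bert-base-uncased"      # any pre-trained checkpoint
TARGET_DATASET = "covid_qa_deepset"   # placeholder low-budget target dataset
MAX_LEN, STRIDE = 384, 128

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def preprocess(examples):
    # Tokenize (question, context) pairs and map character-level answer spans
    # to start/end token positions, as in standard extractive QA fine-tuning.
    enc = tokenizer(examples["question"], examples["context"],
                    truncation="only_second", max_length=MAX_LEN, stride=STRIDE,
                    return_overflowing_tokens=True, return_offsets_mapping=True,
                    padding="max_length")
    sample_map = enc.pop("overflow_to_sample_mapping")
    offsets = enc.pop("offset_mapping")
    enc["start_positions"], enc["end_positions"] = [], []
    for i, offset in enumerate(offsets):
        answer = examples["answers"][sample_map[i]]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = enc.sequence_ids(i)
        ctx_start = seq_ids.index(1)
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)
        if offset[ctx_start][0] > start_char or offset[ctx_end][1] < end_char:
            # Answer not inside this window: point both labels at [CLS].
            enc["start_positions"].append(0)
            enc["end_positions"].append(0)
        else:
            s = ctx_start
            while s <= ctx_end and offset[s][0] <= start_char:
                s += 1
            enc["start_positions"].append(s - 1)
            e = ctx_end
            while e >= ctx_start and offset[e][1] >= end_char:
                e -= 1
            enc["end_positions"].append(e + 1)
    return enc

# Build the mixed training set: target data plus SQuAD, with no extra annotation.
cols = ["question", "context", "answers"]
squad = load_dataset("squad", split="train")
target = load_dataset(TARGET_DATASET, split="train")
squad = squad.remove_columns([c for c in squad.column_names if c not in cols])
target = target.remove_columns([c for c in target.column_names if c not in cols])
# Assumes both datasets share the SQuAD answer schema; cast reconciles dtypes.
mixed = concatenate_datasets([squad.cast(target.features), target]).shuffle(seed=42)
train_set = mixed.map(preprocess, batched=True, remove_columns=mixed.column_names)

model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)
args = TrainingArguments(output_dir="qa-mixed-finetune", learning_rate=3e-5,
                         per_device_train_batch_size=16, num_train_epochs=2)
Trainer(model=model, args=args, train_dataset=train_set,
        data_collator=default_data_collator).train()
```

A single fine-tuning pass over the concatenated data is the whole recipe; no target-domain annotation beyond the existing target training set is required.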

Authors (4)
  1. Kunpeng Guo (3 papers)
  2. Dennis Diefenbach (6 papers)
  3. Antoine Gourru (12 papers)
  4. Christophe Gravier (15 papers)