STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models (2403.01165v2)

Published 2 Mar 2024 in cs.CL and cs.AI

Abstract: Although LLMs have demonstrated powerful few-shot learning capabilities through prompting, supervised training is still necessary for complex reasoning tasks. Because of their large parameter counts and memory consumption, both Parameter-Efficient Fine-Tuning (PEFT) methods and Memory-Efficient Fine-Tuning methods have been proposed for LLMs. However, the heavy consumption of annotated data, the target of Data-Efficient Fine-Tuning, remains underexplored. An obvious approach is to combine a PEFT method with active learning, yet experimental results show that such a combination is non-trivial and yields inferior results. Probe experiments suggest two main causes for this: an uncertainty gap and poor model calibration. In this paper, we therefore propose a novel approach that effectively integrates uncertainty-based active learning with LoRA. Specifically, to bridge the uncertainty gap, we introduce a dynamic uncertainty measurement that combines the uncertainty of the base model with the uncertainty of the full model across active-learning iterations. To address poor calibration, we incorporate a regularization method during LoRA training to keep the model from becoming over-confident, and we employ the Monte-Carlo dropout mechanism to improve uncertainty estimation. Experimental results show that the proposed approach outperforms existing baselines on three complex reasoning tasks.
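The two mechanisms named in the abstract are concrete enough to sketch. Below is a minimal, hypothetical illustration (not the authors' implementation; the linear interpolation schedule, function names, and the Hugging-Face-style `model(input_ids=...).logits` call are all assumptions) of (a) a dynamic uncertainty score that blends the base model's and the LoRA-adapted full model's predictive entropy across active-learning rounds, and (b) Monte-Carlo dropout for the uncertainty estimate itself.

```python
# Sketch of dynamic uncertainty + MC dropout for active-learning sample
# selection, per the abstract. All names and schedules are illustrative
# assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    """Entropy of the next-token distribution, averaged over the sequence."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).mean().item()

@torch.no_grad()
def mc_dropout_entropy(model, input_ids, n_samples=8):
    """MC-dropout uncertainty: keep dropout active at inference time and
    average the predictive entropy over several stochastic forward passes."""
    model.train()  # train mode enables dropout layers
    ents = []
    for _ in range(n_samples):
        logits = model(input_ids=input_ids).logits
        ents.append(predictive_entropy(logits))
    model.eval()
    return sum(ents) / len(ents)

def dynamic_uncertainty(base_model, full_model, input_ids,
                        round_idx, total_rounds):
    """Convex combination of base-model and full-model uncertainty.
    Early rounds lean on the base model (the LoRA adapter is barely
    trained); later rounds shift weight to the adapted model. The linear
    schedule here is an assumption, not the paper's formula."""
    lam = round_idx / max(total_rounds - 1, 1)
    u_base = mc_dropout_entropy(base_model, input_ids)
    u_full = mc_dropout_entropy(full_model, input_ids)
    return (1 - lam) * u_base + lam * u_full

# Each round, score the unlabeled pool and send the highest-scoring
# (most uncertain) examples to annotation, e.g.:
#   scores = {i: dynamic_uncertainty(base, full, ids, r, R)
#             for i, ids in pool.items()}
```

The calibration regularizer applied during LoRA training (to counter over-confidence) is orthogonal to this scoring loop and is not shown here.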

Authors (4)
  1. Linhai Zhang
  2. Jialong Wu
  3. Deyu Zhou
  4. Guoqiang Xu