LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models (2304.01933v3)

Published 4 Apr 2023 in cs.CL

Abstract: The success of LLMs, like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLMs while achieving comparable or even better performance. To enable further research on PEFT methods of LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods of LLMs for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapter, Prompt-based learning and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyper-parameters to the best design for each adapter-based methods. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on both reasoning tasks.

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of LLMs

The paper "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of LLMs" presents a structured approach to fine-tuning LLMs efficiently using adapter-based methods. With the burgeoning success of models like GPT-4 and ChatGPT, this paper offers cost-effective alternatives which integrate various adapter techniques into different LLMs. The methodology revolves around parameter-efficient fine-tuning (PEFT), which requires tuning only a few external parameters as opposed to the entire model. This approach not only provides computational efficiency but also demonstrates competitive or superior performance compared to full-model fine-tuning.

The research encompasses a comprehensive empirical study of three notable open-source LLMs, namely LLaMA, BLOOM, and GPT-J, exploring adapter families such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods (e.g., LoRA). The paper examines adapter types, their placement within model layers, and fine-tuning hyperparameters to determine the best design for each adapter-based method.
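
The adapter families above differ mainly in where the extra trainable parameters sit relative to the frozen layers. Below is a hedged sketch of two representative modules, a Series (bottleneck) adapter and a LoRA-style reparametrized linear layer; the class names and hyperparameters (bottleneck_dim, rank, alpha) are generic choices from the adapter literature, not the exact components shipped in LLM-Adapters.

```python
import torch
import torch.nn as nn

class SeriesAdapter(nn.Module):
    """Bottleneck adapter applied sequentially: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's computation intact.
        return x + self.up(self.act(self.down(x)))


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # zero update at init, so behavior starts unchanged
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

A Parallel adapter uses the same bottleneck structure but adds its output alongside, rather than after, the sublayer it attaches to, while Prompt-based methods instead prepend trainable vectors to the input sequence.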

A significant contribution of this work is the LLM-Adapters framework, which facilitates the execution of these methods on different tasks using diverse datasets. The authors focus on two reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning, utilizing fourteen datasets. Crucially, the results illustrate that smaller-scale LLMs (e.g., 7B parameters) employing PEFT can achieve comparable and occasionally superior zero-shot inference performance vis-a-vis significantly larger LLMs (175B parameters) in both reasoning tasks.

The core findings of the paper include:

  1. Ideal placement configurations for adapters: post-MLP layers for Series adapters, parallel to the MLP layers for Parallel adapters, and after both the Attention and MLP layers for LoRA (a minimal placement sketch follows this list).
  2. Smaller LLMs with PEFT can achieve competitive or superior performance on certain tasks compared to larger LLMs, as evidenced by LLaMA-13B outperforming GPT-3.5 on MultiArith, AddSub, and SingleEq.
  3. In-distribution fine-tuning results indicate smaller models can outperform larger models like ChatGPT on commonsense reasoning tasks, highlighting the potential of smaller models when fine-tuned with task-specific data.
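
As a rough illustration of finding 1, the sketch below wires the modules from the previous snippet into a simplified pre-norm decoder block: the Series adapter follows the MLP output, the Parallel adapter branch runs alongside the MLP, and LoRA wraps linear projections tied to the attention and MLP sublayers. The block layout (including the stand-in LoRA projection after `nn.MultiheadAttention`, whose internal weights are not exposed here) is a simplifying assumption, not the exact LLaMA, BLOOM, or GPT-J implementation.

```python
import torch
import torch.nn as nn

# SeriesAdapter and LoRALinear are the sketch modules defined earlier.

class AdapterBlock(nn.Module):
    """Simplified pre-norm transformer block showing the adapter placements reported in the paper."""

    def __init__(self, hidden_dim: int, num_heads: int = 8, bottleneck_dim: int = 64):
        super().__init__()
        self.attn_norm = nn.LayerNorm(hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Stand-in for LoRA on the attention projections ("after Attention").
        self.attn_proj = LoRALinear(nn.Linear(hidden_dim, hidden_dim))
        self.mlp_norm = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(
            LoRALinear(nn.Linear(hidden_dim, 4 * hidden_dim)),  # LoRA on the MLP projections ("after MLP")
            nn.GELU(),
            LoRALinear(nn.Linear(4 * hidden_dim, hidden_dim)),
        )
        # Parallel adapter: a bottleneck branch added alongside the MLP.
        self.parallel_adapter = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck_dim), nn.ReLU(), nn.Linear(bottleneck_dim, hidden_dim)
        )
        # Series adapter: applied directly to the MLP sublayer's output.
        self.series_adapter = SeriesAdapter(hidden_dim, bottleneck_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.attn_proj(attn_out)
        h = self.mlp_norm(x)
        mlp_out = self.series_adapter(self.mlp(h))       # Series adapter: after the MLP layer
        return x + mlp_out + self.parallel_adapter(h)    # Parallel adapter: branch sharing the MLP input
```

Only the adapter and LoRA parameters in a block like this would be trained; everything else stays frozen, as in the earlier freezing sketch.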

Furthermore, the authors offer two high-quality training datasets—Math10K for math reasoning and Commonsense170K for commonsense reasoning—to enhance the PEFT performance. These datasets are intended to facilitate further research in this domain. The implications of this paper are considerable, pointing towards the continued optimization of LLMs in computational resource-constrained environments, making these models accessible to a wider audience.

This research opens pathways for future exploration in parameter-efficient tuning methods given the reduction in computational and storage requirements, which are often prohibitive in the deployment of full LLM fine-tuning. Moreover, the analysis of PEFT vs. full-model fine-tuning across various tasks and LLMs sets a foundation for further optimizing configurations that strike a balance between model complexity, resource usage, and task performance. By advancing the capability of smaller models to achieve results on par with larger models, the paper positions PEFT as a pivotal area for ongoing AI research and application development.

Authors (9)
  1. Zhiqiang Hu
  2. Lei Wang
  3. Yihuai Lan
  4. Wanyu Xu
  5. Ee-Peng Lim
  6. Lidong Bing
  7. Xing Xu
  8. Soujanya Poria
  9. Roy Ka-Wei Lee
Citations (171)