LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models (2304.01933v3)

Published 4 Apr 2023 in cs.CL

Abstract: The success of LLMs, like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLMs while achieving comparable or even better performance. To enable further research on PEFT methods of LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods of LLMs for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapter, Prompt-based learning and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyper-parameters to the best design for each adapter-based methods. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on both reasoning tasks.

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of LLMs

The paper "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of LLMs" presents a structured approach to fine-tuning LLMs efficiently using adapter-based methods. With the burgeoning success of models like GPT-4 and ChatGPT, this paper offers cost-effective alternatives which integrate various adapter techniques into different LLMs. The methodology revolves around parameter-efficient fine-tuning (PEFT), which requires tuning only a few external parameters as opposed to the entire model. This approach not only provides computational efficiency but also demonstrates competitive or superior performance compared to full-model fine-tuning.

The research encompasses a comprehensive empirical study of three notable open-source LLMs, namely LLaMA, BLOOM, and GPT-J, exploring adapter families such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods (e.g., LoRA). The paper examines adapter types, their placement within model layers, and fine-tuning hyperparameters to determine the best design for each adapter-based method.
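
The adapter families above differ mainly in where the extra trainable parameters sit relative to the frozen layers. Below is a hedged sketch of two representative modules, a Series (bottleneck) adapter and a LoRA-style reparametrized linear layer; the class names and hyperparameters (bottleneck_dim, rank, alpha) are generic choices from the adapter literature, not the exact components shipped in LLM-Adapters.

```python
import torch
import torch.nn as nn

class SeriesAdapter(nn.Module):
    """Bottleneck adapter applied sequentially: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's computation intact.
        return x + self.up(self.act(self.down(x)))


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # zero update at init, so behavior starts unchanged
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

A Parallel adapter uses the same bottleneck structure but adds its output alongside, rather than after, the sublayer it attaches to, while Prompt-based methods instead prepend trainable vectors to the input sequence.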

A significant contribution of this work is the LLM-Adapters framework, which facilitates the execution of these methods on different tasks using diverse datasets. The authors focus on two reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning, utilizing fourteen datasets. Crucially, the results illustrate that smaller-scale LLMs (e.g., 7B parameters) employing PEFT can achieve comparable and occasionally superior zero-shot inference performance vis-a-vis significantly larger LLMs (175B parameters) in both reasoning tasks.

The core findings of the paper include:

  1. Ideal placement configurations for adapters: post-MLP layers for Series adapters, parallel to the MLP layers for Parallel adapters, and after both the Attention and MLP layers for LoRA (a minimal placement sketch follows this list).
  2. Smaller LLMs with PEFT can achieve competitive or superior performance on certain tasks compared to larger LLMs, as evidenced by LLaMA-13B outperforming GPT-3.5 on MultiArith, AddSub, and SingleEq.
  3. In-distribution fine-tuning results indicate smaller models can outperform larger models like ChatGPT on commonsense reasoning tasks, highlighting the potential of smaller models when fine-tuned with task-specific data.
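
As a rough illustration of finding 1, the sketch below wires the modules from the previous snippet into a simplified pre-norm decoder block: the Series adapter follows the MLP output, the Parallel adapter branch runs alongside the MLP, and LoRA wraps linear projections tied to the attention and MLP sublayers. The block layout (including the stand-in LoRA projection after `nn.MultiheadAttention`, whose internal weights are not exposed here) is a simplifying assumption, not the exact LLaMA, BLOOM, or GPT-J implementation.

```python
import torch
import torch.nn as nn

# SeriesAdapter and LoRALinear are the sketch modules defined earlier.

class AdapterBlock(nn.Module):
    """Simplified pre-norm transformer block showing the adapter placements reported in the paper."""

    def __init__(self, hidden_dim: int, num_heads: int = 8, bottleneck_dim: int = 64):
        super().__init__()
        self.attn_norm = nn.LayerNorm(hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Stand-in for LoRA on the attention projections ("after Attention").
        self.attn_proj = LoRALinear(nn.Linear(hidden_dim, hidden_dim))
        self.mlp_norm = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(
            LoRALinear(nn.Linear(hidden_dim, 4 * hidden_dim)),  # LoRA on the MLP projections ("after MLP")
            nn.GELU(),
            LoRALinear(nn.Linear(4 * hidden_dim, hidden_dim)),
        )
        # Parallel adapter: a bottleneck branch added alongside the MLP.
        self.parallel_adapter = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck_dim), nn.ReLU(), nn.Linear(bottleneck_dim, hidden_dim)
        )
        # Series adapter: applied directly to the MLP sublayer's output.
        self.series_adapter = SeriesAdapter(hidden_dim, bottleneck_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.attn_proj(attn_out)
        h = self.mlp_norm(x)
        mlp_out = self.series_adapter(self.mlp(h))       # Series adapter: after the MLP layer
        return x + mlp_out + self.parallel_adapter(h)    # Parallel adapter: branch sharing the MLP input
```

Only the adapter and LoRA parameters in a block like this would be trained; everything else stays frozen, as in the earlier freezing sketch.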

Furthermore, the authors offer two high-quality training datasets—Math10K for math reasoning and Commonsense170K for commonsense reasoning—to enhance the PEFT performance. These datasets are intended to facilitate further research in this domain. The implications of this paper are considerable, pointing towards the continued optimization of LLMs in computational resource-constrained environments, making these models accessible to a wider audience.

This research opens pathways for future exploration in parameter-efficient tuning methods given the reduction in computational and storage requirements, which are often prohibitive in the deployment of full LLM fine-tuning. Moreover, the analysis of PEFT vs. full-model fine-tuning across various tasks and LLMs sets a foundation for further optimizing configurations that strike a balance between model complexity, resource usage, and task performance. By advancing the capability of smaller models to achieve results on par with larger models, the paper positions PEFT as a pivotal area for ongoing AI research and application development.

Authors (9)
  1. Zhiqiang Hu
  2. Lei Wang
  3. Yihuai Lan
  4. Wanyu Xu
  5. Ee-Peng Lim
  6. Lidong Bing
  7. Xing Xu
  8. Soujanya Poria
  9. Roy Ka-Wei Lee
Citations (171)