MerA: Merging Pretrained Adapters For Few-Shot Learning (2308.15982v1)

Published 30 Aug 2023 in cs.CL

Abstract: Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained LLMs on downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Our preliminary study, however, reveals that even single adapters can outperform AdapterFusion in few-shot learning, which motivates us to propose Merging Pretrained Adapters (MerA), an approach that efficiently incorporates pretrained adapters into a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "same-track" setting, which merges adapters drawn from the same track of pretraining tasks. With the "same-track" setting, we observe even more impressive gains, surpassing both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5% in MRPC and 5.0% in MNLI.
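
The abstract describes collapsing several pretrained adapters into a single adapter via model fusion, rather than stacking them with task-specific composition layers as AdapterFusion does. The sketch below illustrates one plausible fusion strategy, simple weighted parameter averaging; the function name merge_adapters, the bottleneck shapes, and the averaging choice are illustrative assumptions, not the paper's exact procedure (which may, for instance, align adapter weights before merging).

```python
# Minimal sketch: merge several adapters that share an architecture by
# averaging their parameters. This is an assumed fusion method for
# illustration only, not necessarily the procedure used in MerA.

from collections import OrderedDict
import torch


def merge_adapters(adapter_state_dicts, weights=None):
    """Average the parameters of several same-shaped adapters.

    adapter_state_dicts: list of state_dicts, one per pretrained adapter.
    weights: optional per-adapter mixing coefficients (defaults to uniform).
    """
    n = len(adapter_state_dicts)
    if weights is None:
        weights = [1.0 / n] * n

    merged = OrderedDict()
    for key in adapter_state_dicts[0]:
        # Weighted sum of the corresponding tensor from every adapter.
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, adapter_state_dicts))
    return merged


# Toy usage with two hypothetical bottleneck adapters of identical shape.
adapter_a = {"down.weight": torch.randn(64, 768), "up.weight": torch.randn(768, 64)}
adapter_b = {"down.weight": torch.randn(64, 768), "up.weight": torch.randn(768, 64)}

merged = merge_adapters([adapter_a, adapter_b])
print({k: v.shape for k, v in merged.items()})
```

Unlike AdapterFusion, the merged result is a single adapter with no additional composition layers, so it adds no trainable parameters or deployment cost beyond a standard single adapter.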

Authors (6)
  1. Shwai He (23 papers)
  2. Run-Ze Fan (9 papers)
  3. Liang Ding (159 papers)
  4. Li Shen (363 papers)
  5. Tianyi Zhou (172 papers)
  6. Dacheng Tao (829 papers)
Citations (10)
