
Inducing Generalization across Languages and Tasks using Featurized Low-Rank Mixtures (2402.17934v2)

Published 27 Feb 2024 in cs.CL and cs.AI

Abstract: Adapting pretrained LLMs to various downstream tasks in tens or hundreds of human languages is computationally expensive. Parameter-efficient fine-tuning (PEFT) significantly reduces the adaptation cost by tuning only a small number of parameters. However, common PEFT methods such as LoRA (Hu et al., 2022) suffer from suboptimal performance on diverse dataset mixtures, due to aggressive parameter tying and negative interference among different datasets. In this work, we propose Featurized Low-rank Mixtures (FLix), a novel PEFT method designed for effective multitask multilingual adaptation. FLix associates each unique dataset feature, such as the dataset's language or task, with its own low-rank weight update parameters. By composing feature-specific parameters for each dataset, FLix can accommodate diverse dataset mixtures and generalize better to unseen datasets. Our experiments show that FLix leads to significant improvements over a variety of tasks in both supervised learning and zero-shot settings, with gains of up to $14.2$ exact match points in zero-shot semantic parsing.
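To make the composition mechanism concrete, here is a minimal PyTorch sketch of a linear layer whose weight update is the sum of feature-specific low-rank deltas, as the abstract describes. This is an illustrative reconstruction, not the authors' implementation: the class name, feature keys, rank, and initialization are all assumptions.

```python
import torch
import torch.nn as nn


class FLixLinear(nn.Module):
    """Sketch of a FLix-style layer: one low-rank update per dataset feature.

    Hypothetical reconstruction from the abstract; each feature (e.g. a
    language or a task) owns a low-rank pair (A_f, B_f), and the deltas of
    the features active for the current dataset are summed onto the frozen
    base weight.
    """

    def __init__(self, base: nn.Linear, features, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weight stays frozen
        d_out, d_in = base.weight.shape
        # One low-rank pair per feature; B is zero-initialized so the
        # layer starts as the pretrained model (as in LoRA).
        self.A = nn.ParameterDict(
            {f: nn.Parameter(0.01 * torch.randn(rank, d_in)) for f in features}
        )
        self.B = nn.ParameterDict(
            {f: nn.Parameter(torch.zeros(d_out, rank)) for f in features}
        )

    def forward(self, x: torch.Tensor, active_features) -> torch.Tensor:
        # y = x W^T + sum over active features f of x A_f^T B_f^T
        y = self.base(x)
        for f in active_features:
            y = y + x @ self.A[f].T @ self.B[f].T
        return y


# Example (feature names are made up): a dataset is described by its
# features, so an unseen (language, task) pair can reuse the parameters
# learned for each feature separately.
layer = FLixLinear(
    nn.Linear(512, 512),
    features=["lang_sw", "lang_de", "task_qa", "task_parse"],
)
out = layer(torch.randn(2, 512), active_features=["lang_sw", "task_parse"])
```

Under this reading, zero-shot generalization falls out of the parameterization: a test dataset that pairs a seen language with a seen task activates only parameters that were trained, even if that exact combination never appeared in training.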

Authors (7)
  1. Chu-Cheng Lin
  2. Xinyi Wang
  3. Jonathan H. Clark
  4. Han Lu
  5. Yun Zhu
  6. Chenxi Whitehouse
  7. Hongkun Yu
Citations (2)