Towards Modular LLMs by Building and Reusing a Library of LoRAs (2405.11157v1)

Published 18 May 2024 in cs.LG and cs.CL

Abstract: The growing number of parameter-efficient adaptations of a base LLM calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approaches to build this library and introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. We make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.

Summary

  • The paper introduces Model-Based Clustering (MBC), a method that groups task-specific LoRA adapters by the similarity of their parameters, compressing information from many tasks into fewer adapters.
  • It explores several routing strategies, including Arrow, a novel zero-shot mechanism that dynamically selects the most relevant adapters for new inputs.
  • Experiments show that an MBC-built LoRA library with appropriate routing matches or outperforms traditional joint training in both zero-shot and supervised scenarios.

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Introduction

Parameter-efficient fine-tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) have made it much simpler to adapt LLMs to a wide variety of tasks. Imagine having not just one or two but hundreds of such adapters readily available. This paper explores how to build and reuse a library of LoRA adapters to make LLMs more modular, adaptable, and better at handling unseen tasks.
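
As background, here is a minimal sketch of the LoRA idea: a frozen linear layer augmented with a trainable low-rank update. The class name, rank, and scaling below are illustrative defaults, not the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base model stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero-initialized
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```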

Building the LoRA Library

The core idea presented involves creating a library of task-specific adapters that enable the LLM to perform well on both seen and unseen tasks. The main methods proposed to build this library include:

  • Private Adapters: Each adapter is trained individually on a specific task. This method works well in decentralized setups but doesn't leverage multi-task learning.
  • Shared Adapter: A single adapter is trained on data from all tasks combined, promoting task transfer but risking negative transfer due to task interference.
  • Poly/MHR Adapters: These methods train a small set of 'basis' adapters on multi-task data and then form each task's adapter as a learned linear combination of the bases (a rough sketch follows below).
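
As a rough sketch of the Poly-style combination, assume a library of k basis LoRA factors mixed into a single per-task update by learned weights; the shapes, names, and softmax mixing below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PolyStyleAdapter(nn.Module):
    """Sketch: k basis LoRA adapters mixed into one per-task low-rank update."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, k: int = 4, n_tasks: int = 10):
        super().__init__()
        self.A = nn.Parameter(torch.randn(k, r, in_features) * 0.01)  # basis down-projections
        self.B = nn.Parameter(torch.zeros(k, out_features, r))        # basis up-projections
        self.logits = nn.Parameter(torch.zeros(n_tasks, k))           # per-task mixing logits

    def task_delta(self, task_id: int) -> torch.Tensor:
        w = torch.softmax(self.logits[task_id], dim=-1)               # mixing weights over the k bases
        # Weighted sum of the k low-rank updates B_i @ A_i -> one (out, in) delta for this task
        return torch.einsum("k,kor,kri->oi", w, self.B, self.A)
```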

The novel contribution here is Model-Based Clustering (MBC), a two-stage approach:

  1. Train an individual LoRA adapter for each task for an initial number of training steps.
  2. Cluster these adapters based on the similarity of their weights, then train one adapter per cluster for the remaining steps.

This method allows for transferring useful information between similar tasks, effectively compressing the information from a large set of tasks into fewer, more generalized adapters.
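
A minimal sketch of the clustering step, assuming each task's LoRA weights have been flattened into one vector per task; the k-means call, normalization, and function names are illustrative choices rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_adapters(task_adapters: dict[str, np.ndarray], n_clusters: int) -> dict[int, list[str]]:
    """Group tasks by the similarity of their flattened LoRA adapter weights."""
    names = list(task_adapters)
    X = np.stack([task_adapters[name].ravel() for name in names])   # one row per task adapter
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)       # cosine-style normalization
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    clusters: dict[int, list[str]] = {}
    for name, label in zip(names, labels):
        clusters.setdefault(int(label), []).append(name)
    return clusters  # one adapter is then trained per cluster for the remaining steps
```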

Reusing the LoRA Library

Once a library of adapters is built, selecting and reusing the right adapters for new tasks is crucial. The authors explore several routing strategies for this purpose:

  1. μ Routing: All adapters are equally weighted, essentially averaging their outputs.
  2. Task Predictor (TP) Routing: A classifier is trained to predict the relevant task for a given input, guiding which adapters to use.
  3. Centroid Matching (CM) Routing: Each adapter has a prototype representation computed from its training data, and adapters are selected by the similarity between this prototype and the representation of the input.
  4. Arrow Routing: This novel zero-shot method applies a singular value decomposition (SVD) to each adapter's low-rank update, extracts its direction of maximum variance, and scores adapters by how strongly an input representation aligns with that direction.

Arrow Routing offers a lightweight, efficient way to dynamically select the best-suited adapters without requiring access to the training data, thus fitting well into decentralized, asynchronous learning environments.
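
A rough sketch of this idea, assuming each adapter is available as its LoRA factors A and B; the top-k selection, softmax weighting, and function names below are illustrative assumptions rather than the paper's exact algorithm.

```python
import torch

def arrow_prototypes(adapters: dict[str, tuple[torch.Tensor, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """For each adapter, take the top right singular vector of its update B @ A as a routing direction."""
    protos = {}
    for name, (A, B) in adapters.items():  # A: (r, d_in), B: (d_out, r)
        delta = B @ A                       # low-rank update, shape (d_out, d_in)
        _, _, Vh = torch.linalg.svd(delta, full_matrices=False)
        protos[name] = Vh[0]                # direction of maximum variance in input space
    return protos

def arrow_route(h: torch.Tensor, protos: dict[str, torch.Tensor], top_k: int = 2) -> dict[str, float]:
    """Score adapters by |h . prototype| and softmax over the top-k; no training data is needed."""
    names = list(protos)
    scores = torch.stack([torch.abs(h @ protos[name]) for name in names])
    values, indices = torch.topk(scores, k=min(top_k, len(names)))
    weights = torch.softmax(values, dim=-1)
    return {names[i]: float(w) for i, w in zip(indices.tolist(), weights.tolist())}
```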

Experimental Results

The experimental evaluation spans both zero-shot and supervised learning scenarios on LLMs like Phi-2 and Mistral. The results show:

  • Zero-Shot Performance: Libraries built with MBC consistently outperform other library-building methods, and Arrow Routing improves performance further, especially with a large library of adapters.
  • Supervised Adaptation: MBC combined with Poly routing achieves the best results, showing that effective clustering and effective routing are both key to exploiting the full adapter library.

Implications and Future Work

This research paves the way for more scalable and flexible use of LLMs built on decentralized, collaborative training. By combining and routing among many pre-trained adapters, a model can be adapted to new tasks without retraining from scratch.

Potential future directions include:

  • Extending the techniques to other types of adapters beyond LoRA.
  • Scaling the approach to larger models and more diverse datasets.
  • Investigating usage in continuous learning settings where new tasks continuously emerge.

By focusing on modular, parameter-efficient adaptation, this approach could significantly reduce the computational footprint of serving many tasks and increase accessibility for smaller research groups and resource-constrained applications.

Conclusion

This paper takes a significant step toward more modular and adaptable LLMs through the use of a library of LoRA adapters. By addressing both the construction of such a library (MBC) and its reuse (routing methods such as Arrow), it shows that modular approaches can match or outperform traditional joint training, pointing toward more scalable and collaborative development of LLMs.
