LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild (2402.09997v1)

Published 15 Feb 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Low-Rank Adaptation (LoRA) provides an effective yet efficient solution for fine-tuning large language models (LLMs). The modular and plug-and-play nature of LoRA enables the integration of diverse domain-specific LoRAs to enhance the capabilities of LLMs. Previous research on exploiting multiple LoRAs either focuses on specific isolated downstream tasks or fixes the selection of LoRAs during training. However, in real-world scenarios, LLMs receive diverse prompts covering different tasks, and the pool of candidate LoRAs is often dynamically updated. To bridge this gap, we propose LoraRetriever, a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts. LoraRetriever contains three main components: firstly, identifying and retrieving LoRAs relevant to the given input; secondly, formulating strategies for effectively integrating the retrieved LoRAs; and thirdly, developing efficient batch inference to accommodate heterogeneous requests. Experimental results indicate that LoraRetriever consistently outperforms the baselines, highlighting its practical effectiveness and versatility.

LoraRetriever, proposed in the paper "LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild," aims to make Low-Rank Adaptation (LoRA) fine-tuning of LLMs more adaptive and efficient in diverse real-world scenarios (Zhao et al., 15 Feb 2024). LoRA enables modular adaptation of LLMs through domain-specific submodules, but existing approaches to using multiple LoRA modules typically target isolated tasks or fix the composition in advance, which limits adaptability to the dynamic mix of tasks and prompts encountered in practice.
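
To make the "plug-and-play" property concrete, here is a minimal sketch (not the paper's code; the module and variable names are illustrative) of a LoRA adapter attached to a frozen linear layer, where only the low-rank factors A and B are task-specific:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a pluggable low-rank update: y = W0 x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pre-trained backbone stays frozen
        # Task-specific low-rank factors; these two small matrices are the "LoRA module".
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

Because each task contributes only its own (A, B) pair over a shared frozen backbone, many such modules can be stored, swapped, or combined, which is the property LoraRetriever builds on.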

The LoraRetriever framework addresses this limitation with a retrieve-then-compose approach that dynamically selects and integrates LoRA modules based on the input prompt. The process consists of three key stages (a minimal sketch of the pipeline follows the list):

  1. Identifying and Retrieving Relevant LoRA Modules: The system first determines which LoRA modules are most pertinent to the given input.
  2. Formulating Integration Strategies: It then devises strategies to effectively combine the retrieved LoRA modules to enhance the LLM's performance on the specific input.
  3. Developing Efficient Batch Inference: Finally, it accommodates heterogeneous requests through efficient batch processing.
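
The sketch below illustrates one way stages 1 and 2 could fit together: retrieve the LoRA modules whose embeddings are most similar to the prompt embedding, then compose them by averaging their parameters. This is an illustrative instantiation rather than the paper's implementation; the encoder, the similarity measure, and the averaging strategy are assumptions, and the paper also considers other composition strategies.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def retrieve_loras(prompt_emb: np.ndarray, lora_embs: dict, top_k: int = 3) -> list:
    """Stage 1: rank candidate LoRA modules by similarity between the prompt and each module's embedding."""
    scores = {name: cosine(prompt_emb, emb) for name, emb in lora_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def compose_by_averaging(lora_params: dict, selected: list) -> dict:
    """Stage 2 (one option): average the (A, B) factors of the retrieved modules into a single adapter."""
    A = np.mean([lora_params[name]["A"] for name in selected], axis=0)
    B = np.mean([lora_params[name]["B"] for name in selected], axis=0)
    return {"A": A, "B": B}

# Hypothetical usage; `encode` stands in for any sentence encoder producing fixed-size embeddings.
# selected = retrieve_loras(encode(prompt), lora_embs, top_k=2)
# adapter  = compose_by_averaging(lora_params, selected)
```

Stage 3, efficient batched inference, then has to route each request in a heterogeneous batch through its own retrieved adapters rather than applying a single composition to the whole batch.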

Experimental results indicate that LoraRetriever consistently outperforms baseline models, demonstrating its practical effectiveness and versatility in managing mixed tasks in dynamic environments.

In relation to LoraRetriever, another system worth noting is LoraHub, which also focuses on composing LoRA modules for cross-task generalization. LoraHub fluidly combines LoRA modules trained on different tasks to improve performance on unseen tasks without requiring additional model parameters or gradient-based training (Huang et al., 2023). This highlights the potential for a shared ecosystem of LoRA modules that can be applied to novel tasks, facilitating broader adaptability and user collaboration.
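
As a rough illustration of this gradient-free composition idea, the sketch below merges LoRA modules with mixing weights found by a simple random search. The function names are hypothetical and the search is deliberately naive; LoraHub itself uses an evolutionary, gradient-free optimizer over the mixing weights.

```python
import numpy as np

def merge_loras(lora_params: dict, weights: np.ndarray) -> dict:
    """Element-wise weighted merge of LoRA modules: A_hat = sum_i w_i * A_i, B_hat = sum_i w_i * B_i."""
    names = list(lora_params)
    A = sum(w * lora_params[n]["A"] for w, n in zip(weights, names))
    B = sum(w * lora_params[n]["B"] for w, n in zip(weights, names))
    return {"A": A, "B": B}

def search_mixing_weights(lora_params: dict, loss_fn, trials: int = 200, seed: int = 0) -> np.ndarray:
    """Toy gradient-free search: sample candidate weights and keep the best under a few-shot loss.
    `loss_fn` evaluates a merged adapter on a small set of examples from the unseen task."""
    rng = np.random.default_rng(seed)
    best_w, best_loss = None, float("inf")
    for _ in range(trials):
        w = rng.dirichlet(np.ones(len(lora_params)))  # non-negative candidate weights summing to 1
        loss = loss_fn(merge_loras(lora_params, w))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```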

Furthermore, frameworks like DoRA decompose the pre-trained weights into magnitude and direction components during fine-tuning, applying low-rank updates to the direction; this aims to bridge the accuracy gap between full fine-tuning and LoRA-based methods by enhancing the learning capacity and training stability of LoRA adaptations (Wang et al., 14 Feb 2024).
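
For reference, DoRA's reparameterization of a fine-tuned weight matrix is commonly written as below, where W_0 is the pre-trained weight, BA the low-rank update, m a learnable magnitude vector, and the norm is taken column-wise:

```latex
W' = m \,\frac{W_0 + B A}{\lVert W_0 + B A \rVert_c}
```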

Overall, LoraRetriever and related methodologies like LoraHub and DoRA represent significant advancements in making LLMs more adaptable and efficient for a wide range of dynamically changing tasks and prompts.

Authors (7)
  1. Ziyu Zhao (28 papers)
  2. Leilei Gan (21 papers)
  3. Guoyin Wang (108 papers)
  4. Wangchunshu Zhou (73 papers)
  5. Hongxia Yang (130 papers)
  6. Kun Kuang (114 papers)
  7. Fei Wu (317 papers)