CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices (2307.07705v3)

Published 15 Jul 2023 in cs.CL

Abstract: Recently, there has been growing demand to deploy LLMs on personal devices such as laptops and smartphones, where an LLM typically requires different model variants to handle different tasks. However, personal devices have limited resources and require reduced storage overhead. Two key methods address this: the first is model compression, which compresses LLMs into smaller sizes; the second is LoRA, which can transfer an LLM to other tasks with very few parameters, avoiding the storage of multiple model variants in multi-task scenarios by preserving only the LoRAs. However, our experiments show that directly combining these two methods yields suboptimal performance. Considering that the open-source community has already contributed many LoRAs to LLMs, we propose to adapt these existing LoRAs from the LLMs to their compressed versions and introduce a Compression-Aware LoRA (CA-LoRA) framework. We incorporate knowledge inheritance and recovery strategies to recover the knowledge lost during model compression. Experimental results demonstrate that CA-LoRA outperforms vanilla LoRA applied to a compressed LLM and achieves performance comparable to the non-compressed LLM with existing LoRA modules. The source code of CA-LoRA is available at https://github.com/thunlp/CA-LoRA.
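The two strategies the abstract names can be pictured concretely. A minimal sketch (not the authors' implementation; the class name, shapes, and zero-initialization choice are illustrative assumptions): "knowledge inheritance" reuses the LoRA matrices trained on the original LLM as-is, while "knowledge recovery" adds a second, freshly trained low-rank module intended to compensate for capability lost in compression.

```python
import numpy as np

class CALoRALinear:
    """Illustrative sketch of a CA-LoRA-style linear layer on a compressed
    LLM: frozen compressed weight + inherited task LoRA + recovery module.
    (Hypothetical code, not from the CA-LoRA repository.)"""

    def __init__(self, W_compressed, lora_A, lora_B, recovery_rank=4, seed=0):
        self.W = W_compressed          # frozen compressed weight, shape (out, in)
        # Knowledge inheritance: reuse the LoRA trained on the original LLM.
        self.A = lora_A.copy()         # shape (r, in)
        self.B = lora_B.copy()         # shape (out, r)
        # Knowledge recovery: a new low-rank module, trained later to
        # compensate for compression loss. Zero-init B_r so it starts
        # as a no-op (a common LoRA convention, assumed here).
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W_compressed.shape
        self.Ar = rng.standard_normal((recovery_rank, in_dim)) * 0.01
        self.Br = np.zeros((out_dim, recovery_rank))

    def forward(self, x):
        # y = x W^T + x (BA)^T (inherited task LoRA) + x (Br Ar)^T (recovery)
        return (x @ self.W.T
                + x @ self.A.T @ self.B.T
                + x @ self.Ar.T @ self.Br.T)
```

Because the recovery matrix `Br` starts at zero, the layer initially behaves exactly like the compressed model with the inherited LoRA attached; the recovery path only contributes once it is trained.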

Authors (9)
  1. Weilin Zhao (22 papers)
  2. Yuxiang Huang (17 papers)
  3. Xu Han (270 papers)
  4. Zhiyuan Liu (433 papers)
  5. Zhengyan Zhang (46 papers)
  6. Maosong Sun (337 papers)
  7. Kuai Li (4 papers)
  8. Chen Chen (753 papers)
  9. Tao Yang (520 papers)
Citations (1)
