
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models (2406.08903v3)

Published 13 Jun 2024 in cs.CL

Abstract: Fine-tuning is a crucial process for adapting LLMs to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs (e.g., WizardMath for math problems). Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. This method employs higher-bit representation for singular vectors corresponding to larger singular values. We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to fully fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. Additionally, we show that our method is compatible with various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its generalizability.
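The abstract describes the core idea at a high level: take the delta between the fine-tuned and base weights, factor it with an SVD, and spend more bits on the singular vectors attached to the largest singular values. The sketch below illustrates that idea only; the bit-width schedule, group sizes, and the simple per-factor uniform quantizer are assumptions for illustration, not the paper's actual algorithm, which the abstract does not specify in detail.

```python
# Illustrative sketch of mixed-precision delta compression.
# Assumptions (not from the paper): the 8/3/2-bit schedule split by
# singular-value rank and the symmetric uniform quantizer used here
# in place of whatever quantization scheme the method actually uses.
import torch

def quantize_uniform(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization to the given bit-width,
    dequantized back to float so the sketch stays self-contained."""
    levels = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / levels
    return torch.round(x / scale).clamp(-levels, levels) * scale

def compress_delta(w_finetuned: torch.Tensor,
                   w_base: torch.Tensor,
                   bit_schedule=((16, 8), (112, 3), (384, 2))):
    """Approximate delta = W_finetuned - W_base with a mixed-precision SVD.

    bit_schedule: (num_singular_triplets, bits) pairs; earlier groups cover
    the largest singular values and get more bits, following the long-tail
    observation in the abstract.
    """
    delta = w_finetuned - w_base
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)

    approx = torch.zeros_like(delta)
    start = 0
    for count, bits in bit_schedule:
        end = min(start + count, s.numel())
        if start >= end:
            break
        # Fold the singular values into U so only two factors need storing,
        # then quantize each factor at this group's bit-width.
        u_q = quantize_uniform(u[:, start:end] * s[start:end], bits)
        v_q = quantize_uniform(vh[start:end, :], bits)
        approx += u_q @ v_q
        start = end
    return w_base + approx  # reconstructed fine-tuned weight
```

The reconstruction error of a rank-truncated SVD is dominated by the leading singular directions, so allocating more bits to those directions and fewer to the long tail is what lets a mixed-precision scheme beat both a purely low-rank and a purely low-bit baseline at the same storage budget.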

Authors (10)
  1. Bowen Ping (5 papers)
  2. Shuo Wang (382 papers)
  3. Hanqing Wang (32 papers)
  4. Xu Han (270 papers)
  5. Yuzhuang Xu (12 papers)
  6. Yukun Yan (39 papers)
  7. Yun Chen (134 papers)
  8. Baobao Chang (80 papers)
  9. Zhiyuan Liu (433 papers)
  10. Maosong Sun (337 papers)
Citations (1)
