SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression (2403.07378v5)

Published 12 Mar 2024 in cs.CL and cs.LG

Abstract: The advancements in LLMs have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the compressed weights are not updated after SVD truncation. In this work, we propose SVD-LLM, an SVD-based post-training LLM compression method that addresses the limitations of existing methods. SVD-LLM incorporates a truncation-aware data whitening technique to ensure a direct mapping between singular values and compression loss. Moreover, SVD-LLM adopts a parameter update with sequential low-rank approximation to compensate for the accuracy degradation after SVD compression. We evaluate SVD-LLM on 10 datasets and seven models from three different LLM families at three different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-art methods, especially at high model compression ratios. Our code is available at https://github.com/AIoT-MLSys-Lab/SVD-LLM

References (31)
  1. A large annotated corpus for learning natural language inference. In Màrquez, L., Callison-Burch, C., and Su, J. (eds.), Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.  632–642, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1075. URL https://aclanthology.org/D15-1075.
  2. Language models are few-shot learners, 2020.
  3. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, March 2023. URL https://lmsys.org/blog/2023-03-30-vicuna/.
  4. Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457v1, 2018.
  5. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), 2005. URL https://aclanthology.org/I05-5002.
  6. SparseGPT: Massive language models can be accurately pruned in one-shot. arXiv preprint arXiv:2301.00774, 2023.
  7. GPTQ: Accurate post-training compression for generative pretrained transformers. arXiv preprint arXiv:2210.17323, 2022.
  8. A generalization of the Eckart-Young-Mirsky matrix approximation theorem. Linear Algebra and its Applications, 88-89:317–327, 1987. ISSN 0024-3795. doi: https://doi.org/10.1016/0024-3795(87)90114-5. URL https://www.sciencedirect.com/science/article/pii/0024379587901145.
  9. A survey of generative ai applications, 2023.
  10. Knowledge distillation of large language models, 2023.
  11. Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes. arXiv preprint arXiv:2305.02301, 2023.
  12. Language model compression with weighted low-rank factorization, 2022.
  13. LoRA: Low-rank adaptation of large language models, 2021.
  14. Mistral 7B, 2023.
  15. AWQ: Activation-aware weight quantization for LLM compression and acceleration. arXiv, 2023.
  16. LLM-Pruner: On the structural pruning of large language models. In Advances in Neural Information Processing Systems, 2023.
  17. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993. URL https://aclanthology.org/J93-2004.
  18. Pointer sentinel mixture models, 2016.
  19. Meyer, C. D. Matrix analysis and applied linear algebra, volume 188. SIAM, 2023.
  20. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(1), January 2020. ISSN 1532-4435.
  21. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, pp.  1631–1642. ACL, 2013.
  22. LLaMA: Open and efficient foundation language models, 2023a.
  23. Llama 2: Open foundation and fine-tuned chat models, 2023b.
  24. Efficient large language models: A survey, 2023.
  25. Iot in the era of generative ai: Vision and challenges, 2024.
  26. Neural network acceptability judgments. Trans. Assoc. Comput. Linguistics, 7:625–641, 2019.
  27. A broad-coverage challenge corpus for sentence understanding through inference. In NAACL-HLT, pp.  1112–1122. Association for Computational Linguistics, 2018.
  28. SmoothQuant: Accurate and efficient post-training quantization for large language models. In Proceedings of the 40th International Conference on Machine Learning, 2023.
  29. ASVD: Activation-aware singular value decomposition for compressing large language models, 2023.
  30. A survey of large language models, 2023.
  31. A survey on model compression for large language models, 2023.

Summary

  • The paper presents SVD-LLM, which leverages truncation-aware SVD with data whitening to accurately discard singular values while managing compression loss.
  • It updates the left singular vectors layer-wise post-truncation to maintain performance even with high compression ratios.
  • Experiments show SVD-LLM reduces perplexity by up to 99% relative to vanilla SVD and compresses models up to ten times faster than prior SVD-based methods.

SVD-LLM: Truncation-aware Singular Value Decomposition for LLM Compression

The paper addresses the pressing need for efficient compression techniques tailored for LLMs, which, despite their impressive capabilities, present significant deployment challenges due to their size and computational demands. Singular Value Decomposition (SVD)-based methods emerge as a promising approach to these challenges, offering lightweight alternatives to more resource-intensive methodologies like quantization or pruning.

Methodology and Contributions

The authors propose a novel compression method termed SVD-LLM, which focuses on resolving existing limitations in SVD-based LLM compression methods, particularly ASVD and FWSVD. These existing methods either inadequately address the relationship between singular value magnitudes and compression loss or neglect the importance of parameter updates post-truncation. The significant innovations presented in SVD-LLM include:

  1. Truncation-Aware Data Whitening:
    • This step whitens the input activations using the Cholesky factor of their Gram matrix, making the whitened input channels orthogonal. With orthogonal inputs, it becomes straightforward to determine which singular values can be discarded with minimal impact on the layer output, since each singular value corresponds directly to a quantifiable portion of the compression loss.
  2. Layer-Wise Closed-Form Model Parameter Update:
    • After truncating singular values, the authors update only the left singular vectors in closed form, so the result both respects the low-rank approximation of the original matrix and compensates for the accuracy drop typically seen at high compression ratios. Working layer by layer keeps the update computationally manageable and avoids whole-model fine-tuning. A minimal sketch of both steps follows this list.
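
A minimal NumPy sketch of both steps on a single linear layer is given below, under stated assumptions: W is the layer weight (out_features x in_features), X is a matrix of calibration activations (in_features x tokens), and the names whiten_truncate and refit_left_factor are illustrative rather than the authors' API. The refit step is a generic closed-form least-squares update of the left factor, a stand-in for the paper's layer-wise parameter update rather than a verbatim reproduction.

```python
import numpy as np

def whiten_truncate(W, X, k, eps=1e-6):
    """Truncation-aware whitening sketch (illustrative, not the official code).

    W : (out, in) weight matrix of one linear layer
    X : (in, tokens) calibration activations fed to that layer
    k : target rank after truncation
    Returns factors A (out, k) and B (k, in) with W approximately A @ B.
    """
    gram = X @ X.T + eps * np.eye(X.shape[0])   # input Gram matrix, jittered for stability
    S = np.linalg.cholesky(gram)                # lower-triangular whitening factor
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], sigma[:k], Vt[:k, :]
    A = U_k * s_k                               # absorb the kept singular values
    B = np.linalg.solve(S.T, Vt_k.T).T          # equals Vt_k @ inv(S): undo the whitening
    return A, B

def refit_left_factor(W, X, B):
    """Closed-form least-squares refit of the left factor after truncation
    (a stand-in for the paper's layer-wise update): choose A minimizing
    ||W @ X - A @ (B @ X)||_F."""
    Z = B @ X                                   # (k, tokens) projected calibration inputs
    return (W @ X) @ Z.T @ np.linalg.pinv(Z @ Z.T)

# Toy usage: compress a random 64 x 64 layer to rank 16 with 256 calibration tokens.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
X = rng.standard_normal((64, 256))
A, B = whiten_truncate(W, X, k=16)
A = refit_left_factor(W, X, B)
err = np.linalg.norm(W @ X - A @ B @ X) / np.linalg.norm(W @ X)
print(f"relative output error at rank 16: {err:.3f}")
```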

Experimental Evaluation

SVD-LLM's effectiveness is demonstrated extensively on a series of benchmarks spanning models from the LLaMA, OPT, and Mistral families. The experiments cover compression ratios ranging from 20% to 60%, with SVD-LLM consistently surpassing the baseline methods (SVD, FWSVD, ASVD). The highlights include the following; a back-of-the-envelope view of what these compression ratios imply for per-layer rank appears after the list:

  • Perplexity Reduction: Up to a 99% reduction in perplexity relative to vanilla SVD at higher compression ratios, indicating substantial performance retention.
  • Computational Efficiency: The compression process is roughly an order of magnitude faster than prior approaches such as ASVD; in particular, SVD-LLM compresses LLaMA-7B in about 15 minutes versus ASVD's 5.5 hours.
  • Scalability: SVD-LLM shows prowess not only on smaller 7B models but also on larger scales, including 13B, 30B, and 65B variants, demonstrating broad applicability across model sizes.
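
To make these compression ratios concrete, the following back-of-the-envelope calculation (illustrative only, not taken from the paper) shows how a target ratio maps to a per-layer rank: a rank-k factorization of an m x n matrix stores k(m+n) parameters instead of mn, so removing a fraction r of the parameters requires k <= (1 - r) * mn / (m + n).

```python
def rank_for_compression_ratio(m: int, n: int, ratio: float) -> int:
    """Largest rank k whose factors (k*(m+n) parameters) keep at most a
    (1 - ratio) fraction of the original m*n parameters. Illustrative only."""
    k = int((1.0 - ratio) * m * n) // (m + n)
    return max(k, 1)

# e.g. a 4096 x 4096 projection at a 20% compression ratio keeps rank ~1638,
# while a 60% ratio drops it to rank ~819.
print(rank_for_compression_ratio(4096, 4096, 0.20))  # 1638
print(rank_for_compression_ratio(4096, 4096, 0.60))  # 819
```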

Implications and Future Directions

The practical implications of SVD-LLM are clear: it offers a scalable, efficient solution for LLM compression that makes it significantly easier to deploy these models in resource-constrained environments. Such compression can democratize access to AI capabilities by enabling LLM deployment on edge devices or personal computers without prohibitive computational costs.

From a theoretical perspective, SVD-LLM provides insights into how structured matrix decompositions can be leveraged beyond traditional linear algebra applications, opening new avenues in model optimization and compression techniques.

Looking ahead, SVD-LLM could serve as a foundation for refining additional model compression techniques, possibly combining with existing quantization or pruning strategies to push performance and efficiency further. Investigating such hybrid methodologies is fertile ground for future research, especially in contexts where both speed and precision are paramount.

Overall, SVD-LLM makes significant strides in LLM compression, setting a new direction for research and application in this area.
