- The paper introduces SVD-LLM V2, a novel SVD-based LLM compression technique optimizing singular value truncation through heterogeneous compression ratio allocation and loss-optimized weight truncation.
- SVD-LLM V2 enhances efficiency by assigning unique compression ratios per layer based on theoretical truncation loss and using a numerically stable, loss-optimized method for SVD computation.
- Experimental evaluations demonstrate SVD-LLM V2 significantly outperforms prior SVD methods, achieving up to a 42% perplexity reduction and a 9% accuracy gain across multiple LLMs and datasets.
Overview of SVD-LLM V2
The paper "SVD-LLM V2: Optimizing Singular Value Truncation for LLM Compression" (2503.12340) addresses the challenge of deploying LLMs by introducing an improved SVD-based compression technique. SVD-LLM V2 enhances existing SVD compression methods by optimizing singular value truncation through heterogeneous compression ratio allocation and loss-optimized weight truncation. The authors demonstrate that SVD-LLM V2 outperforms state-of-the-art SVD-based LLM compression methods across various datasets and LLM scales.
Methodological Details of SVD-LLM V2
SVD-LLM V2 introduces two primary innovations to optimize singular value truncation in SVD-based LLM compression:
- Heterogeneous Compression Ratio Allocation: Rather than applying a uniform compression ratio across the model, SVD-LLM V2 assigns each weight matrix at each layer its own compression ratio based on its theoretical truncation loss. Because redundancy varies from layer to layer, computing the truncation loss separately for each matrix lets the method compress highly redundant layers more aggressively while preserving sensitive ones, yielding more effective overall compression.
- Loss-optimized Weight Truncation: SVD-LLM V2 directly computes the SVD of weighted input matrices, which ensures numerical stability and lowers truncation loss. This avoids the instability and constraints of traditional truncation pipelines, such as the positive-definiteness requirement of Cholesky decomposition, and brings the practical truncation loss closer to its theoretical minimum.
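To make the first idea concrete, here is a minimal NumPy sketch of heterogeneous ratio allocation driven by theoretical truncation loss. This is an illustration, not the paper's actual algorithm: the greedy budgeted scheme and all function names are assumptions. It uses the standard fact that dropping one more singular value s from a matrix's kept rank adds s² to the truncation loss ‖W − W_r‖_F² while saving (m + n) stored parameters, so matrices with fast-decaying spectra naturally receive smaller ranks.

```python
import numpy as np

def allocate_heterogeneous_ranks(weights, compression_ratio):
    """Greedy heterogeneous rank allocation (illustrative sketch only;
    the paper's allocation is derived from theoretical truncation loss,
    but this greedy budgeted variant is an assumption, not the authors' code).

    Repeatedly apply the one-step rank reduction with the least added
    truncation loss per parameter saved, until the global budget is met."""
    spectra = [np.linalg.svd(W, compute_uv=False) for W in weights]
    ranks = [len(s) for s in spectra]                   # start at full rank
    total = sum(W.size for W in weights)
    budget = (1.0 - compression_ratio) * total          # params kept after compression
    current = sum(sum(W.shape) * r for W, r in zip(weights, ranks))
    while current > budget:
        best_i, best_cost = None, np.inf
        for i, W in enumerate(weights):
            if ranks[i] <= 1:
                continue
            added_loss = spectra[i][ranks[i] - 1] ** 2  # s**2 of smallest kept value
            params_saved = sum(W.shape)                 # (m + n) params per rank step
            cost = added_loss / params_saved
            if cost < best_cost:
                best_i, best_cost = i, cost
        if best_i is None:
            break                                       # nothing left to compress
        ranks[best_i] -= 1
        current -= sum(weights[best_i].shape)
    return ranks
```

Within each matrix the marginal cost of dropping one more rank is nondecreasing (singular values are sorted), so this greedy selection finds the loss-minimizing allocation for the given budget; a matrix with a flat spectrum keeps most of its rank, while a nearly low-rank one is compressed heavily.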
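The second idea can be sketched as follows. This illustrates the general activation-aware whitening approach rather than the paper's exact procedure; the function names and the `eps` floor are assumptions. Truncating the SVD of W·S, where S·Sᵀ equals the input Gram matrix X·Xᵀ, minimizes the output error ‖WX − W′X‖_F instead of the weight error, and building S from an SVD of the calibration inputs X sidesteps the Cholesky factorization, which requires X·Xᵀ to be positive definite.

```python
import numpy as np

def loss_optimized_truncate(W, X, rank, eps=1e-8):
    """Activation-aware low-rank truncation (illustrative sketch; not the
    authors' implementation). W: (m, n) weight matrix. X: (n, T) calibration
    inputs with T >= n. Returns low-rank factors A (m, rank), B (rank, n)."""
    # Whitening factor from the SVD of X instead of Cholesky of X @ X.T
    U_x, s_x, _ = np.linalg.svd(X, full_matrices=False)
    s_x = np.maximum(s_x, eps)        # floor tiny values; no positive-definiteness needed
    S = U_x * s_x                     # S @ S.T == X @ X.T (up to the eps floor)
    S_pinv = (U_x / s_x).T            # pseudo-inverse, so S_pinv @ S == I
    # Rank-r truncation of the weighted matrix W @ S
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * s[:rank]        # (m, rank) factor stored after compression
    B = Vt[:rank] @ S_pinv            # (rank, n) factor; A @ B approximates W
    return A, B
```

By Eckart–Young, the truncated SVD of W·S is the best rank-r approximation in Frobenius norm, and because S whitens X, that optimality transfers directly to the layer's output error on the calibration data.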
Experimental Results
The authors evaluated SVD-LLM V2 on ten datasets and five LLMs at various scales, including LLaMA-7B, LLaMA-13B, LLaMA-30B, and OPT-6.7B, on tasks ranging from language modeling to classification and generation. The results show that SVD-LLM V2 consistently surpasses existing state-of-the-art SVD-based approaches, achieving up to a 42% reduction in perplexity and a 9% accuracy gain.
When compared to quantization and pruning methods under equivalent memory budgets, SVD-LLM V2 exhibits superior performance, and it remains competitive even against 1-bit quantization techniques, achieving significant perplexity reductions.
Implications and Future Work
SVD-LLM V2 has implications for the deployment of AI models by reducing their computational demands. It advances the understanding of how heterogeneous compression ratios can optimize model performance. The approach paves the way for future exploration into hybrid methods combining SVD with other compression techniques like quantization and pruning. The code for SVD-LLM V2 is available on GitHub.
In summary, the paper introduces SVD-LLM V2, a novel SVD-based LLM compression method that optimizes singular value truncation. By assigning heterogeneous compression ratios and employing loss-optimized weight truncation, SVD-LLM V2 outperforms existing methods, offering substantial perplexity reduction and accuracy gains across various datasets and LLM scales.