SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression (2403.07378v5)
Abstract: The advancement of LLMs has been hindered by their substantial size, which necessitates LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the compressed weights are not updated after SVD truncation. In this work, we propose SVD-LLM, an SVD-based post-training LLM compression method that addresses the limitations of existing methods. SVD-LLM incorporates a truncation-aware data whitening technique to ensure a direct mapping between singular values and compression loss. Moreover, SVD-LLM adopts a parameter update with sequential low-rank approximation to compensate for the accuracy degradation after SVD compression. We evaluate SVD-LLM on 10 datasets and seven models from three LLM families at three scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-art methods, especially at high model compression ratios. Our code is available at https://github.com/AIoT-MLSys-Lab/SVD-LLM
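To make the abstract's "truncation-aware data whitening" idea concrete, the snippet below is a minimal NumPy sketch, not the paper's implementation: it assumes the whitening matrix is taken as the Cholesky factor of the activation Gram matrix X X^T built from calibration data, performs truncated SVD of the whitened weight, and folds the inverse whitening back into the low-rank factors. The function and variable names (`whitened_svd_compress`, `keep_ratio`, etc.) are illustrative and not from the paper's codebase.

```python
import numpy as np


def whitened_svd_compress(W, X, keep_ratio=0.5, eps=1e-6):
    """Compress a linear-layer weight W (d_out x d_in) by truncated SVD in a whitened space.

    X holds calibration activations of shape (d_in, n_tokens). The whitening
    matrix S is assumed to be the Cholesky factor of the activation Gram matrix
    X X^T, so that the singular values of W @ S relate directly to the
    output-space compression loss on the calibration data (an illustrative
    sketch of the idea stated in the abstract, not the exact procedure).
    """
    d_in = W.shape[1]
    gram = X @ X.T + eps * np.eye(d_in)       # activation Gram matrix, regularized
    S = np.linalg.cholesky(gram)              # lower-triangular whitening factor
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    k = max(1, int(keep_ratio * len(sigma)))  # number of singular values kept
    A = U[:, :k] * sigma[:k]                  # d_out x k factor
    B = np.linalg.solve(S.T, Vt[:k].T).T      # k x d_in factor, equals Vt_k @ S^{-1}
    return A, B                               # compressed weight is approximately A @ B


# Toy usage: compress a random 256 x 512 weight using 1024 calibration tokens.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))
X = rng.standard_normal((512, 1024))
A, B = whitened_svd_compress(W, X, keep_ratio=0.25)
rel_err = np.linalg.norm((W - A @ B) @ X) / np.linalg.norm(W @ X)
print(f"relative output error on calibration data: {rel_err:.3f}")
```

Storing the two factors A and B as a pair of smaller linear layers replaces d_out * d_in parameters with k * (d_out + d_in), which is where the compression comes from once the kept rank k is small enough.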