
TQCompressor: improving tensor decomposition methods in neural networks via permutations (2401.16367v1)

Published 29 Jan 2024 in cs.LG, cs.AI, and cs.CL

Abstract: We introduce TQCompressor, a novel method for neural network model compression based on improved tensor decompositions. We examine the computational and storage demands of pre-trained LLMs in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition that reduces the loss in model expressivity usually associated with factorization. Applying this method to GPT-2-small yields the TQCompressedGPT-2 model, with 81 million parameters compared to 124 million in GPT-2-small. We make TQCompressedGPT-2 publicly available. We further improve the performance of TQCompressedGPT-2 through a training strategy involving multi-step knowledge distillation, using only 3.1% of the OpenWebText dataset. TQCompressedGPT-2 surpasses DistilGPT-2 and KnGPT-2 in comparative evaluations, marking an advance in the efficient and effective deployment of models in resource-constrained environments.
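The compression described in the abstract builds on Kronecker decomposition of weight matrices: a large matrix W is approximated by the Kronecker product of two much smaller factors, A ⊗ B, which drastically cuts the parameter count. As a minimal sketch of the underlying factorization step (the paper's permutation search and knowledge-distillation stages are not reproduced here), the standard nearest-Kronecker-product problem can be solved with the Van Loan–Pitsianis rearrangement plus a rank-1 SVD; the function name and shapes below are illustrative, not from the paper:

```python
import numpy as np

def nearest_kronecker(W, shape_a, shape_b):
    """Find A (m1 x n1) and B (m2 x n2) minimizing ||W - kron(A, B)||_F.

    W must have shape (m1*m2, n1*n2). Uses the Van Loan-Pitsianis
    rearrangement: the best Kronecker factors correspond to the best
    rank-1 approximation of a reshuffled version of W.
    """
    m1, n1 = shape_a
    m2, n2 = shape_b
    # Rearrange W so each (m2 x n2) block of W becomes one row of R.
    # R has shape (m1*n1, m2*n2); its best rank-1 factors are vec(A), vec(B).
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

# Illustration: an exact Kronecker product is recovered exactly.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
W = np.kron(A, B)                      # shape (4, 4)
Ah, Bh = nearest_kronecker(W, (2, 2), (2, 2))
print(np.allclose(np.kron(Ah, Bh), W))  # individual factors may differ by sign/scale
```

The paper's contribution is to apply a learned row/column permutation to W before this factorization, so that the structure imposed by the Kronecker form loses less of the model's expressivity; that permutation step is what this sketch omits.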

Authors (10)
  1. V. Abronin (2 papers)
  2. A. Naumov (8 papers)
  3. D. Mazur (3 papers)
  4. D. Bystrov (1 paper)
  5. K. Tsarova (1 paper)
  6. Ar. Melnikov (4 papers)
  7. I. Oseledets (5 papers)
  8. S. Dolgov (2 papers)
  9. R. Brasher (1 paper)
  10. M. Perelshtein (3 papers)
Citations (5)