Efficient GPT Model Pre-training using Tensor Train Matrix Representation (2306.02697v1)

Published 5 Jun 2023 in cs.AI

Abstract: Large-scale transformer models have shown remarkable performance in language modeling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure. We also customize the forward and backward operations through the TTM-based layer for simplicity and stability of further training. The resulting GPT-2-based model stores up to 40% fewer parameters while showing perplexity comparable to the original model. On downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model. The proposed tensorized layers can be used to efficiently pre-train other Transformer models.
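
The sketch below illustrates the core idea of a TTM-parameterized fully-connected layer in PyTorch. It is a minimal sketch under assumed shapes: the class name `TTMLinear`, the mode factorizations, and the TT-ranks are illustrative choices, not the authors' implementation. For readability it reconstructs the dense weight from the TT cores before the matrix multiply, whereas the paper defines custom forward and backward passes that contract the input with the cores directly.

```python
import math

import torch
import torch.nn as nn


class TTMLinear(nn.Module):
    """Linear layer whose weight matrix is stored as Tensor Train Matrix cores.

    A dense (in_features x out_features) weight is factorized into d cores
    G_k of shape (r_{k-1}, m_k, n_k, r_k), with prod(m_k) = in_features,
    prod(n_k) = out_features, and boundary ranks r_0 = r_d = 1.
    """

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.in_modes = tuple(in_modes)
        self.out_modes = tuple(out_modes)
        self.in_features = math.prod(self.in_modes)
        self.out_features = math.prod(self.out_modes)
        # Only the small TT cores are stored as trainable parameters.
        self.cores = nn.ParameterList(
            [
                nn.Parameter(0.02 * torch.randn(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
                for k in range(len(in_modes))
            ]
        )

    def _full_weight(self):
        # Contract the TT cores into the dense weight (for readability only).
        # Running shape: (r_0, m_1, n_1, ..., m_k, n_k, r_k).
        w = self.cores[0]
        for core in self.cores[1:]:
            w = torch.tensordot(w, core, dims=([w.dim() - 1], [0]))
        w = w.squeeze(0).squeeze(-1)  # drop boundary ranks r_0 = r_d = 1
        d = len(self.in_modes)
        # Reorder (m_1, n_1, ..., m_d, n_d) -> (m_1, ..., m_d, n_1, ..., n_d).
        perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))
        return w.permute(*perm).reshape(self.in_features, self.out_features)

    def forward(self, x):
        # x: (batch, in_features). An efficient implementation would contract
        # x with the cores directly, never materializing the dense weight.
        return x @ self._full_weight()


# Hypothetical usage: replace a 768 -> 3072 projection in a GPT-2 MLP block.
layer = TTMLinear(in_modes=(8, 12, 8), out_modes=(16, 12, 16), ranks=(1, 32, 32, 1))
y = layer(torch.randn(4, 768))  # -> shape (4, 3072)
```

With these illustrative modes and ranks, the three cores hold far fewer values than the dense 768 x 3072 matrix, which is the source of the parameter savings described in the abstract.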

Authors (5)
  1. Viktoriia Chekalina (6 papers)
  2. Georgii Novikov (6 papers)
  3. Julia Gusak (13 papers)
  4. Ivan Oseledets (187 papers)
  5. Alexander Panchenko (92 papers)
Citations (4)