A Short Study on Compressing Decoder-Based Language Models (2110.08460v1)

Published 16 Oct 2021 in cs.CL

Abstract: Pre-trained Language Models (PLMs) have been successful for a wide range of NLP tasks. State-of-the-art PLMs, however, are too large to be deployed on edge devices. As a result, the topic of model compression has attracted increasing attention in the NLP community. Most existing work focuses on compressing encoder-based models (TinyBERT, DistilBERT, DistilRoBERTa, etc.); to the best of our knowledge, however, the compression of decoder-based models (such as GPT-2) has not been investigated much. Our paper aims to fill this gap. Specifically, we explore two directions: 1) we employ current state-of-the-art knowledge distillation techniques to improve fine-tuning of DistilGPT-2; 2) we pre-train a compressed GPT-2 model using layer truncation and compare it against the distillation-based method (DistilGPT-2). The training time of our compressed model is significantly less than that of DistilGPT-2, yet it achieves better performance when fine-tuned on downstream tasks. We also demonstrate the impact of data cleaning on model performance.
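To illustrate the layer-truncation direction described in the abstract, the sketch below builds a smaller GPT-2 student by copying a subset of a pretrained teacher's transformer blocks. This is a minimal sketch, not the authors' exact recipe: the use of the Hugging Face `transformers` library, the 12-layer `gpt2` checkpoint, the every-other-layer selection scheme, and the helper name `truncate_gpt2` are all assumptions for illustration.

```python
# A minimal sketch of layer truncation for GPT-2 (not the paper's exact recipe).
# Assumptions: Hugging Face `transformers`, a 12-layer "gpt2" teacher, and a
# student built by keeping every other transformer block.
from transformers import GPT2Config, GPT2LMHeadModel


def truncate_gpt2(teacher: GPT2LMHeadModel, keep_every: int = 2) -> GPT2LMHeadModel:
    """Build a smaller GPT-2 by copying a subset of the teacher's blocks."""
    kept = list(range(0, teacher.config.n_layer, keep_every))  # e.g. [0, 2, 4, 6, 8, 10]

    # Same architecture as the teacher, but with fewer layers.
    config = GPT2Config.from_dict(teacher.config.to_dict())
    config.n_layer = len(kept)
    student = GPT2LMHeadModel(config)

    # Copy token/position embeddings and the final layer norm from the teacher.
    student.transformer.wte.load_state_dict(teacher.transformer.wte.state_dict())
    student.transformer.wpe.load_state_dict(teacher.transformer.wpe.state_dict())
    student.transformer.ln_f.load_state_dict(teacher.transformer.ln_f.state_dict())

    # Copy the selected transformer blocks into the student.
    for student_idx, teacher_idx in enumerate(kept):
        student.transformer.h[student_idx].load_state_dict(
            teacher.transformer.h[teacher_idx].state_dict()
        )
    # GPT-2 ties the LM head to the input embeddings, so no extra copy is needed.
    return student


if __name__ == "__main__":
    teacher = GPT2LMHeadModel.from_pretrained("gpt2")  # 12 layers
    student = truncate_gpt2(teacher, keep_every=2)     # 6 layers
    print(student.config.n_layer)
```

The truncated student would then be further pre-trained (and later fine-tuned) as the paper's compressed model; the comparison in the paper is against DistilGPT-2, which instead relies on knowledge distillation during pre-training.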

Authors (9)
  1. Tianda Li (10 papers)
  2. Yassir El Mesbahi (5 papers)
  3. Ivan Kobyzev (23 papers)
  4. Ahmad Rashid (24 papers)
  5. Atif Mahmud (1 paper)
  6. Nithin Anchuri (2 papers)
  7. Habib Hajimolahoseini (10 papers)
  8. Yang Liu (2253 papers)
  9. Mehdi Rezagholizadeh (78 papers)
Citations (24)