Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads (2011.03770v1)

Published 7 Nov 2020 in cs.CL

Abstract: Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of NLP tasks. By learning rich language knowledge with millions of parameters, these models are usually overparameterized and significantly increase the computational overhead in applications. It is intuitive to address this issue by model compression. In this work, we propose a method, called Single-Shot Meta-Pruning, to compress deep pre-trained Transformers before fine-tuning. Specifically, we focus on pruning unnecessary attention heads adaptively for different downstream tasks. To measure the informativeness of attention heads, we train our Single-Shot Meta-Pruner (SMP) with a meta-learning paradigm aiming to maintain the distribution of text representations after pruning. Compared with existing compression methods for pre-trained models, our method can reduce the overhead of both fine-tuning and inference. Experimental results show that our pruner can selectively prune 50% of attention heads with little impact on the performance on downstream tasks and even provide better text representations. The source code will be released in the future.
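As a rough illustration of the head-pruning idea described in the abstract, the sketch below (not the authors' released code) masks individual attention heads in a toy self-attention layer, scores each head by how much masking it shifts the pooled text representation, and keeps the top 50%. The layer sizes, random weights, and the cosine-similarity scoring rule are illustrative assumptions standing in for the learned pruner; the paper's actual SMP is trained with a meta-learning objective to preserve the distribution of text representations.

```python
# Minimal sketch (assumptions, not the paper's SMP): score each attention head
# by how much masking it perturbs the pooled representation, then keep the top 50%.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 64, 8, 16
d_head = d_model // n_heads

# Random stand-ins for pre-trained projection weights of one self-attention layer.
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.05 for _ in range(4))
x = rng.standard_normal((seq_len, d_model))          # token embeddings

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, head_mask):
    """Multi-head self-attention with a per-head binary mask (1 = keep, 0 = prune)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # reshape to (heads, seq, d_head)
    split = lambda t: t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = (attn @ v) * head_mask[:, None, None]     # zero out pruned heads
    out = heads.transpose(1, 0, 2).reshape(seq_len, d_model) @ Wo
    return out.mean(axis=0)                           # mean-pooled text representation

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

full = self_attention(x, np.ones(n_heads))

# Informativeness proxy: similarity drop when a single head is masked out.
scores = []
for h in range(n_heads):
    mask = np.ones(n_heads)
    mask[h] = 0.0
    scores.append(1.0 - cosine(full, self_attention(x, mask)))

keep = np.argsort(scores)[n_heads // 2:]              # keep the most informative 50%
pruned_mask = np.zeros(n_heads)
pruned_mask[keep] = 1.0
print("kept heads:", sorted(keep.tolist()))
print("similarity after pruning 50% of heads:",
      round(cosine(full, self_attention(x, pruned_mask)), 3))
```

The printed similarity gives a crude sense of how well the pruned layer preserves the original representation, which is the property the paper's meta-learned pruner is trained to maintain across downstream tasks.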

Authors (5)
  1. Zhengyan Zhang (46 papers)
  2. Fanchao Qi (33 papers)
  3. Zhiyuan Liu (433 papers)
  4. Qun Liu (230 papers)
  5. Maosong Sun (337 papers)
Citations (28)