Task-specific Compression for Multi-task Language Models using Attribution-based Pruning (2205.04157v2)

Published 9 May 2022 in cs.CL and cs.AI

Abstract: Multi-task language models show outstanding performance for various natural language understanding tasks with only a single model. However, these language models use an unnecessarily large number of model parameters, even when used only for a specific task. This paper proposes a novel training-free compression method for multi-task language models based on pruning. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task. We prune the neurons that are unimportant for that task and retain only the task-specific parameters. Furthermore, we extend our method to low-resource and unsupervised settings. Since our compression method is training-free, it uses few computing resources and does not destroy the pre-trained knowledge of language models. Experimental results on six widely-used datasets show that our proposed pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen-domain setting.
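
The paper's exact attribution formulation and pruning targets are not reproduced here, but the sketch below illustrates the general recipe the abstract describes: score neurons by an attribution measure computed on a few task examples, then zero out the lowest-scoring ones without any retraining. The toy FeedForward module, the |activation × gradient| score, the placeholder task loss, and the keep_ratio parameter are all illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of attribution-based, training-free neuron pruning.
# Assumptions (not from the paper): attribution is approximated as
# |activation * gradient| averaged over a small batch of task examples,
# and pruning zeroes out the lowest-scoring hidden units of one FFN block.
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Toy transformer-style FFN block whose hidden neurons we prune."""

    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.act = nn.GELU()
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))


def attribution_scores(ffn: FeedForward, batch: torch.Tensor, loss_fn) -> torch.Tensor:
    """Score each hidden neuron by |activation * d(loss)/d(activation)|."""
    hidden = ffn.act(ffn.up(batch))
    hidden.retain_grad()          # keep gradients on this intermediate tensor
    loss = loss_fn(ffn.down(hidden))
    loss.backward()
    # Average attribution magnitude over batch and sequence dimensions.
    return (hidden * hidden.grad).abs().mean(dim=tuple(range(hidden.dim() - 1)))


def prune_ffn(ffn: FeedForward, scores: torch.Tensor, keep_ratio: float) -> None:
    """Zero out the weights of the lowest-attribution hidden neurons."""
    k = int(scores.numel() * keep_ratio)
    keep = torch.zeros_like(scores, dtype=torch.bool)
    keep[scores.topk(k).indices] = True
    with torch.no_grad():
        ffn.up.weight[~keep] = 0.0       # rows of the up-projection
        ffn.up.bias[~keep] = 0.0
        ffn.down.weight[:, ~keep] = 0.0  # columns of the down-projection


if __name__ == "__main__":
    torch.manual_seed(0)
    ffn = FeedForward()
    x = torch.randn(8, 16, 64)                              # (batch, seq, d_model)
    scores = attribution_scores(ffn, x, lambda o: o.pow(2).mean())  # dummy task loss
    prune_ffn(ffn, scores, keep_ratio=0.5)                  # keep top 50% of neurons
```

Because the surviving weights are left untouched, the pre-trained knowledge relevant to the task is preserved and no gradient updates are needed, which is the property the abstract emphasizes.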

Authors (5)
  1. Nakyeong Yang (9 papers)
  2. Yunah Jang (4 papers)
  3. Hwanhee Lee (36 papers)
  4. Seohyeong Jung (1 paper)
  5. Kyomin Jung (76 papers)
Citations (6)