Task-specific Compression for Multi-task Language Models using Attribution-based Pruning (2205.04157v2)
Abstract: Multi-task language models show outstanding performance on various natural language understanding tasks with only a single model. However, these models use an unnecessarily large number of parameters, even when deployed for only a specific task. This paper proposes a novel training-free compression method for multi-task language models based on pruning. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task. We prune unimportant neurons task-specifically, leaving only the task-specific parameters. Furthermore, we extend our method to low-resource and unsupervised settings. Because our compression method is training-free, it requires few computing resources and does not destroy the pre-trained knowledge of the language model. Experimental results on six widely used datasets show that our pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen-domain setting.
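The core idea (score each neuron's importance for a task via attribution, then mask the low-scoring neurons without any retraining) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gradient-times-activation score used here is one common attribution proxy, and the keep ratio and toy data are assumptions for demonstration.

```python
import random

def attribution_scores(activations, gradients):
    """Score each neuron by |activation * gradient|, averaged over task examples.

    Gradient-x-activation is one common attribution choice; the paper's
    exact attribution method may differ.
    """
    n_examples = len(activations)
    n_neurons = len(activations[0])
    scores = [0.0] * n_neurons
    for a_row, g_row in zip(activations, gradients):
        for j in range(n_neurons):
            scores[j] += abs(a_row[j] * g_row[j])
    return [s / n_examples for s in scores]

def prune_mask(scores, keep_ratio=0.5):
    """Keep the top-k neurons by attribution score; mask the rest.

    Masking is training-free: no weights are updated, so the model's
    pre-trained knowledge is left intact.
    """
    k = max(1, int(len(scores) * keep_ratio))
    top = set(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k])
    return [1.0 if j in top else 0.0 for j in range(len(scores))]

# Toy example: 4 task examples, 6 hidden neurons (hypothetical sizes).
random.seed(0)
acts = [[random.gauss(0, 1) for _ in range(6)] for _ in range(4)]
grads = [[random.gauss(0, 1) for _ in range(6)] for _ in range(4)]

scores = attribution_scores(acts, grads)
mask = prune_mask(scores, keep_ratio=0.5)
# Apply the task-specific mask to the hidden activations.
pruned = [[a * m for a, m in zip(row, mask)] for row in acts]
```

In practice the mask would be derived per task from examples of that task, so each compressed model keeps only the neurons that attribution marks as important for it.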
- Nakyeong Yang
- Yunah Jang
- Hwanhee Lee
- Seohyeong Jung
- Kyomin Jung