Skeleton: A New Framework for Accelerating Language Models via Task Neuron Localized Prompt Tuning (2404.11916v2)
Abstract: Prompt tuning has shown performance comparable to general fine-tuning as a parameter-efficient fine-tuning (PEFT) method across various natural language understanding tasks. However, existing prompt tuning methods still use the entire model architecture even when solving a single task, which prevents them from accelerating inference at deployment time. In this paper, we propose Skeleton, a novel prompt tuning framework that uses an LLM efficiently in terms of memory and time complexity by retaining only task-relevant neurons, identified with an explainability method. With our framework, a single LLM can solve various tasks efficiently by activating only the task-relevant neurons and prepending the appropriate task-specific prompt tokens. Experiments show that our method significantly improves inference efficiency (up to a 1.73x speedup) on widely used benchmarks while matching the performance of standard prompt tuning. Moreover, the method applies across various transformer-based architectures, confirming its practicality and scalability.
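The abstract describes two ingredients: localizing task-relevant neurons with an attribution-style explainability score, and serving each task with soft prompt tokens prepended to a single shared backbone. The sketch below is not the authors' released implementation; it is a minimal PyTorch illustration of that idea on a single feed-forward block, where the attribution function, the dummy loss, and helper names such as `attribution_scores` and `prune_ffn` are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): attribution-based localization of
# task-relevant FFN neurons plus soft-prompt prepending, in plain PyTorch.

import torch
import torch.nn as nn


class FFN(nn.Module):
    """A transformer-style feed-forward block: d_model -> d_ff -> d_model."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.act = nn.GELU()
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))


def attribution_scores(ffn: FFN, x: torch.Tensor, loss_fn) -> torch.Tensor:
    """Score each intermediate neuron by |activation * gradient of a task loss|,
    a simple input-times-gradient style attribution summed over a task batch."""
    h = ffn.act(ffn.up(x))               # (batch, seq, d_ff) intermediate activations
    h.retain_grad()
    out = ffn.down(h)
    loss_fn(out).backward()
    return (h * h.grad).abs().sum(dim=(0, 1))   # one score per FFN neuron


def prune_ffn(ffn: FFN, keep_idx: torch.Tensor) -> FFN:
    """Build a smaller FFN that retains only the selected intermediate neurons."""
    d_model = ffn.up.in_features
    small = FFN(d_model, keep_idx.numel())
    with torch.no_grad():
        small.up.weight.copy_(ffn.up.weight[keep_idx])
        small.up.bias.copy_(ffn.up.bias[keep_idx])
        small.down.weight.copy_(ffn.down.weight[:, keep_idx])
        small.down.bias.copy_(ffn.down.bias)
    return small


if __name__ == "__main__":
    torch.manual_seed(0)
    d_model, d_ff, n_prompt = 64, 256, 8

    ffn = FFN(d_model, d_ff)
    task_batch = torch.randn(4, 16, d_model)

    # 1) Localize task-relevant neurons with an attribution score
    #    (a placeholder squared-output loss stands in for the real task loss).
    scores = attribution_scores(ffn, task_batch, loss_fn=lambda o: o.pow(2).mean())
    keep_idx = scores.topk(k=d_ff // 2).indices          # keep the top 50%

    # 2) Retain only those neurons: a smaller, faster task-specific block.
    skeleton_ffn = prune_ffn(ffn, keep_idx)

    # 3) Prompt tuning: prepend trainable soft-prompt vectors to the inputs;
    #    only these prompts (not the backbone) would be updated during training.
    soft_prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)
    inputs = torch.randn(4, 16, d_model)
    prompted = torch.cat([soft_prompt.expand(4, -1, -1), inputs], dim=1)

    print(skeleton_ffn(prompted).shape)   # (4, 16 + n_prompt, d_model)
```

In the paper's framing this structural pruning is what yields the inference speedup, since the retained "skeleton" sub-network is physically smaller, while the per-task soft prompts let one backbone serve many tasks.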