Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation (2405.15282v2)

Published 24 May 2024 in cs.LG and cs.AI

Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customising Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues as these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customising the FM through task-specific input prefixes, but it underperforms compared to other PEFT methods like LoRA. To address this gap, we propose Low-Rank Prompt Adaptation (LoPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and not requiring a server-based adapter. LoPA generates soft prompts by balancing between sharing task-specific information across instances and customisation for each instance. It uses a low-rank decomposition of the soft-prompt component encoded for each instance to achieve parameter efficiency. We provide a comprehensive evaluation on multiple natural language understanding tasks and code generation and understanding tasks, across a wide range of foundation models with varying sizes.
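
To make the idea concrete, here is a minimal sketch of how an instance-conditioned, low-rank soft prompt could be constructed. It assumes a PyTorch-style implementation; the module name LoPAPrompt, the sigmoid gating between the shared and instance-specific components, and all hyperparameter choices are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn


class LoPAPrompt(nn.Module):
    """Sketch of an instance-conditioned low-rank soft prompt.

    A task-level soft prompt is shared across instances; an instance-specific
    component is built as a rank-r outer product from an encoding of the input
    and used to gate the shared prompt. The gating choice and hyperparameters
    are illustrative assumptions, not the paper's exact design.
    """

    def __init__(self, encoder_dim: int, d_model: int, m_tokens: int, rank: int):
        super().__init__()
        # Shared (task-level) soft prompt, learned directly.
        self.shared = nn.Parameter(torch.randn(m_tokens, d_model) * 0.02)
        # Small projections mapping an instance encoding to two low-rank factors.
        self.to_u = nn.Linear(encoder_dim, m_tokens * rank)
        self.to_v = nn.Linear(encoder_dim, d_model * rank)
        self.m, self.d, self.r = m_tokens, d_model, rank

    def forward(self, instance_repr: torch.Tensor) -> torch.Tensor:
        # instance_repr: (batch, encoder_dim), e.g. a pooled encoding of the
        # input produced by a lightweight encoder.
        b = instance_repr.size(0)
        u = self.to_u(instance_repr).view(b, self.m, self.r)  # (batch, m, r)
        v = self.to_v(instance_repr).view(b, self.r, self.d)  # (batch, r, d)
        z_instance = u @ v                                     # rank-r, (batch, m, d)
        # Gate the shared prompt with the instance-specific component.
        return self.shared.unsqueeze(0) * torch.sigmoid(z_instance)
```

In use, the returned (batch, m_tokens, d_model) tensor would be prepended to the token embeddings of the frozen foundation model, so only the shared prompt and the two small projection layers are trained per task; this keeps the per-task footprint low and avoids hosting an adapter at the FM server.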

