
HyperPrompt: Prompt-based Task-Conditioning of Transformers (2203.00759v2)

Published 1 Mar 2022 in cs.CL and cs.LG

Abstract: Prompt-Tuning is a new paradigm for finetuning pre-trained LLMs in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as $0.14\%$ of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning and HyperFormer++ on Natural Language Understanding benchmarks of GLUE and SuperGLUE across many model sizes.
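To make the mechanism described in the abstract concrete, below is a minimal sketch of prompt-based task-conditioning of self-attention: a small HyperNetwork maps a task embedding to key/value hyper-prompts, which are prepended to the attention keys and values so they act as task-specific global memory that every query can attend to. This is an illustrative PyTorch sketch, not the paper's implementation; all names (TaskHyperNetwork, HyperPromptAttention, prompt_len, d_task, etc.) and the exact HyperNetwork shape are assumptions.

```python
# Hedged sketch of HyperPrompt-style task-conditioning of self-attention.
# Assumptions: hyper-prompts are prepended to keys/values only; the
# HyperNetwork architecture and all names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskHyperNetwork(nn.Module):
    """Maps a task embedding to per-head key/value hyper-prompts."""
    def __init__(self, d_task, n_heads, d_head, prompt_len):
        super().__init__()
        self.n_heads, self.d_head, self.prompt_len = n_heads, d_head, prompt_len
        out_dim = 2 * n_heads * prompt_len * d_head  # key and value prompts
        self.net = nn.Sequential(nn.Linear(d_task, d_task), nn.ReLU(),
                                 nn.Linear(d_task, out_dim))

    def forward(self, task_emb):                      # (batch, d_task)
        out = self.net(task_emb)
        out = out.view(-1, 2, self.n_heads, self.prompt_len, self.d_head)
        return out[:, 0], out[:, 1]                   # key prompts, value prompts

class HyperPromptAttention(nn.Module):
    """Self-attention whose keys/values are extended with hyper-prompts."""
    def __init__(self, d_model, n_heads, d_task, prompt_len):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.hyper = TaskHyperNetwork(d_task, n_heads, self.d_head, prompt_len)

    def forward(self, x, task_emb):                   # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):                                 # -> (batch, heads, seq, d_head)
            return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)

        # Hyper-prompts act as task-specific global memory tokens:
        # prepend them to keys and values so every query can attend to them.
        pk, pv = self.hyper(task_emb)                 # (batch, heads, prompt_len, d_head)
        k = torch.cat([pk, k], dim=2)
        v = torch.cat([pv, v], dim=2)

        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out(ctx)
```

Because only the HyperNetwork and task embeddings add parameters on top of the frozen or shared Transformer, this layout mirrors the abstract's claim of task-conditioning with a very small fraction of additional parameters.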

Authors (12)
  1. Yun He (26 papers)
  2. Huaixiu Steven Zheng (11 papers)
  3. Yi Tay (94 papers)
  4. Jai Gupta (16 papers)
  5. Yu Du (52 papers)
  6. Vamsi Aribandi (6 papers)
  7. Zhe Zhao (97 papers)
  8. YaGuang Li (71 papers)
  9. Zhao Chen (54 papers)
  10. Donald Metzler (49 papers)
  11. Heng-Tze Cheng (16 papers)
  12. Ed H. Chi (74 papers)
Citations (74)