InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding (2306.04933v1)

Published 8 Jun 2023 in cs.CL, cs.LG, and stat.ML

Abstract: Soft prompt tuning achieves superior performance across a wide range of few-shot tasks. However, the performance of prompt tuning can be highly sensitive to the initialization of the prompts. We also empirically observe that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters (or encoded representations). This novel view helps us develop a more efficient, accurate, and robust soft prompt tuning method, InfoPrompt. With this framework, we develop two novel mutual-information-based loss functions to (i) discover proper prompt initialization for the downstream tasks and learn sufficient task-relevant information from prompt tokens, and (ii) encourage the output representation from the pretrained LLM to be more aware of the task-relevant information captured in the learnt prompt. Extensive experiments validate that InfoPrompt can significantly accelerate the convergence of prompt tuning and outperform traditional prompt tuning methods. Finally, we provide a formal theoretical result showing that a gradient-descent-type algorithm can be used to train our mutual information loss.
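The abstract does not say how the mutual information is estimated. As a rough illustration only, the sketch below uses an InfoNCE-style contrastive lower bound, which is one common way to turn "maximize mutual information between two representations" into a trainable loss. The function name `info_nce_lower_bound`, the `temperature` parameter, and the choice of cosine similarity are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def info_nce_lower_bound(prompt_repr, output_repr, temperature=0.1):
    """InfoNCE lower bound (in nats) on the mutual information between
    paired rows of prompt_repr and output_repr.

    Hypothetical helper for illustration; not the paper's loss.
    Both inputs have shape (n, d); row i of each forms a positive pair.
    """
    # L2-normalize rows so the dot product is a cosine similarity.
    p = prompt_repr / np.linalg.norm(prompt_repr, axis=1, keepdims=True)
    z = output_repr / np.linalg.norm(output_repr, axis=1, keepdims=True)

    # Pairwise similarity scores, shape (n, n).
    logits = (p @ z.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; the NCE loss is their mean
    # negative log-probability.
    nce_loss = -np.mean(np.diag(log_probs))

    # Standard InfoNCE bound: I(X; Y) >= log(n) - L_NCE.
    return np.log(len(p)) - nce_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    print("aligned pairs:  ", info_nce_lower_bound(x, x))
    print("unrelated pairs:", info_nce_lower_bound(x, rng.normal(size=(8, 16))))
```

Minimizing the negative of this bound with a gradient-based optimizer pushes paired representations together and unpaired ones apart. The bound can never exceed log(n) for a batch of n pairs, so larger batches are needed to certify larger amounts of mutual information.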

Authors (9)
  1. Junda Wu (35 papers)
  2. Tong Yu (119 papers)
  3. Rui Wang (996 papers)
  4. Zhao Song (253 papers)
  5. Ruiyi Zhang (98 papers)
  6. Handong Zhao (38 papers)
  7. Chaochao Lu (39 papers)
  8. Shuai Li (295 papers)
  9. Ricardo Henao (71 papers)
Citations (15)
