
PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation (2208.10160v2)

Published 22 Aug 2022 in cs.CL

Abstract: Prompt Transfer (PoT) is a recently proposed approach to improve prompt-tuning by initializing the target prompt with an existing prompt trained on similar source tasks. However, such a vanilla PoT approach usually achieves sub-optimal performance, as (i) PoT is sensitive to the similarity of the source-target pair and (ii) directly fine-tuning the prompt initialized with the source prompt on the target task may lead to forgetting of the useful general knowledge learned from the source task. To tackle these issues, we propose a new metric to accurately predict prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages knowledge distillation to effectively alleviate knowledge forgetting (regarding (ii)). Extensive and systematic experiments on 189 combinations of 21 source and 9 target datasets across 5 scales of PLMs demonstrate that: 1) our proposed metric works well to predict prompt transferability; 2) PANDA consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; and 3) with PANDA, prompt-tuning can achieve competitive and even better performance than model-tuning at various PLM scales. We have publicly released our code at https://github.com/WHU-ZQH/PANDA.
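
To make the PoT-plus-distillation idea concrete, the following is a minimal PyTorch-style sketch, under stated assumptions, of how a target soft prompt could be fine-tuned from a source-prompt initialization while being distilled from a frozen "teacher" that keeps the source prompt. The class and function names (PromptedClassifier, panda_step), the stand-in backbone, and the alpha/tau hyperparameters are illustrative assumptions, not the authors' released implementation; refer to the linked repository for the official code.

```python
# Minimal sketch of prompt transfer (PoT) combined with knowledge distillation.
# Names, backbone, and hyperparameters are illustrative assumptions, not the
# authors' released implementation (see https://github.com/WHU-ZQH/PANDA).

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptedClassifier(nn.Module):
    """Stand-in for a frozen PLM: only the soft prompt is trainable."""

    def __init__(self, prompt_len=20, hidden=128, num_labels=2):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
        # Frozen "backbone" and head (a real setup would wrap a pretrained LM).
        self.backbone = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())
        self.head = nn.Linear(hidden, num_labels)
        for p in list(self.backbone.parameters()) + list(self.head.parameters()):
            p.requires_grad = False

    def forward(self, x):                        # x: (batch, hidden)
        # Condition the frozen backbone on the mean-pooled soft prompt.
        h = self.backbone(x + self.prompt.mean(dim=0))
        return self.head(h)                      # logits: (batch, num_labels)


def panda_step(student, teacher, x, y, optimizer, alpha=0.5, tau=2.0):
    """One step: target-task loss plus KD loss against the source-prompt teacher."""
    student_logits = student(x)
    with torch.no_grad():
        teacher_logits = teacher(x)
    task_loss = F.cross_entropy(student_logits, y)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    loss = (1 - alpha) * task_loss + alpha * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Vanilla PoT: copy the source prompt into the target model, then fine-tune.
source = PromptedClassifier()                    # assume its prompt was trained on a source task
target = PromptedClassifier()
target.prompt.data.copy_(source.prompt.data)     # prompt-transfer initialization

# PANDA-style addition: keep a frozen copy of the source-prompt model as teacher,
# so the target prompt is regularized toward the source task's knowledge.
teacher = copy.deepcopy(source).eval()

opt = torch.optim.Adam([target.prompt], lr=1e-3)
x, y = torch.randn(8, 128), torch.randint(0, 2, (8,))
print(panda_step(target, teacher, x, y, opt))
```

The point of the distillation term is that plain fine-tuning of the transferred prompt (vanilla PoT) can drift away from, and effectively forget, the general knowledge captured by the source prompt; keeping a frozen source-prompt teacher penalizes that drift. The exact loss form, teacher construction, and weighting used by PANDA may differ from this sketch.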
