Contrastive Demonstration Tuning for Pre-trained Language Models (2204.04392v4)

Published 9 Apr 2022 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: Pretrained language models can be effectively stimulated by textual prompts or demonstrations, especially in low-data scenarios. Recent works have focused on automatically searching discrete or continuous prompts or optimized verbalizers, yet studies of the demonstration are still limited. Concretely, the demonstration examples are crucial for an excellent final performance of prompt-tuning. In this paper, we propose a novel pluggable, extensible, and efficient approach named contrastive demonstration tuning, which is free of demonstration sampling. Furthermore, the proposed approach can be: (i) Plugged into any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories. Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/Demo-Tuning.

References (64)
  1. Unilmv2: Pseudo-masked language models for unified language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 642–652. PMLR.
  2. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
  3. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1597–1607. PMLR.
  4. Disentangled contrastive learning for learning robust textual representations. In Artificial Intelligence - First CAAI International Conference, CICAI 2021, Hangzhou, China, June 5-6, 2021, Proceedings, Part II, volume 13070 of Lecture Notes in Computer Science, pages 215–226. Springer.
  5. Lightner: A lightweight generative framework with prompt-guided attention for low-resource ner. arXiv preprint arXiv:2109.00720.
  6. Knowprompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction. CoRR, abs/2104.07650.
  7. Xinlei Chen and Kaiming He. 2021. Exploring simple siamese representation learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 15750–15758. Computer Vision Foundation / IEEE.
  8. Template-based named entity recognition using BART. In Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, volume ACL/IJCNLP 2021 of Findings of ACL, pages 1835–1845. Association for Computational Linguistics.
  9. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.
  10. Prompt-learning for fine-grained entity typing. CoRR, abs/2108.10604.
  11. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 13042–13054.
  12. Making pre-trained language models better few-shot learners. CoRR, abs/2012.15723.
  13. Making pre-trained language models better few-shot learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 3816–3830. Association for Computational Linguistics.
  14. Simcse: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 6894–6910. Association for Computational Linguistics.
  15. Declutr: Deep contrastive learning for unsupervised textual representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 879–895. Association for Computational Linguistics.
  16. Bootstrap your own latent - A new approach to self-supervised learning. In NeurIPS.
  17. Retrieval augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 3929–3938. PMLR.
  18. WARP: word-level adversarial reprogramming. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 4921–4933. Association for Computational Linguistics.
  19. WARP: word-level adversarial reprogramming. CoRR, abs/2101.00121.
  20. PTR: prompt tuning with rules for text classification. CoRR, abs/2105.11259.
  21. Event extraction as natural language generation. arXiv preprint arXiv:2108.12724.
  22. A survey on contrastive self-supervised learning. CoRR, abs/2011.00362.
  23. Self-guided contrastive learning for BERT sentence representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 2528–2540. Association for Computational Linguistics.
  24. Good examples make A faster learner: Simple demonstration-based learning for low-resource NER. CoRR, abs/2110.08454.
  25. The power of scale for parameter-efficient prompt tuning. CoRR, abs/2104.08691.
  26. The power of scale for parameter-efficient prompt tuning. In EMNLP (1), pages 3045–3059. Association for Computational Linguistics.
  27. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pages 7871–7880. Association for Computational Linguistics.
  28. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
  29. Sentiprompt: Sentiment knowledge enhanced prompt-tuning for aspect-based sentiment analysis. arXiv preprint arXiv:2109.08306.
  30. Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 4582–4597. Association for Computational Linguistics.
  31. What makes good in-context examples for gpt-3? CoRR, abs/2101.06804.
  32. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. CoRR, abs/2107.13586.
  33. P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. CoRR, abs/2110.07602.
  34. Self-supervised learning: Generative or contrastive. CoRR, abs/2006.08218.
  35. GPT understands, too. CoRR, abs/2103.10385.
  36. Roberta: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
  37. Lajanugen Logeswaran and Honglak Lee. 2018. An efficient framework for learning sentence representations. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
  38. Template-free prompt tuning for few-shot NER. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, pages 5721–5732. Association for Computational Linguistics.
  39. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 3111–3119.
  40. Rethinking the role of demonstrations: What makes in-context learning work? CoRR, abs/2202.12837.
  41. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 8024–8035.
  42. Guanghui Qin and Jason Eisner. 2021. Learning how to ask: Querying lms with mixtures of soft prompts. CoRR, abs/2104.06599.
  43. Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 3980–3990. Association for Computational Linguistics.
  44. Timo Schick and Hinrich Schütze. 2020. It’s not just size that matters: Small language models are also few-shot learners. CoRR, abs/2009.07118.
  45. Timo Schick and Hinrich Schütze. 2021. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, Online, April 19 - 23, 2021, pages 255–269. Association for Computational Linguistics.
  46. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 4222–4235. Association for Computational Linguistics.
  47. Prototypical networks for few-shot learning. In NIPS, pages 4077–4087.
  48. Improving and simplifying pattern exploiting training. CoRR, abs/2103.11955.
  49. MSP: multi-stage prompting for making pre-trained language models better translators. CoRR, abs/2110.06609.
  50. Spot: Better frozen model adaptation through soft prompt transfer. CoRR, abs/2110.07904.
  51. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
  52. Entailment as few-shot learner. CoRR, abs/2104.14690.
  53. Transformers: State-of-the-art natural language processing. In EMNLP (Demos), pages 38–45. Association for Computational Linguistics.
  54. From discrimination to generation: Knowledge graph completion with generative transformer. CoRR, abs/2202.02113.
  55. From discrimination to generation: Knowledge graph completion with generative transformer. arXiv preprint arXiv:2202.02113.
  56. Consert: A contrastive framework for self-supervised sentence representation transfer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 5065–5075. Association for Computational Linguistics.
  57. Learning to ask for data-efficient event argument extraction. arXiv preprint arXiv:2110.00479.
  58. Ontology-enhanced prompt-tuning for few-shot learning. CoRR, abs/2201.11332.
  59. Ontoprotein: Protein pretraining with gene ontology embedding. CoRR, abs/2201.11147.
  60. Ontoprotein: Protein pretraining with gene ontology embedding. arXiv preprint arXiv:2201.11147.
  61. Differentiable prompt makes pre-trained language models better few-shot learners. CoRR, abs/2108.13161.
  62. Reasoning through memorization: Nearest neighbor knowledge graph embeddings. CoRR, abs/2201.05575.
  63. Factual probing is [MASK]: learning vs. learning to recall. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, pages 5017–5033. Association for Computational Linguistics.
  64. Plug-tagger: A pluggable sequence labeling framework using language models. CoRR, abs/2110.07331.
Authors (6)
  1. Xiaozhuan Liang
  2. Ningyu Zhang
  3. Siyuan Cheng
  4. Zhenru Zhang
  5. Chuanqi Tan
  6. Huajun Chen
Citations (9)

Summary

Contrastive Demonstration Tuning for Pre-trained Language Models

The paper "Contrastive Demonstration Tuning for Pre-trained Language Models" presents an approach for improving the performance of pre-trained language models (PLMs), especially in low-data scenarios, through a technique called contrastive demonstration tuning (Demo-tuning). The technique targets the demonstration component of prompt-tuning, which has received far less attention than prompt and verbalizer optimization.

Overview

Pre-trained language models have become essential in NLP because they can be adapted to diverse tasks using textual prompts or demonstrations. Previous work has explored discrete and continuous prompt optimization, but demonstrations, which play a crucial role in prompt-tuning performance, have been investigated far less thoroughly. This paper proposes a method that uses contrastive learning to optimize learnable virtual demonstrations, removing the need to sample demonstrations from the training data and improving the flexibility and efficiency of existing prompt-based methods.
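
For context, demonstration-augmented prompting in the LM-BFF style concatenates the query (with an unfilled [MASK]) with one completed demonstration per class. The sketch below illustrates that input construction; the template, label words, and example texts are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of an LM-BFF-style demonstration-augmented prompt for
# binary sentiment classification. Template and label words are assumed.

def build_prompt(x: str, demos: dict[str, str]) -> str:
    """Concatenate the query with one completed demonstration per class."""
    template = "{text} It was [MASK]."                     # cloze-style template
    label_words = {"positive": "great", "negative": "terrible"}

    parts = [template.format(text=x)]                      # query keeps [MASK] unfilled
    for label, demo_text in demos.items():
        # each demonstration is the same template with [MASK] replaced
        # by that class's label word
        filled = template.format(text=demo_text).replace("[MASK]", label_words[label])
        parts.append(filled)
    return " ".join(parts)


prompt = build_prompt(
    "A gripping, beautifully shot film.",
    demos={"positive": "An instant classic.", "negative": "A tedious mess."},
)
# -> "A gripping, beautifully shot film. It was [MASK]. An instant classic.
#     It was great. A tedious mess. It was terrible."
```

Because every class must contribute a demonstration, inputs of this form grow with the number of categories, which is exactly the input-length constraint that virtual demonstrations are meant to sidestep.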

Key Contributions

  1. Pluggable and Extensible Approach: Demo-tuning integrates into existing prompt-tuning methods without requiring demonstration sampling, and it extends prompt-tuning to classification tasks with any number of categories.
  2. Virtual Demonstrations with Contrastive Learning: By using continuous embeddings as virtual demonstrations, the method sidesteps the limitations imposed by the model's input length. These virtual demonstrations are optimized with a simple contrastive framework that forgoes negative pairs (see the sketch after this list).
  3. Comprehensive Evaluation: Experiments across 16 NLP datasets show that the method yields better results when combined with established techniques such as LM-BFF and P-tuning. Notably, in few-shot settings, Demo-tuning consistently outperformed standard fine-tuning and other prompt-based tuning methods.
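
To make the second contribution concrete, the following is a minimal sketch of a negative-pair-free contrastive setup around a learnable virtual demonstration, written in a SimSiam-like style (stop-gradient plus cosine similarity). The pairing of a virtual-demonstration view with a real-demonstration view, and all shapes and hyperparameters, are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class VirtualDemonstration(torch.nn.Module):
    """Learnable continuous embeddings standing in for a sampled demonstration.
    num_tokens and hidden_size are illustrative; in practice they would match
    the PLM's embedding dimension."""
    def __init__(self, num_tokens: int = 8, hidden_size: int = 768):
        super().__init__()
        self.embeds = torch.nn.Parameter(0.02 * torch.randn(num_tokens, hidden_size))

    def forward(self, batch_size: int) -> torch.Tensor:
        # expand across the batch so it can be concatenated to input embeddings
        return self.embeds.unsqueeze(0).expand(batch_size, -1, -1)


def contrastive_loss(z_virtual: torch.Tensor, z_real: torch.Tensor) -> torch.Tensor:
    """Negative-pair-free objective: pull the representation obtained with the
    virtual demonstration toward a stop-gradient copy of the representation
    obtained with a real, sampled demonstration."""
    z_real = z_real.detach()  # stop-gradient on the target view
    return -F.cosine_similarity(z_virtual, z_real, dim=-1).mean()


# Toy usage: random vectors stand in for the PLM's [MASK] representations.
batch, hidden = 4, 768
virtual_demo = VirtualDemonstration(hidden_size=hidden)
demo_embeds = virtual_demo(batch)          # (4, 8, 768), prepended to input embeddings
z_virtual = torch.randn(batch, hidden, requires_grad=True)  # view with virtual demo
z_real = torch.randn(batch, hidden)                         # view with sampled demo
contrastive_loss(z_virtual, z_real).backward()
```

In this sketch the gradient flows only into the virtual-demonstration branch, so the learnable embeddings absorb information from sampled demonstrations without requiring explicit negative pairs.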

Experimental Findings

The experimental results highlight several advantages of Demo-tuning. For instance, significant improvements were observed on tasks such as sentiment analysis and natural language inference when it was combined with P-tuning, showing its compatibility with different prompt-tuning backbones. The authors also demonstrate that virtual demonstrations remain effective when the number of classes is large, sidestepping the input-length limitations that constrain real demonstrations in PLMs.

Additionally, alternative demonstration sampling strategies were evaluated. Optimizing virtual demonstrations with contrastive learning proved more effective than both random and similarity-based sampling, suggesting that the framework is a robust way to improve prompt-based model performance.
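
As a point of comparison, the similarity-based sampling baseline can be sketched as follows, assuming a Sentence-BERT-style encoder is used to pick, for each class, the training example most similar to the query (the encoder name and data layout are illustrative assumptions):

```python
# Sketch of similarity-based demonstration sampling: for each class, choose the
# training example closest to the query under a sentence encoder.
# Requires `pip install sentence-transformers`; the model name is an assumption.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def sample_demos(query: str, train_set: list[tuple[str, str]]) -> dict[str, str]:
    """train_set holds (text, label) pairs; returns one demonstration per label."""
    texts = [text for text, _ in train_set]
    embeddings = encoder.encode([query] + texts, convert_to_tensor=True)
    scores = util.cos_sim(embeddings[0:1], embeddings[1:])[0]  # query vs. candidates
    best: dict[str, tuple[float, str]] = {}
    for (text, label), score in zip(train_set, scores.tolist()):
        if label not in best or score > best[label][0]:
            best[label] = (score, text)
    return {label: text for label, (_, text) in best.items()}

demos = sample_demos(
    "A gripping, beautifully shot film.",
    [("An instant classic.", "positive"), ("A tedious mess.", "negative")],
)
```

Unlike this per-query retrieval step, the learned virtual demonstrations are shared across queries and add no sampling cost at inference time.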

Implications and Future Directions

From a practical perspective, Demo-tuning's flexibility and model-agnostic design imply its applicability across various NLP tasks without the need for extensive modifications to existing systems. The theoretical implications extend to possible connections with prototype learning, encouraging further investigation into the nature and role of demonstrations as prototypes within prompt-tuning frameworks.

Future research could explore parameter-efficient fine-tuning approaches leveraging this strategy, as well as applications beyond classification into generative tasks. Investigating the integration of external knowledge within demonstrations might also provide insights into the use of demonstrations as a means of knowledge enrichment in PLMs.

Conclusion

The paper presents a compelling case for contrastive demonstration tuning as an enhancement for pre-trained language models. Its potential to streamline prompt-based methods and improve performance in low-data scenarios makes it a valuable contribution to the field. Moving forward, understanding the broader applicability and optimization of virtual demonstrations across architectures remains a fertile ground for research, promising advances in the efficiency and effectiveness of NLP models.
