LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning (2305.18169v3)

Published 29 May 2023 in cs.CL

Abstract: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common one, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective: it helps the model generate embeddings that are more distinguishable between classes, and it can also be more sample-efficient because the model learns from positive and negative examples simultaneously. One of the most important components of contrastive learning is data augmentation, but, unlike in computer vision, effective data augmentation for NLP remains challenging. This paper proposes LM-CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models, which leverages prompt-based few-shot paraphrasing with generative language models, especially large language models such as GPT-3 and OPT-175B, for data augmentation. Our experiments on multiple text classification benchmarks show that this augmentation method outperforms others, such as easy data augmentation, back translation, and multiple templates.
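To make the two ingredients described in the abstract concrete, here is a minimal sketch: building a few-shot paraphrasing prompt for a generative language model (e.g. GPT-3 or OPT-175B) to produce augmented views, and a supervised contrastive objective over the fine-tuned model's embeddings. The function names, prompt wording, temperature, and loss formulation below are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of paraphrasing-guided contrastive augmentation, under assumed names
# and hyperparameters (prompt wording, temperature, etc. are illustrative).

import torch
import torch.nn.functional as F


def build_paraphrase_prompt(demonstrations, sentence):
    """Few-shot paraphrasing prompt for a generative LM.

    `demonstrations` is a list of (original, paraphrase) pairs; the LM is
    expected to continue the pattern and paraphrase `sentence`.
    """
    lines = []
    for original, paraphrase in demonstrations:
        lines.append(f"Sentence: {original}\nParaphrase: {paraphrase}\n")
    lines.append(f"Sentence: {sentence}\nParaphrase:")
    return "\n".join(lines)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss over per-example embeddings.

    An example and its LM-generated paraphrase (plus other examples of the
    same class) act as positives; the rest of the batch are negatives.
    """
    z = F.normalize(embeddings, dim=-1)          # (batch, dim)
    sim = z @ z.T / temperature                  # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))    # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    pos_counts = pos.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos).sum(dim=1) / pos_counts
    return loss[pos.sum(dim=1) > 0].mean()       # anchors with >=1 positive


if __name__ == "__main__":
    prompt = build_paraphrase_prompt(
        [("The movie was great.", "I really enjoyed the film.")],
        "The food was terrible.",
    )
    print(prompt)

    # Toy batch: 4 embeddings from 2 classes, e.g. originals and paraphrases.
    emb = torch.randn(4, 16, requires_grad=True)
    labels = torch.tensor([0, 0, 1, 1])
    print(supervised_contrastive_loss(emb, labels))
```

In a full pipeline, the paraphrase returned by the generative model would be inserted into the same prompt template as the original example, and the contrastive loss would be computed on the fine-tuned model's masked-token representations alongside the usual prompt-based classification loss.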

Authors (3)
  1. Amirhossein Abaskohi (14 papers)
  2. Sascha Rothe (16 papers)
  3. Yadollah Yaghoobzadeh (34 papers)
Citations (12)