MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification (2306.08892v1)
Abstract: Prompting methods have shown impressive performance in a variety of text mining tasks and applications, especially few-shot ones. Despite the promising prospects, the performance of a prompting model largely depends on the design of its prompt template and verbalizer. In this work, we propose MetricPrompt, which eases the difficulty of verbalizer design by reformulating the few-shot text classification task as a text pair relevance estimation task. MetricPrompt adopts the prompting model as the relevance metric, further bridging the gap between the pre-trained language model's (PLM) pre-training objective and the text classification task and enabling the PLM's smooth adaptation. Taking a training sample and a query sample simultaneously, MetricPrompt captures cross-sample relevance information for accurate relevance estimation. We conduct experiments on three widely used text classification datasets across four few-shot settings. Results show that MetricPrompt outperforms manual verbalizers and other automatic verbalizer design methods across all few-shot settings, achieving new state-of-the-art (SOTA) performance.
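To make the pipeline concrete, here is a minimal inference-time sketch of MetricPrompt-style scoring with a BERT-style masked LM. The cloze template string and the "yes"/"no" label words are illustrative assumptions rather than the paper's exact design, and the paper additionally fine-tunes the PLM on pairs of training samples (same-class pairs labeled relevant) before scoring, which this sketch omits.

```python
# A minimal sketch of MetricPrompt-style relevance scoring and kNN-like
# classification. The template and the "yes"/"no" verbalization are
# hypothetical placeholders, not the paper's exact prompt.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

YES_ID = tokenizer.convert_tokens_to_ids("yes")
NO_ID = tokenizer.convert_tokens_to_ids("no")

def relevance(query: str, support: str) -> float:
    """Score topical relevance of a text pair via a cloze prompt."""
    # Both samples are fed to the model at once, so it can capture
    # cross-sample relevance information.
    prompt = (f'Do "{query}" and "{support}" talk about the same topic? '
              f"{tokenizer.mask_token}.")
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # Position of the [MASK] token whose prediction we read out.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Relevance = probability of "yes" renormalized against "no".
    probs = torch.softmax(logits[[YES_ID, NO_ID]], dim=-1)
    return probs[0].item()

def classify(query: str, support_set: list[tuple[str, str]]) -> str:
    """Assign the class whose training samples are most relevant on average."""
    scores: dict[str, list[float]] = {}
    for text, label in support_set:
        scores.setdefault(label, []).append(relevance(query, text))
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))
```

Because each class is represented only by its few training samples, no class-specific label words are needed: the verbalizer design problem reduces to a single generic relevant/irrelevant decision, which is what lets the method sidestep manual and automatic verbalizer construction.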