A Novel Prompt-tuning Method: Incorporating Scenario-specific Concepts into a Verbalizer (2401.05204v1)
Abstract: The verbalizer, which maps label words to class labels, is an essential component of prompt-tuning. In this paper, we present a novel approach to constructing verbalizers. Existing methods for verbalizer construction mainly rely on augmenting and refining sets of synonyms or related words derived from class names; this paradigm suffers from a narrow perspective and a lack of abstraction, resulting in limited coverage and high bias in the label-word space. To address this issue, we propose a label-word construction process that incorporates scenario-specific concepts. Specifically, we extract rich concepts from task-specific scenarios as label-word candidates and then develop a novel cascade calibration module to refine the candidates into a set of label words for each class. We evaluate the effectiveness of the proposed approach through extensive experiments on five widely used datasets for zero-shot text classification. The results show that our method outperforms existing approaches and achieves state-of-the-art performance.
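To make the verbalizer mechanism described in the abstract concrete, the sketch below shows how class scores can be obtained by aggregating a masked language model's probabilities over each class's set of label words. This is a minimal illustration of the general prompt-tuning/verbalizer setup, not the paper's method: the template, the example label-word sets, and the mean aggregation are assumptions made here for illustration.

```python
# Minimal sketch: zero-shot classification with a cloze prompt and a verbalizer
# that aggregates masked-LM probabilities over per-class label-word sets.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

# Hypothetical label-word sets; the paper instead derives and calibrates these
# from scenario-specific concepts.
label_words = {
    "sports": ["football", "athlete", "tournament"],
    "business": ["market", "company", "economy"],
}

def classify(text: str) -> str:
    # Cloze-style template: the model fills the mask with a topic word.
    prompt = f"{text} This text is about {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Locate the mask position and take the vocabulary distribution there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    probs = logits[0, mask_pos[0]].softmax(dim=-1)

    scores = {}
    for label, words in label_words.items():
        # Use the first sub-token of each label word and average its probability.
        ids = [tokenizer.encode(" " + w, add_special_tokens=False)[0] for w in words]
        scores[label] = probs[ids].mean().item()
    return max(scores, key=scores.get)

print(classify("The striker scored twice in the final match of the season."))
```

In this simplified form, the quality of the label-word sets directly determines coverage and bias of the class scores, which is the gap the proposed concept-based construction and cascade calibration aim to close.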