
Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification

Published 26 Oct 2020 in cs.CL, cs.AI, and cs.LG | (2010.13641v1)

Abstract: A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained LLM and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the LLM's abilities. To mitigate this issue, we devise an approach that automatically finds such a mapping given small amounts of training data. For a number of tasks, the mapping found by our approach performs almost as well as hand-crafted label-to-word mappings.

Citations (195)

Summary

Overview of Few-Shot Text Classification Using Automatic Label Discovery

The paper entitled "Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification" by Schick et al. presents a methodological advancement in few-shot learning scenarios for text classification tasks. The authors introduce Pet with Automatic Labels (Petal), an extension of the Pattern-Exploiting Training (Pet) framework, aimed at automatically identifying appropriate words to correspond with labels in the text classification domain under scarce data conditions.

Background and Motivation

The flourishing of large-scale pre-trained language models, such as BERT and RoBERTa, has significantly boosted performance across various Natural Language Processing (NLP) tasks. Despite these models' capabilities, few-shot learning remains a challenge due to the limited availability of labeled data. Previously, Pet has shown promise in these low-data scenarios by rephrasing inputs into cloze questions, enabling the language models to predict labels based on provided verbalizations. However, manually determining these verbalizations is non-trivial, requiring both domain expertise and an intricate understanding of the language model's internal representation capabilities. Petal addresses this complexity by automating the word-label mapping process.
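The pattern/verbalizer mechanism described above can be sketched as follows. This is a toy illustration, not the paper's implementation: the pattern, the verbalizer entries, and the probability distribution standing in for a masked language model's output are all hypothetical.

```python
# Toy sketch of PET-style cloze classification. A pattern rewrites the input
# as a cloze question with a masked slot; a verbalizer maps the word the
# model predicts at that slot back to a task label.

def pattern(text: str) -> str:
    """Wrap an input (here, a review) in a cloze question with a mask slot."""
    return f"{text} It was [MASK]."

# Hand-crafted verbalizer: one representative word per label (illustrative).
VERBALIZER = {"great": "positive", "terrible": "negative"}

def classify(mask_probs: dict) -> str:
    """Pick the label whose verbalizing word is most likely at the mask.
    `mask_probs` stands in for a pretrained masked LM's output distribution."""
    best_word = max(VERBALIZER, key=lambda w: mask_probs.get(w, 0.0))
    return VERBALIZER[best_word]

# A made-up distribution a model might assign for the cloze
# "Loved every minute. It was [MASK]."
probs = {"great": 0.41, "good": 0.22, "terrible": 0.01}
print(pattern("Loved every minute."))
print(classify(probs))  # positive
```

Manually choosing entries like `"great"` and `"terrible"` is exactly the step Petal automates.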

Methodology

Petal’s approach decomposes the mapping problem into smaller, tractable subproblems. The central feature of Petal is its Likelihood Ratio Verbalizer Search (LRVS), which utilizes likelihood ratios to optimize verbalizer selection without requiring task-specific knowledge. The technique relies on calculating likelihood ratios over candidate verbalizations and selecting those that maximize class distinction through one-vs-rest reduction strategies.

The authors employ two main variations in their approach:

  1. Binarized Training Sets: For each label, separate training sets are created to contrast examples within that label against examples of all other labels.
  2. Likelihood Ratio Criterion: Likelihood ratios replace standard cross-entropy computations, prioritizing a verbalization's relative effectiveness at distinguishing labels rather than its absolute likelihood. This avoids bias toward words that are highly probable across all inputs yet irrelevant to any particular label.
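The two steps above can be sketched together in a simplified form. This is a minimal, assumed version of the idea (not the paper's exact LRVS criterion): each training example carries the masked-LM probabilities of candidate words at the cloze slot, and each label's candidate words are scored by the ratio of their average likelihood on that label's examples (the binarized "positive" set) to their average likelihood on all other examples.

```python
def find_verbalizers(examples, labels, top_k=1):
    """Toy one-vs-rest likelihood-ratio verbalizer search.
    `examples` is a list of (label, mask_probs) pairs, where mask_probs maps
    candidate words to the probability a masked LM assigns them at the slot."""
    result = {}
    vocab = set().union(*(probs.keys() for _, probs in examples))
    for y in labels:
        # Binarized split: this label's examples vs. all others.
        pos = [probs for lab, probs in examples if lab == y]
        neg = [probs for lab, probs in examples if lab != y]
        scores = {}
        for w in vocab:
            p_pos = sum(p.get(w, 0.0) for p in pos) / max(len(pos), 1)
            p_neg = sum(p.get(w, 0.0) for p in neg) / max(len(neg), 1)
            # Ratio, not absolute likelihood: words that are probable
            # everywhere (e.g. "good") score low; smoothing avoids /0.
            scores[w] = (p_pos + 1e-9) / (p_neg + 1e-9)
        result[y] = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return result

# Made-up per-example token probabilities for a tiny sentiment training set.
examples = [
    ("positive", {"great": 0.5, "good": 0.3, "bad": 0.01}),
    ("positive", {"great": 0.4, "good": 0.4, "bad": 0.02}),
    ("negative", {"great": 0.02, "good": 0.05, "bad": 0.6}),
    ("negative", {"great": 0.01, "good": 0.1, "bad": 0.5}),
]
print(find_verbalizers(examples, ["positive", "negative"]))
# {'positive': ['great'], 'negative': ['bad']}
```

Note how `"good"`, despite being fairly probable on positive examples, loses to `"great"` because it also appears with nontrivial probability on negative examples; the ratio criterion captures this.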

Experimental Outcomes

Qualitative and quantitative evaluations demonstrate the efficacy of Petal. The automatic label discovery process identifies words that succinctly characterize labels across a variety of tasks, such as Yahoo Questions and MNLI, with results closely mirroring hand-engineered mappings. On MNLI in particular, Petal outperformed the manual mapping by selecting more contextually relevant verbalizations.

The robustness of this approach is further demonstrated by high average accuracies across multiple datasets, including Yelp and AG's News, even when limited to as few as 50 training samples. This validates the practicality of Petal in real-world scenarios where data scarcity is a critical limiting factor.

Implications and Future Directions

The automation of label-word mapping in few-shot text classification holds substantial implications for both theoretical advancements and practical applications in NLP. The method not only reduces reliance on expert interventions in label definition but also potentially broadens the scope of few-shot tasks to more diverse domains by dynamically adapting to new labels.

Future research may explore the automation of input pattern discoveries, another facet of Pet that still necessitates manual input. Fully automating this process could further simplify the deployment of few-shot learning methodologies in novel NLP tasks.

In conclusion, Petal significantly narrows the gap between abundant data-driven solutions and few-shot learning, heralding new possibilities in NLP task automation and efficiency in contexts where data scarcity is prevalent.
