Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective
This paper introduces UniMC, a novel approach to zero-shot learning for Natural Language Understanding (NLU) that recasts a wide range of NLU tasks as multiple-choice (MC) problems. The proposed method addresses common challenges in zero-shot learning paradigms, such as the reliance on very large models like GPT-3 and FLAN and on task-specific prompt formats, by converting diverse NLU tasks into a single unified MC format. This transformation allows the model to operate independently of task-specific formats and relies on an efficient architectural design with a markedly smaller parameter count.
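The conversion described above can be illustrated with a small sketch. The template below is hypothetical, chosen only to show how a task such as sentiment analysis becomes a choice among enumerated options rather than a format-specific prediction; it is not the paper's exact prompt format.

```python
# Illustrative sketch: recasting an NLU example as a multiple-choice prompt.
# The template and option labels are assumptions, not UniMC's actual format.

def to_multiple_choice(task: str, text: str, options: list[str]) -> str:
    """Render an NLU example as a single multiple-choice prompt string."""
    lines = [f"Task: {task}", f"Input: {text}", "Options:"]
    lines += [f"  ({chr(65 + i)}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)

# Sentiment analysis becomes a two-way choice:
prompt = to_multiple_choice(
    "sentiment", "The film was a delight.", ["positive", "negative"]
)
print(prompt)
```

Because every task is rendered into the same option-enumerating shape, the same model can score any of them without task-specific heads.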
Key Contributions and Methodology
The authors present a framework that compactly unifies text classification, commonsense reasoning, coreference resolution, and sentiment analysis as MC tasks. The primary innovation is an Option Masked Language Model (O-MLM) combined with an Option Prediction objective. Input sequences are tokenized to include option-mask tokens ([O-MASK]), at each of which the model predicts "yes" or "no" for the corresponding candidate option, effectively transforming complex NLU tasks into a manageable MC format.
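A minimal sketch of this selection mechanism, under stated assumptions: each option is prefixed with an [O-MASK] token, a scorer predicts "yes"/"no" at every mask, and the option whose mask receives the highest "yes" score is chosen. Here `yes_prob` is a stand-in for a real masked-language-model forward pass, and the toy scorer is purely illustrative.

```python
# Sketch of the O-MLM idea (not the paper's implementation): prefix each
# option with an [O-MASK] token and pick the option whose mask scores
# highest for "yes". `yes_prob` stands in for a real masked-LM pass.
from typing import Callable

O_MASK = "[O-MASK]"

def build_sequence(question: str, options: list[str]) -> str:
    """Concatenate the options, each behind an [O-MASK], with the question."""
    option_part = " ".join(f"{O_MASK} {opt}" for opt in options)
    return f"{option_part} {question}"

def predict(question: str, options: list[str],
            yes_prob: Callable[[str, str], float]) -> str:
    """Select the option whose [O-MASK] position scores highest for 'yes'."""
    seq = build_sequence(question, options)
    scores = [yes_prob(seq, opt) for opt in options]
    return options[max(range(len(options)), key=scores.__getitem__)]

# Toy stand-in scorer: favors options that share words with the review text.
REVIEW = "great acting, great plot"

def toy_yes_prob(seq: str, option: str) -> float:
    return sum(word in REVIEW for word in option.split())

best = predict(f"Is the review '{REVIEW}' positive?",
               ["great movie", "terrible movie"], toy_yes_prob)
print(best)  # the toy scorer selects "great movie"
```

The design choice worth noting is that prediction happens at per-option mask positions rather than via a task-specific classification head, which is what lets one model cover many label sets.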
A salient feature of the model is its size: with only 235 million parameters, it offers a scalable, resource-conserving alternative to existing state-of-the-art models. Extensive results across multiple benchmarks show that UniMC not only matches but in some cases surpasses the performance of much larger models, such as FLAN, on tasks like natural language inference and text classification.
Experimental Analysis
The empirical results underscore UniMC's capabilities. In text classification on the Dbpedia dataset, for instance, UniMC improves on larger models by up to 48%. It also performs strongly on natural language inference, as evidenced by its results on the ANLI datasets; the paper further reports an accuracy of 60.9% on SNLI and 52.7% on MNLI, substantial performance given the model's smaller size.
Theoretical and Practical Implications
Theoretically, the paper contributes to the ongoing discourse on parameter efficiency in neural language models, showing that strategic task formulation can yield significant performance gains without extensive computational resources. Practically, the framework's smaller size and efficiency make it more adaptable to real-world applications, allowing straightforward deployment in environments with limited computational capacity.
Future Directions
Looking forward, potential avenues for research could explore further reductions in dependency on handcrafted prompt engineering, as well as expanding the framework to encompass few-shot learning scenarios. Additionally, investigating the role of different backbone architectures could provide insights into optimizing UniMC for even broader NLU applications.
In summary, the paper presents a compelling alternative to dominant zero-shot learning approaches by leveraging the multiple-choice paradigm, reinforcing the significance of task formulation in advancing the capabilities of NLU systems while maintaining computational efficiency.