
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective (2210.08590v2)

Published 16 Oct 2022 in cs.CL

Abstract: We propose a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a list of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis. Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training. Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN. It not only adds generalization ability to models but also significantly reduces the number of parameters. Our method shares the merits of efficient training and deployment. Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification. Our model achieves this success with only 235M parameters, which is substantially smaller than state-of-the-art models with billions of parameters. The code and pre-trained models are available at https://github.com/IDEA-CCNL/Fengshenbang-LM .

Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

This paper introduces a novel approach to zero-shot learning for Natural Language Understanding (NLU) tasks through a unified multiple-choice (MC) framework called UniMC. The proposed method addresses common challenges in zero-shot learning paradigms, such as those encountered with large-scale generative models like GPT-3 and FLAN, by converting a wide range of NLU tasks into multiple-choice problems. This transformation allows the model to operate independently of task-specific formats and relies on an efficient architectural design that markedly reduces parameter count.
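To make the unification concrete, the short sketch below (purely illustrative; the templates, option lists, and field names are assumptions, not the paper's exact prompt format) shows how examples from two different NLU tasks can be expressed in one multiple-choice structure.

```python
# Illustrative sketch: mapping heterogeneous NLU examples into one
# multiple-choice structure. Templates and option lists are assumed for
# exposition; they are not UniMC's exact prompts.

def to_multiple_choice(task, example):
    if task == "topic_classification":
        return {
            "question": "What is the topic of this text?",
            "passage": example["text"],
            "options": ["company", "athlete", "film", "animal"],
        }
    if task == "nli":
        return {
            "question": f"Is the hypothesis true? {example['hypothesis']}",
            "passage": example["premise"],
            "options": ["yes", "maybe", "no"],
        }
    raise ValueError(f"unsupported task: {task}")

mc = to_multiple_choice("nli", {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is performing music.",
})
print(mc["options"])  # ['yes', 'maybe', 'no']
```

Once every task is phrased this way, a single model and a single inference procedure can serve all of them, which is what makes the approach format agnostic.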

Key Contributions and Methodology

The authors develop a framework that compactly unifies text classification, commonsense reasoning, coreference resolution, and sentiment analysis as MC tasks. The primary innovation is an Option Masked Language Model (O-MLM) objective used together with an Option Prediction step. Input sequences are constructed with option-mask tokens ([O-MASK]), and the model predicts yes or no at each of these positions, which allows complex NLU tasks to be recast in a manageable MC format.
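As a rough illustration of the option-scoring idea, here is a minimal sketch assuming a BERT-style masked language model from Hugging Face Transformers. The ordinary [MASK] token stands in for the paper's [O-MASK] token, and the input template, yes/no verbalizers, and scoring rule are simplifications of my own rather than UniMC's exact procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Minimal sketch of option scoring with an off-the-shelf masked LM.
# The standard [MASK] token stands in for UniMC's [O-MASK]; the real model
# uses dedicated option tokens, its own position handling, and an O-MLM
# training objective, none of which are reproduced here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

YES_ID = tokenizer.convert_tokens_to_ids("yes")
NO_ID = tokenizer.convert_tokens_to_ids("no")

def pick_option(passage, question, options):
    """Return the option whose mask position most favours 'yes' over 'no'."""
    scores = []
    for option in options:
        # One mask token in front of each candidate option: the model is
        # asked, in effect, "is this option correct - yes or no?"
        text = f"{question} [MASK] {option}. {passage}"
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_pos]
        scores.append((logits[YES_ID] - logits[NO_ID]).item())
    return options[max(range(len(options)), key=scores.__getitem__)]

print(pick_option("The movie was an absolute delight from start to finish.",
                  "What is the sentiment of this text?",
                  ["positive", "negative"]))
```

A vanilla masked LM has not been trained for this yes/no behaviour; in the paper it is the O-MLM and Option Prediction training on a collection of labelled MC datasets that makes such scoring reliable when the model is later applied zero-shot to unseen tasks.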

A salient feature of the model is its compact size: with only 235 million parameters, it presents a scalable and resource-conserving alternative to existing state-of-the-art models. Extensive results across multiple benchmarks show that UniMC not only matches but in some cases surpasses the performance of much larger models, such as FLAN, on tasks like natural language inference and text classification.

Experimental Analysis

The empirical results underscore UniMC's capabilities. For instance, in text classification on the DBpedia dataset, UniMC improves by up to 48% over much larger models. It also performs strongly on natural language inference, as evidenced by its results on the ANLI datasets, and the paper reports accuracies of 60.9% on SNLI and 52.7% on MNLI, notable figures given the model's small size.

Theoretical and Practical Implications

Theoretically, the paper contributes to the ongoing discourse on parameter efficiency in neural language models, showing that careful task formulation can yield significant performance gains without extensive computational resources. Practically, the framework's small size and efficiency make it more adaptable to real-world applications, allowing straightforward deployment in environments with limited computational capacity.

Future Directions

Looking forward, potential avenues for research could explore further reductions in dependency on handcrafted prompt engineering, as well as expanding the framework to encompass few-shot learning scenarios. Additionally, investigating the role of different backbone architectures could provide insights into optimizing UniMC for even broader NLU applications.

In summary, the paper presents a compelling alternative to dominant zero-shot learning approaches by leveraging the multiple-choice paradigm, reinforcing the significance of formulation in advancing the capabilities of NLU systems while maintaining computational efficiency.

Authors (9)
  1. Ping Yang (83 papers)
  2. Junjie Wang (164 papers)
  3. Ruyi Gan (14 papers)
  4. Xinyu Zhu (29 papers)
  5. Lin Zhang (342 papers)
  6. Ziwei Wu (19 papers)
  7. Xinyu Gao (58 papers)
  8. Jiaxing Zhang (39 papers)
  9. Tetsuya Sakai (30 papers)
Citations (20)