
Entailment as Few-Shot Learner (2104.14690v1)

Published 29 Apr 2021 in cs.CL and cs.AI

Abstract: Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners. However, their success hinges largely on scaling model parameters to a degree that makes it challenging to train and serve. In this paper, we propose a new approach, named as EFL, that can turn small LMs into better few-shot learners. The key idea of this approach is to reformulate potential NLP task into an entailment one, and then fine-tune the model with as little as 8 examples. We further demonstrate our proposed method can be: (i) naturally combined with an unsupervised contrastive learning-based data augmentation method; (ii) easily extended to multilingual few-shot learning. A systematic evaluation on 18 standard NLP tasks demonstrates that this approach improves the various existing SOTA few-shot learning methods by 12%, and yields competitive few-shot performance with 500 times larger models, such as GPT-3.

Overview of Entailment as Few-Shot Learner Approach

The paper introduces Entailment as Few-shot Learner (EFL), a novel approach for improving few-shot learning in language models (LMs), particularly small LMs, which typically struggle to match much larger counterparts such as GPT-3.

Few-Shot Learning and Current Challenges

Conventional approaches pre-train LMs on a large corpus and then fine-tune them for specific downstream tasks. Models like GPT-3 demonstrate remarkable few-shot capabilities purely through prompts containing a handful of examples, but they are far from parameter-efficient, and their scale imposes significant computational costs for both training and deployment. The proposed strategy, Entailment as Few-shot Learner (EFL), instead relies on task reformulation: it recasts a given NLP task as an entailment problem, with the input text as the premise and a label description as the hypothesis, allowing a small LM to achieve competitive few-shot performance after fine-tuning on as few as 8 examples.
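To make the reformulation concrete, here is a minimal sketch of entailment-based classification using an off-the-shelf NLI model from Hugging Face Transformers. The checkpoint name, the sentiment example, and the label descriptions are illustrative assumptions, not taken from the paper; EFL additionally fine-tunes the entailment model on a handful of reformulated examples, which this snippet omits.

```python
# Illustrative sketch of the entailment reformulation (not the authors' code).
# A sentiment example is turned into (premise, hypothesis) pairs, one per label,
# and scored with an off-the-shelf NLI model; the checkpoint and label
# descriptions below are assumptions chosen for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # any NLI-fine-tuned encoder would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

premise = "The movie was a complete waste of two hours."
label_descriptions = {
    "positive": "This is a great movie.",
    "negative": "This is a terrible movie.",
}

scores = {}
with torch.no_grad():
    for label, hypothesis in label_descriptions.items():
        inputs = tokenizer(premise, hypothesis, return_tensors="pt")
        logits = model(**inputs).logits
        # roberta-large-mnli orders its classes as (contradiction, neutral, entailment)
        scores[label] = logits.softmax(dim=-1)[0, 2].item()

prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

The classification decision reduces to picking the label whose description is most strongly entailed by the input, which is what makes the same entailment head reusable across many tasks.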

Empirical Validation of EFL

The empirical validation of EFL is robust. Benchmarked on 18 standard NLP tasks, including the GLUE and SuperGLUE benchmarks, EFL achieves a 12% average improvement over state-of-the-art few-shot learning methods. With full training data, it also outperforms standard fine-tuned RoBERTa models by 1.9 percentage points. These results indicate a substantial gain in few-shot effectiveness without scaling up model size.

EFL: Beyond Monolingual Application

A key theme of the paper is EFL's adaptability beyond English. The method extends directly to multilingual few-shot learning, where it achieves an average improvement of 19 percentage points over standard fine-tuning. This result reinforces the view that entailment is a fundamental language-understanding task and a useful unifying interface for a wide range of downstream problems.
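For the multilingual setting, the same premise/hypothesis reformulation can be applied with a multilingual NLI encoder. The sketch below uses the Hugging Face zero-shot-classification pipeline, which implements exactly this entailment-as-classifier pattern; the XNLI checkpoint, example sentence, and hypothesis template are assumptions chosen for illustration, and the snippet shows only zero-shot inference rather than the paper's few-shot fine-tuning.

```python
# Entailment-based classification across languages via a multilingual NLI model.
# Model name, input sentence, and hypothesis template are illustrative assumptions.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",  # XNLI-fine-tuned XLM-R encoder
)

result = classifier(
    "La película fue una completa pérdida de tiempo.",  # Spanish movie review
    candidate_labels=["positive", "negative"],
    hypothesis_template="The sentiment of this review is {}.",
)
print(result["labels"][0], result["scores"][0])
```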

Conclusion and Future Directions

In conclusion, the paper positions EFL as a more accessible route to strong few-shot performance, bringing this capability to small LMs without the need for vast computational resources. Central to the method is the reformulation of classification tasks into entailment tasks, coupled with unsupervised contrastive learning-based data augmentation (a generic sketch of such an objective follows below). Directions for further research include optimizing label descriptions, for example with reinforcement learning, and identifying entailment training tasks that transfer even more effectively.
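For completeness, the snippet below sketches the kind of unsupervised contrastive objective that can be paired with the entailment reformulation. It follows a SimCSE-style recipe in which two dropout-noised encodings of the same sentence form a positive pair; this is a common instantiation chosen for illustration and is not guaranteed to match the paper's exact data augmentation procedure.

```python
# A minimal sketch of an unsupervised contrastive (InfoNCE) objective.
# Assumption: two encodings of the same batch, produced with dropout active,
# serve as positive pairs (SimCSE-style); all other in-batch pairs are negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05):
    """InfoNCE loss over a batch: z1[i] and z2[i] are two views of sentence i."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature           # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(sim, labels)

# Usage: encode the same batch twice (dropout yields two different views),
# then minimise contrastive_loss(view1, view2) alongside the entailment loss.
```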

Authors (5)
  1. Sinong Wang
  2. Han Fang
  3. Madian Khabsa
  4. Hanzi Mao
  5. Hao Ma
Citations (172)