Making Pre-trained Language Models Better Few-shot Learners (2012.15723v2)

Published 31 Dec 2020 in cs.CL and cs.LG

Abstract: The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Overview of Few-shot Learning Techniques

This work investigates how to enhance the few-shot learning capabilities of moderately sized pre-trained language models (PLMs), such as BERT and RoBERTa, through novel fine-tuning techniques. Few-shot learning refers to a model's ability to learn from a very limited amount of labeled training data. The focus here is on fine-tuning such models on a small number of examples, a setting that is both more practical and computationally efficient.

Improved Prompt-based Fine-tuning Approach

Prompt-based fine-tuning reformulates a task as a masked-language-modeling problem: the input is wrapped in a task-specific template containing a mask token, and the model is trained to predict a label word that fills the mask. Discovering effective prompts, however, is a significant challenge when training data are scarce. The authors introduce an automated prompt-generation pipeline that minimizes manual prompt engineering: a search procedure identifies the best-performing label words, and a generative Transformer model, specifically T5, automatically produces candidate prompt templates.
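
As a rough illustration, here is a minimal sketch of prompt-based classification with a masked language model using the Hugging Face transformers library. The template ("It was <mask>.") and the label words (great/terrible) are illustrative choices of the kind LM-BFF searches for automatically, and the sketch only shows inference; prompt-based fine-tuning would additionally train the model with a cross-entropy loss over the label-word logits at the mask position.

```python
# Minimal sketch of prompt-based classification with a masked LM
# (not the authors' exact code; template and label words are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# Map each class to a single-token label word (leading space for RoBERTa BPE).
label_words = {"positive": " great", "negative": " terrible"}
label_ids = {k: tokenizer.encode(v, add_special_tokens=False)[0]
             for k, v in label_words.items()}

def predict(sentence: str) -> str:
    # Template: "<sentence> It was <mask>."
    prompt = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Compare only the logits of the label words at the mask position.
    return max(label_ids, key=lambda lab: logits[label_ids[lab]].item())

print(predict("A gorgeous, witty, and quietly moving film."))
```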

Novel Demonstration Strategies

In addition to prompt-based fine-tuning, the paper explores incorporating demonstration examples directly into the input context, a practice known as "in-context learning" that has shown promise with models like GPT-3. It proposes a refined strategy for dynamically selecting demonstrations that are informative and discriminative for the task at hand. To mitigate the detrimental effects of uninformative or overly long contexts, a single example is sampled from each class to form multiple small demonstration sets, giving the model a cleaner, more focused context; a minimal sketch of this sampling idea follows below.
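
The sketch below illustrates the per-class sampling idea under simple assumptions (a toy labeled pool and the same illustrative template as above). The actual method additionally filters candidate demonstrations by sentence-embedding similarity to the input, which is omitted here.

```python
# Illustrative sketch of building an input context with one demonstration
# per class (toy data; similarity-based filtering is omitted).
import random

label_words = {"positive": "great", "negative": "terrible"}

support = [
    ("An utterly charming and engaging picture.", "positive"),
    ("A tedious, joyless slog.", "negative"),
    ("The cast is superb and the script sparkles.", "positive"),
    ("Flat characters and a predictable plot.", "negative"),
]

def build_context(x: str) -> str:
    """Concatenate the input with one sampled demonstration per class."""
    parts = [f"{x} It was <mask>."]
    for label in label_words:
        demo_x, _ = random.choice([ex for ex in support if ex[1] == label])
        # Demonstrations are shown with their label word filled in.
        parts.append(f"{demo_x} It was {label_words[label]}.")
    return " ".join(parts)

print(build_context("A quietly moving film."))
```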

Systematic Evaluation and Observations

The paper presents a comprehensive evaluation framework that spans several NLP tasks, including classification and regression. The experiments demonstrate convincing improvements over standard fine-tuning, with reported gains of up to 30% absolute improvement and 11% on average across all tasks evaluated. One illuminating finding is that their approach, LM-BFF (better few-shot fine-tuning of language models), achieves around 90% accuracy on most binary sentence classification tasks with RoBERTa-large, despite being trained on as few as 32 examples.
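
For concreteness, the following sketch shows the general shape of such a few-shot evaluation protocol: sample a fixed number of examples per class under several random seeds, train on each sample, and report the mean and standard deviation. The value k=16, the seed list, and the `train_and_eval` callback are illustrative placeholders, not the authors' exact setup.

```python
# Sketch of a few-shot evaluation protocol: k examples per class,
# several random training samples, mean/std of the resulting scores.
import random
import statistics

def few_shot_split(full_train, labels, k=16, seed=0):
    """Sample k labeled examples per class from the full training pool."""
    rng = random.Random(seed)
    subset = []
    for label in labels:
        pool = [ex for ex in full_train if ex[1] == label]
        subset.extend(rng.sample(pool, k))
    return subset

def evaluate(full_train, labels, train_and_eval, seeds=(0, 1, 2, 3, 4)):
    """train_and_eval is a hypothetical callback: few-shot set -> test score."""
    scores = [train_and_eval(few_shot_split(full_train, labels, seed=s))
              for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)
```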

Task-Agnostic Few-shot Learning Method

The proposed methods are significant because they assume minimal task resources and domain knowledge, making them applicable to a broad range of tasks and domains. Overall, these techniques advance task-agnostic few-shot learning and make a strong case for prompt-based fine-tuning with demonstrations as a way to get the most out of PLMs with small datasets.

Authors (3)
  1. Tianyu Gao
  2. Adam Fisch
  3. Danqi Chen
Citations (1,779)