Making Pre-trained Language Models Better Few-shot Learners (2012.15723v2)

Published 31 Dec 2020 in cs.CL and cs.LG

Abstract: The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Overview of Few-shot Learning Techniques

This work investigates how to enhance the few-shot learning capabilities of moderately sized pre-trained language models (PLMs), such as BERT and RoBERTa, through novel fine-tuning techniques. Few-shot learning refers to a model's ability to learn from a very limited amount of labeled training data. The focus here is on fine-tuning such models on a small number of examples, a setting that is both more practical and computationally efficient.

Improved Prompt-based Fine-tuning Approach

Prompt-based fine-tuning reformulates a task as a masked-language-modeling problem: the input is wrapped in a task-specific template containing a mask token, and the model is trained to predict a label word that fills the mask. Discovering effective prompts, however, is a significant challenge when training data are scarce. The authors introduce an automated prompt-generation pipeline that minimizes manual prompt engineering: a search procedure identifies the best-performing label words, and a generative Transformer model, specifically T5, automatically produces candidate prompt templates.
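
As a rough illustration, here is a minimal sketch of prompt-based classification with a masked language model using the Hugging Face transformers library. The template ("It was <mask>.") and the label words (great/terrible) are illustrative choices of the kind LM-BFF searches for automatically, and the sketch only shows inference; prompt-based fine-tuning would additionally train the model with a cross-entropy loss over the label-word logits at the mask position.

```python
# Minimal sketch of prompt-based classification with a masked LM
# (not the authors' exact code; template and label words are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# Map each class to a single-token label word (leading space for RoBERTa BPE).
label_words = {"positive": " great", "negative": " terrible"}
label_ids = {k: tokenizer.encode(v, add_special_tokens=False)[0]
             for k, v in label_words.items()}

def predict(sentence: str) -> str:
    # Template: "<sentence> It was <mask>."
    prompt = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Compare only the logits of the label words at the mask position.
    return max(label_ids, key=lambda lab: logits[label_ids[lab]].item())

print(predict("A gorgeous, witty, and quietly moving film."))
```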

Novel Demonstration Strategies

In addition to prompt-based fine-tuning, the paper explores incorporating demonstration examples directly into the input context, a practice known as "in-context learning" that has shown promise with models like GPT-3. It proposes a refined strategy for dynamically selecting demonstrations that are informative and discriminative for the task at hand. To mitigate the detrimental effects of uninformative or overly long contexts, a single example is sampled from each class to form multiple small demonstration sets, giving the model a cleaner, more focused context; a minimal sketch of this sampling idea follows below.
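
The sketch below illustrates the per-class sampling idea under simple assumptions (a toy labeled pool and the same illustrative template as above). The actual method additionally filters candidate demonstrations by sentence-embedding similarity to the input, which is omitted here.

```python
# Illustrative sketch of building an input context with one demonstration
# per class (toy data; similarity-based filtering is omitted).
import random

label_words = {"positive": "great", "negative": "terrible"}

support = [
    ("An utterly charming and engaging picture.", "positive"),
    ("A tedious, joyless slog.", "negative"),
    ("The cast is superb and the script sparkles.", "positive"),
    ("Flat characters and a predictable plot.", "negative"),
]

def build_context(x: str) -> str:
    """Concatenate the input with one sampled demonstration per class."""
    parts = [f"{x} It was <mask>."]
    for label in label_words:
        demo_x, _ = random.choice([ex for ex in support if ex[1] == label])
        # Demonstrations are shown with their label word filled in.
        parts.append(f"{demo_x} It was {label_words[label]}.")
    return " ".join(parts)

print(build_context("A quietly moving film."))
```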

Systematic Evaluation and Observations

The paper presents a comprehensive evaluation framework that spans several NLP tasks, including classification and regression. The experiments demonstrate convincing improvements over standard fine-tuning, with reported gains of up to 30% absolute improvement and 11% on average across all tasks evaluated. One illuminating finding is that their approach, LM-BFF (better few-shot fine-tuning of language models), achieves around 90% accuracy on most binary sentence classification tasks with RoBERTa-large, despite being trained on as few as 32 examples.
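
For concreteness, the following sketch shows the general shape of such a few-shot evaluation protocol: sample a fixed number of examples per class under several random seeds, train on each sample, and report the mean and standard deviation. The value k=16, the seed list, and the `train_and_eval` callback are illustrative placeholders, not the authors' exact setup.

```python
# Sketch of a few-shot evaluation protocol: k examples per class,
# several random training samples, mean/std of the resulting scores.
import random
import statistics

def few_shot_split(full_train, labels, k=16, seed=0):
    """Sample k labeled examples per class from the full training pool."""
    rng = random.Random(seed)
    subset = []
    for label in labels:
        pool = [ex for ex in full_train if ex[1] == label]
        subset.extend(rng.sample(pool, k))
    return subset

def evaluate(full_train, labels, train_and_eval, seeds=(0, 1, 2, 3, 4)):
    """train_and_eval is a hypothetical callback: few-shot set -> test score."""
    scores = [train_and_eval(few_shot_split(full_train, labels, seed=s))
              for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)
```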

Task-Agnostic Few-shot Learning Method

The proposed methods are significant because they assume minimal task resources and domain knowledge, making them applicable to a broad range of tasks and domains. Overall, these techniques advance task-agnostic few-shot learning and make a strong case for prompt-based fine-tuning with demonstrations as a way to get the most out of PLMs with small datasets.

Authors (3)
  1. Tianyu Gao
  2. Adam Fisch
  3. Danqi Chen
Citations (1,779)