
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Published 28 Jul 2021 in cs.CL, cs.AI, and cs.LG | arXiv:2107.13586v1

Abstract: This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x′ that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g. the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website http://pretrain.nlpedia.ai/ including a constantly-updated survey and paperlist.

Citations (3,416)

Summary

  • The paper unifies prompt-based learning by categorizing prompt and answer engineering methods for diverse NLP tasks.
  • It details both manual and automated template crafting alongside various tuning strategies like tuning-free and prompt+LM tuning.
  • The survey highlights practical implications for knowledge probing, text generation, and information extraction while addressing challenges for future research.

An Expert Overview of "Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing"

Introduction

"Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing" by Liu et al. is a comprehensive examination of the emerging paradigm of prompt-based learning in NLP. Unlike traditional supervised learning, which relies on task-specific models and extensive labeled datasets, prompt-based learning leverages pre-trained LMs, using prompts to guide them in performing various NLP tasks. This survey aims to unify and systematize the diverse research efforts in this field, offering not only a review but also practical and theoretical insights into the use of prompts.

Core Paradigm

The survey categorizes the evolution of NLP into four paradigms: feature engineering, architecture engineering, objective engineering, and now the pre-train, prompt, and predict paradigm. Prompt-based learning stands on the shoulders of pre-trained LMs such as GPT-3 and BERT. Instead of fine-tuning these models for each task, prompt-based learning modifies the input x into a prompt x′, which helps the LM produce more accurate outputs y by filling in the blank spaces in the prompt.
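As a toy illustration of this reformulation (not code from the survey — the template text is invented), wrapping an input x into a cloze-style prompt x′ with an unfilled slot might look like this:

```python
# Minimal sketch of turning an input x into a cloze prompt x'.
# A masked LM would then fill the [Z] slot, and an answer mapping
# would convert the filled word into the final output y.

def to_cloze_prompt(x: str) -> str:
    """Wrap the raw input x into a prompt x' with an unfilled slot [Z]."""
    return f"{x} Overall, it was a [Z] movie."

x = "I watched this film twice in one weekend."
x_prime = to_cloze_prompt(x)
print(x_prime)
```

The same idea applies to prefix prompts, where the slot sits at the end of the string and is filled by left-to-right generation.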

Prompt Engineering

Prompt engineering is critical to the success of this learning paradigm. The primary methods include manual template crafting and automated template learning. Manual designs rely on human expertise to create effective prompts, while automated methods, such as gradient-based search and prompt paraphrasing, aim to optimize prompts algorithmically. This area remains a fertile ground for research, given the complex trade-offs between interpretability, effectiveness, and generalizability of prompts.
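A hedged sketch of automated template search: enumerate a few candidate templates and keep the one that scores best on a small dev set. The templates, dev examples, and the stand-in "LM" below are all invented for illustration — real methods (e.g., gradient-based search or paraphrasing) optimize against an actual pre-trained model.

```python
# Toy discrete prompt search: pick the candidate template whose filled
# answers best match a tiny labelled dev set, under a stand-in scorer.

templates = [
    "{x} It was [Z].",
    "{x} All in all, the movie was [Z].",
    "Review: {x} Sentiment: [Z].",
]

dev_set = [("I loved every minute.", "good"),
           ("A total waste of time.", "bad")]

def toy_lm_fill(prompt: str) -> str:
    """Stand-in for a masked LM: crude keyword rules, not real inference."""
    if "Sentiment:" in prompt:
        return "good" if "loved" in prompt else "bad"
    return "bad"  # pretend the generic templates confuse the toy model

def accuracy(template: str) -> float:
    hits = sum(toy_lm_fill(template.format(x=x)) == y for x, y in dev_set)
    return hits / len(dev_set)

best = max(templates, key=accuracy)
print(best, accuracy(best))  # -> Review: {x} Sentiment: [Z]. 1.0
```

This mirrors the trade-off the survey notes: searched templates can beat hand-written ones on a dev set while being less interpretable.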

Answer Engineering

Answer engineering involves designing the space of possible answers Z and mapping it to the target output space Y. Most current works focus on simple token- or span-level answers, but emerging strategies like automated answer search offer promising directions. Future research could explore extending these methods, particularly in applications requiring structured or multi-token answers.
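The Z-to-Y mapping can be sketched with a toy verbalizer: score each candidate answer in Z (the scores below are invented stand-ins for LM probabilities) and map the winning answer to a label in Y.

```python
# Toy answer engineering: an answer space Z, a verbalizer Z -> Y,
# and an argmax over stand-in "LM scores" for each candidate filler.

answer_to_label = {          # verbalizer: answer space Z -> label space Y
    "great": "positive", "fantastic": "positive",
    "terrible": "negative", "boring": "negative",
}

def predict(scores):
    """Pick the highest-scoring candidate answer, then map it to its label."""
    z = max(scores, key=scores.get)
    return answer_to_label[z]

# Hypothetical LM scores for the slot in "The movie was [Z]."
scores = {"great": 0.41, "fantastic": 0.13, "terrible": 0.09, "boring": 0.02}
print(predict(scores))  # -> positive
```

Note that several answers in Z can map to one label in Y, which is what makes the choice of answer space a design decision in its own right.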

Training Strategies and Parameter Tuning

The survey examines various training strategies, including tuning-free prompting, fixed-LM prompt tuning, fixed-prompt LM tuning, and prompt+LM tuning.

  • Tuning-free prompting keeps the LM's parameters fixed, relying entirely on the prompt for task specification.
  • Fixed-LM prompt tuning involves tuning the prompt while keeping the LM parameters constant.
  • Fixed-prompt LM tuning adjusts the LM parameters while using static prompts, combining the benefits of pre-trained models with specific task guidance.
  • Prompt+LM tuning adjusts both prompt and LM parameters, offering the most flexibility but at the risk of overfitting.
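To make the fixed-LM prompt-tuning idea concrete, here is a deliberately tiny pure-Python sketch: the "LM" is a frozen linear scorer, and only a continuous prompt vector is updated by gradient descent. Every quantity (dimensions, data, loss, learning rate) is invented for illustration and bears no relation to real LM training.

```python
import random

random.seed(0)
d = 4
# Frozen "LM": fixed weights over [prompt, input]; never updated below.
W = [random.gauss(0, 1) for _ in range(2 * d)]
W_frozen = list(W)  # snapshot, to check the LM really stays fixed

# Tiny toy dataset: input vectors with +/-1 targets.
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(16)]
y = [1.0 if sum(x) > 0 else -1.0 for x in X]

prompt = [0.0] * d  # the ONLY trainable parameters

def score(p, x):
    v = p + x  # concatenate [prompt, input]
    return sum(wi * vi for wi, vi in zip(W, v))

lr = 0.05
for _ in range(200):
    grad = [0.0] * d
    for x_i, y_i in zip(X, y):
        err = score(prompt, x_i) - y_i   # squared-error residual
        for j in range(d):
            grad[j] += err * W[j]        # d(score)/d(prompt_j) = W[j]
        # no update to W: the LM stays frozen (fixed-LM prompt tuning)
    prompt = [p - lr * g / len(X) for p, g in zip(prompt, grad)]
```

The point of the sketch is the parameter split: the prompt vector moves, the model weights do not, which is why fixed-LM prompt tuning is so much cheaper to store per task than full fine-tuning.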

Applications

Prompt-based methods have made significant inroads across a variety of NLP tasks:

  • Knowledge Probing: Models like LAMA and X-FACTR use prompts to probe the factual and linguistic knowledge embedded within LMs.
  • Text Classification and NLI: Prompt-based learning simplifies the reformulation of these tasks, making them suitable for few-shot scenarios.
  • Information Extraction: Although challenging, prompts have been adapted for tasks like named entity recognition and relation extraction.
  • Question Answering: Unified systems like UnifiedQA demonstrate the power of prompt-based approaches in handling diverse QA formats.
  • Text Generation: Models such as GPT-3 showcase the flexibility of prompts in facilitating text generation tasks, including summarization and translation.

Challenges and Future Directions

Despite its potential, prompt-based learning faces several challenges:

  • Prompt Design Complexity: Extending prompt use to tasks beyond classification and generation is non-trivial.
  • Structured Data Integration: Encoding structured information in prompts requires further research.
  • Training Dynamics: Understanding the interplay between prompt, LM parameter tuning, and dataset size is critical.
  • Task-Specific Adaptation: Developing universal prompts that generalize across tasks remains an open question.

Furthermore, issues such as calibration of model probabilities and the interpretability of continuous prompts are areas ripe for investigation.

Conclusion

This survey not only highlights the efficacy of prompt-based learning but also identifies key challenges and areas for future exploration. By organizing the current state of knowledge and practice, Liu et al. provide a crucial resource for researchers and practitioners aiming to harness the full potential of NLP through prompt engineering. The pre-train, prompt, and predict paradigm represents a significant shift, with the potential to simplify and unify NLP model architectures while leveraging the capabilities of pre-trained LLMs.


Explain it Like I'm 14

What is this paper about?

This paper is a big “map” of a new way to use AI for language called prompt-based learning. Instead of training a separate model for every task (like sentiment analysis, translation, or question answering), we use one LLM that’s been trained on tons of text, and we “prompt” it with the right words or patterns so it can do the task—sometimes with almost no extra training. The authors explain the ideas, tools, and tricks behind prompting, and organize the growing research so newcomers can learn it faster.

What questions are they trying to answer?

  • What is prompting, and how is it different from older ways of training language models?
  • How do you turn a task into a prompt the model understands?
  • What kinds of language models (like GPT, BERT, T5) work best with which prompts?
  • How can we design good prompts and good answer formats (and even automate that)?
  • What training strategies work: no training at all, tuning the prompt, tuning the model, or both?
  • Where does prompting help most, and what are the current challenges?

How did they study it? (Their approach)

This is a survey paper. That means the authors:

  • Collected and compared many research papers on prompting.
  • Created a shared, simple way to describe how prompting works.
  • Built a typology (a structured “family tree”) of methods and choices in prompting.
  • Gave examples, patterns, and resources (including a website) to help beginners.

Here are the core ideas they explain in everyday terms:

  • Language model (LM): Think of a model that’s read a huge library and learned how words usually fit together. It can guess the next word or fill in missing words.
  • Prompt: A short instruction or “fill‑in‑the‑blank” wrapper that turns your problem into something the model already knows how to do. Examples:
    • Cloze (fill‑in‑the‑blank): “I missed the bus today. I felt so [Z].”
    • Prefix (instruction + answer): “English: I missed the bus today. French: [Z]”
  • Few-shot and zero-shot:
    • Zero-shot: You just give the prompt; no examples.
    • Few-shot: You add a few examples to guide the model.
  • Three steps of prompting (simple version):
    • Prompt addition: Wrap your input with a template (the “instruction” and a blank to fill).
    • Answer search: Let the model fill in the blank (it picks the most likely word(s)).
    • Answer mapping: If needed, map the filled word(s) to the final answer (e.g., “great” → “very positive”).
  • Prompt engineering: Crafting (or automatically learning) good templates—that is, choosing the right words and structure for the prompt.
  • Answer engineering: Choosing the shape of the answer (single word, span, full sentence) and how to map it to labels or outputs.
  • Types of pre-trained models and when they fit:
    • GPT-style (left-to-right): Great for prompts that end with an answer to generate (prefix prompts).
    • BERT-style (masked): Great for fill‑in‑the‑blank (cloze) prompts for understanding tasks.
    • Encoder–decoder (T5/BART): Good at turning input text into output text (translation, summarization), and work well with many prompt styles.
  • Training strategies with prompts:
    • Tuning-free: Just prompt the model; don’t change its weights.
    • Prompt tuning: Learn the prompt (sometimes as special “virtual” tokens) but keep the model frozen.
    • Fine-tuning: Adjust the model itself for the task (optionally with a prompt).
    • Combined: Tune both prompt and model.
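The zero-shot vs. few-shot distinction above can be sketched in code: the same template is used either on its own or with a few worked examples (demonstrations) prepended. All example texts here are invented.

```python
# Building zero-shot vs. few-shot prompts by prepending demonstrations
# to the same template; the model fills the final [Z] slot.

TEMPLATE = "Review: {x}\nSentiment: {y}"

def build_prompt(x, demos=()):
    """Zero-shot if demos is empty; few-shot otherwise."""
    parts = [TEMPLATE.format(x=dx, y=dy) for dx, dy in demos]
    parts.append(TEMPLATE.format(x=x, y="[Z]"))  # slot for the model to fill
    return "\n\n".join(parts)

zero_shot = build_prompt("The plot dragged on forever.")
few_shot = build_prompt(
    "The plot dragged on forever.",
    demos=[("An instant classic.", "positive"),
           ("I walked out halfway.", "negative")],
)
print(few_shot)
```

The few-shot prompt is just the zero-shot prompt with two solved examples in front of it — no model weights change in either case.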

What did they find?

The authors organize the field around a few key choices that strongly affect results:

  • Model choice matters:
    • GPT-like models are strong for generating answers after a prefix prompt.
    • BERT-like models shine when you can phrase the task as a fill‑in‑the‑blank.
    • Encoder–decoder models are flexible for input→output tasks.
  • Prompt and answer design matter a lot:
    • The exact wording of a prompt can make performance jump up or down.
    • “Cloze” prompts pair naturally with masked models; “prefix” prompts pair naturally with generative models.
    • Answers can be single words, short spans, or whole sentences; picking the right format helps.
  • You can automate some parts:
    • Algorithms can search for good prompt words (discrete prompts) or learn “soft” prompt vectors the model understands (continuous prompts).
    • You can also ensemble multiple prompts or combine/decompose them for tougher tasks.
  • Training choices span a spectrum:
    • Zero/few-shot prompting is great when you lack labeled data.
    • Prompt tuning (with a frozen model) uses far fewer parameters than full fine-tuning, yet can perform well.
    • Full fine-tuning can still help, especially with more data or when you need maximum control.
  • Big picture: “Pre-train, prompt, and predict”
    • Instead of “pre-train, fine-tune” for every task, prompting lets one big model tackle many tasks by changing the instruction (the prompt), sometimes with no extra training.

Why this is important:

  • It can dramatically reduce the need for expensive labeled datasets.
  • One model can be adapted quickly to many tasks by changing the prompt.
  • It opens new ways to “program” AI using natural language.

Why does this matter?

  • Saves time and money: Less need to collect thousands of labeled examples for every new task.
  • Flexible: The same base model can translate, summarize, answer questions, classify text, and more—just by changing prompts.
  • Accessible: People can guide models with natural-language instructions instead of writing code-heavy pipelines.

There are also challenges:

  • Prompt sensitivity: Small wording changes can change results a lot.
  • Fairness and reliability: Prompts can accidentally trigger biases or errors.
  • Evaluation: We need better, more consistent ways to test prompted models.
  • Automation: Tools to suggest or learn better prompts are still developing.

What could be the impact?

  • For researchers: A shared framework and vocabulary to compare methods, build better prompts, and design fairer tests.
  • For developers: Faster prototypes—try tasks with zero or few labeled examples; use prompt or parameter-efficient tuning to deploy models with lower costs.
  • For education and tools: User-friendly interfaces that help non-experts “program” AI by writing instructions, with built-in helpers to suggest and refine prompts.
  • For the future: More general-purpose models that understand many tasks and languages, with improved methods to make prompts robust, fair, and easy to design.

In short, this paper shows how prompting turns LLMs into flexible problem-solvers, explains the main design choices, and points to a future where we guide AI mostly by telling it what we want—in plain language.
