Papers
Topics
Authors
Recent
Search
2000 character limit reached

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Published 6 Jun 2024 in cs.CL and cs.AI | (2406.06608v6)

Abstract: Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering. Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence. We establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. We present a detailed vocabulary of 33 vocabulary terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities. Additionally, we provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. We further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.

Citations (53)

Summary

  • The paper introduces a comprehensive taxonomy of 58 prompting techniques, categorizing them into six major groups to clarify conflicting terminologies.
  • The study employs a PRISMA-guided review of 1,565 papers and evaluates techniques on the MMLU benchmark using GPT-3.5-turbo, highlighting performance differences.
  • The paper presents practical case studies, including prompt engineering for crisis detection, and discusses security, bias mitigation, and future research directions.

A Systematic Survey of Prompting Techniques: Comprehensive Analysis and Case Studies

Introduction

The paper "The Prompt Report: A Systematic Survey of Prompting Techniques" led by Sander Schulhoff et al. presents a thorough taxonomy and detailed analysis of various prompting techniques utilized for LLMs. This systematic survey aims to establish a structured understanding of prompts, identify conflicting terminologies, and evaluate the efficacy of numerous prompting methods. The authors also provide empirical benchmarks to illustrate the comparative performance of selected techniques and offer insights into real-world applications through detailed case studies.

Scope and Methodology

The authors conducted a machine-assisted systematic literature review using the PRISMA process, accumulating and filtering a dataset of 1,565 papers. The primary sources included arXiv, Semantic Scholar, and ACL Anthology. The systematic review identified 58 text-based prompting techniques, which were subsequently categorized into six major groups: Zero-Shot, Few-Shot, Thought Generation, Ensembling, Self-Criticism, and Decomposition. Additionally, the review includes an analysis of multilingual and multimodal prompting techniques, highlighting the comprehensive nature of the survey.

Text-Based Prompting Techniques

The text-based prompting techniques span a wide array of methods, each designed to leverage the capabilities of LLMs in unique ways:

  • Zero-Shot Prompts: These techniques require no exemplar data. Examples include Emotion Prompting, Role Prompting, and System 2 Attention (S2A).
  • Few-Shot Prompts: These rely on a small number of exemplars to guide the model. Essential techniques involve K-Nearest Neighbor (KNN), Vote-K, and Self-Generated In-Context Learning (SG-ICL).
  • Thought Generation: Chain-of-Thought (CoT) Prompting is a prominent technique, inducing the model to articulate its reasoning steps.
  • Ensembling: This method aggregates outputs from multiple prompts, enhancing accuracy and robustness.
  • Self-Criticism: Techniques like Self-Verification and Self-Refine enable the model to critique and improve its outputs iteratively.
  • Decomposition: Methods such as Least-to-Most Prompting break down complex tasks into simpler sub-tasks, facilitating better performance.

Multilingual and Multimodal Prompting Techniques

The survey also explores prompting techniques beyond English text, addressing the nuances of multilingual and multimodal applications:

  • Multilingual Prompting: Techniques like Translate First Prompting and Cross-Lingual Self-Consistent Prompting (CLSP) are explored.
  • Multimodal Prompting: The paper categorizes techniques for images, audio, video, and 3D modalities. Examples include Duty Distinct Chain-of-Thought (DDCoT) for image tasks and Interactive-Chain-Prompting (ICP) for video tasks.

Empirical Benchmarks

To establish a comparative performance benchmark, the authors selected a subset of prompting techniques and evaluated them on the MMLU benchmark using GPT-3.5-turbo. Techniques such as Zero-Shot-CoT, Few-Shot-CoT, and Self-Consistency were tested, revealing significant variability in performance, with Few-Shot-CoT emerging as the most effective method.

Prompt Engineering Case Study

The authors provided an illustrative case study on identifying entrapment, an indicator of suicidal crisis, in text. The case study involved multiple iterations of prompt engineering, including Zero-Shot, Few-Shot, Chain-of-Thought, and AutoCoT techniques. The detailed documentation of this process offers insights into the challenges and intricacies of prompt engineering, highlighting both successes and failures.

Evaluation Frameworks

The paper discusses various frameworks for evaluating the effectiveness of prompting techniques, such as LLM-EVAL and G-EVAL, which involve both explicit and implicit scoring methodologies. These frameworks are crucial for assessing the practical utility of prompting methods in real-world applications.

Security and Alignment

Prompting techniques also pose significant security and alignment challenges. The authors address issues like prompt hacking, data privacy, and bias mitigation. They propose potential hardening measures, including prompt-based defenses and guardrails, to bolster security and alignment.

Conclusions and Future Directions

The survey underscores the complexity and potential of prompting techniques in expanding the capabilities of LLMs. The authors advocate for ongoing engagement between prompt engineers and domain experts to ensure the development of robust and contextually appropriate prompts. They highlight the necessity of empirical benchmarks and offer recommendations for future research, encouraging the integration of new methods within their comprehensive taxonomy.

In summary, the paper provides a foundational understanding of prompting techniques, extends the taxonomy of this evolving field, and sets the stage for future advancements in the application of LLMs across diverse domains.

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We found no open problems mentioned in this paper.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 187 tweets with 5608 likes about this paper.