The Prompt Report: A Systematic Survey of Prompting Techniques (2406.06608v1)

Published 6 Jun 2024 in cs.CL and cs.AI

Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts, by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 vocabulary terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.

PDF HTML Abstract

A Systematic Survey of Prompting Techniques: Comprehensive Analysis and Case Studies

Introduction

The paper "The Prompt Report: A Systematic Survey of Prompting Techniques" led by Sander Schulhoff et al. presents a thorough taxonomy and detailed analysis of various prompting techniques utilized for LLMs. This systematic survey aims to establish a structured understanding of prompts, identify conflicting terminologies, and evaluate the efficacy of numerous prompting methods. The authors also provide empirical benchmarks to illustrate the comparative performance of selected techniques and offer insights into real-world applications through detailed case studies.

Scope and Methodology

The authors conducted a machine-assisted systematic literature review using the PRISMA process, accumulating and filtering a dataset of 1,565 papers. The primary sources included arXiv, Semantic Scholar, and ACL Anthology. The systematic review identified 58 text-based prompting techniques, which were subsequently categorized into six major groups: Zero-Shot, Few-Shot, Thought Generation, Ensembling, Self-Criticism, and Decomposition. Additionally, the review includes an analysis of multilingual and multimodal prompting techniques, highlighting the comprehensive nature of the survey.

Text-Based Prompting Techniques

The text-based prompting techniques span a wide array of methods, each designed to leverage the capabilities of LLMs in unique ways:

Zero-Shot Prompts: These techniques require no exemplar data. Examples include Emotion Prompting, Role Prompting, and System 2 Attention (S2A).
Few-Shot Prompts: These rely on a small number of exemplars to guide the model. Essential techniques involve K-Nearest Neighbor (KNN), Vote-K, and Self-Generated In-Context Learning (SG-ICL).
Thought Generation: Chain-of-Thought (CoT) Prompting is a prominent technique, inducing the model to articulate its reasoning steps.
Ensembling: This method aggregates outputs from multiple prompts, enhancing accuracy and robustness.
Self-Criticism: Techniques like Self-Verification and Self-Refine enable the model to critique and improve its outputs iteratively.
Decomposition: Methods such as Least-to-Most Prompting break down complex tasks into simpler sub-tasks, facilitating better performance.

Multilingual and Multimodal Prompting Techniques

The survey also explores prompting techniques beyond English text, addressing the nuances of multilingual and multimodal applications:

Multilingual Prompting: Techniques like Translate First Prompting and Cross-Lingual Self-Consistent Prompting (CLSP) are explored.
Multimodal Prompting: The paper categorizes techniques for images, audio, video, and 3D modalities. Examples include Duty Distinct Chain-of-Thought (DDCoT) for image tasks and Interactive-Chain-Prompting (ICP) for video tasks.

Empirical Benchmarks

To establish a comparative performance benchmark, the authors selected a subset of prompting techniques and evaluated them on the MMLU benchmark using GPT-3.5-turbo. Techniques such as Zero-Shot-CoT, Few-Shot-CoT, and Self-Consistency were tested, revealing significant variability in performance, with Few-Shot-CoT emerging as the most effective method.

Prompt Engineering Case Study

The authors provided an illustrative case paper on identifying entrapment, an indicator of suicidal crisis, in text. The case paper involved multiple iterations of prompt engineering, including Zero-Shot, Few-Shot, Chain-of-Thought, and AutoCoT techniques. The detailed documentation of this process offers insights into the challenges and intricacies of prompt engineering, highlighting both successes and failures.

Evaluation Frameworks

The paper discusses various frameworks for evaluating the effectiveness of prompting techniques, such as LLM-EVAL and G-EVAL, which involve both explicit and implicit scoring methodologies. These frameworks are crucial for assessing the practical utility of prompting methods in real-world applications.

Security and Alignment

Prompting techniques also pose significant security and alignment challenges. The authors address issues like prompt hacking, data privacy, and bias mitigation. They propose potential hardening measures, including prompt-based defenses and guardrails, to bolster security and alignment.

Conclusions and Future Directions

The survey underscores the complexity and potential of prompting techniques in expanding the capabilities of LLMs. The authors advocate for ongoing engagement between prompt engineers and domain experts to ensure the development of robust and contextually appropriate prompts. They highlight the necessity of empirical benchmarks and offer recommendations for future research, encouraging the integration of new methods within their comprehensive taxonomy.

In summary, the paper provides a foundational understanding of prompting techniques, extends the taxonomy of this evolving field, and sets the stage for future advancements in the application of LLMs across diverse domains.