
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (2309.16797v1)

Published 28 Sep 2023 in cs.CL, cs.AI, cs.LG, and cs.NE

Abstract: Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of LLMs in various domains. However, such hand-crafted prompt strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.

An Overview of "Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution"

The paper introduces Promptbreeder, a method that uses an LLM to evolve effective prompt strategies through self-referential self-improvement. It addresses the often sub-optimal nature of hand-crafted prompting strategies by automatically evolving and refining prompts tailored to a given domain.

Core Concept and Methodology

Promptbreeder casts prompt optimization as an evolutionary algorithm. The process begins by initializing a population of task-prompts. Rather than applying fixed, hand-designed mutation operators, Promptbreeder uses an LLM to generate prompt variations, guided by mutation-prompts. The system's distinguishing characteristic is its self-referential nature: it evolves both the task-prompts and the mutation-prompts that produce them. This dual evolution improves not only the prompts themselves but also the mutation process that generates them.
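The evolutionary loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `llm` function is a deterministic stand-in for a real LLM call, and the population layout, binary tournament, and hyper-mutation rate are simplified placeholders.

```python
import random

def llm(prompt):
    # Stand-in for a real LLM call; a deterministic toy "mutation"
    # so the sketch runs without a model backend.
    return prompt + " (varied)"

def mutate(task_prompt, mutation_prompt):
    # First-order mutation: the LLM rewrites the task-prompt
    # as directed by the mutation-prompt.
    return llm(f"{mutation_prompt}\nINSTRUCTION: {task_prompt}")

def hyper_mutate(mutation_prompt):
    # Self-referential step: the LLM also rewrites the
    # mutation-prompt itself.
    return llm(f"Improve this instruction-mutating prompt: {mutation_prompt}")

def evolve(population, fitness, generations=3):
    # population: list of (task_prompt, mutation_prompt) units.
    for _ in range(generations):
        a, b = random.sample(range(len(population)), 2)
        # Binary tournament: the loser is overwritten by a
        # mutated copy of the winner.
        if fitness(population[a][0]) >= fitness(population[b][0]):
            winner, loser = a, b
        else:
            winner, loser = b, a
        task, mut = population[winner]
        # Occasionally hyper-mutate the mutation-prompt as well.
        new_mut = hyper_mutate(mut) if random.random() < 0.5 else mut
        population[loser] = (mutate(task, new_mut), new_mut)
    return population
```

In the real system, `fitness` would score a task-prompt on a batch of training examples, and `llm` would call the underlying model; here any callable on strings (e.g. `len`) suffices to exercise the loop.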

Implementation and Evaluation

The algorithm runs for multiple generations, in each of which task-prompts are scored for fitness by their performance on training examples from the target domain. Evaluation spans arithmetic and commonsense reasoning benchmarks such as GSM8K and AQuA-RAT, as well as the harder task of hate speech classification. Across these datasets, Promptbreeder yields higher accuracies than existing prompting strategies such as Chain-of-Thought and Plan-and-Solve.
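A fitness score of this kind can be sketched as exact-match accuracy over a batch of training examples. The `model` callable and the exact-match scoring here are illustrative assumptions, not the paper's exact evaluation procedure.

```python
def fitness(task_prompt, train_set, model):
    # Score a task-prompt by exact-match accuracy on a batch of
    # (question, target) training examples. `model` is any callable
    # that maps a full prompt string to an answer string.
    correct = 0
    for question, target in train_set:
        prediction = model(f"{task_prompt}\n{question}")
        correct += (prediction.strip() == target)
    return correct / len(train_set)
```

With a real LLM behind `model`, this score would drive the binary tournament selection between competing task-prompts.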

Numerical Results and Findings

Promptbreeder demonstrated remarkable improvements, achieving 99.7% accuracy on MultiArith and 83.9% on GSM8K, significantly outperforming state-of-the-art prompting methods. Its ability to evolve intricate prompts was highlighted in its application to the ETHOS hate speech classification task, illustrating its adaptability in complex scenarios.

Implications and Future Directions

By automating prompt optimization, Promptbreeder can make LLM deployment more efficient across a range of domains, and it points toward systems capable of continuous self-improvement without direct human intervention. Promising future directions include scaling Promptbreeder to increasingly capable LLMs, exploring more complex evolved thought processes, and further improving the diversity and adaptability of the evolved prompts.

In conclusion, Promptbreeder represents a significant step in automating the optimization of LLM prompting strategies, showcasing a method with the potential to vastly enhance the capability of AI systems through self-referential improvements. The research opens pathways for future explorations into more complex, adaptable, and efficient AI self-improvement mechanisms.

Authors (5)
  1. Chrisantha Fernando
  2. Dylan Banarse
  3. Henryk Michalewski
  4. Simon Osindero
  5. Tim Rocktäschel
Citations (131)