Prompt Optimization with Human Feedback: An Expert Overview
The paper “Prompt Optimization with Human Feedback (POHF)” by Xiaoqiang Lin et al. provides a formal treatment of the challenging problem of optimizing prompts for LLMs using human preference feedback. The research addresses a common obstacle: in real-world settings where users interact directly with black-box LLMs, numeric assessment scores for prompt quality are often infeasible to obtain or unreliable.
Problem Definition and Methodology
The primary objective of this paper is to optimize prompts using only binary preference feedback from human users, a problem the authors define as POHF. Traditional methods rely on numeric scores to evaluate prompts, which is impractical in many use cases. Instead, this paper uses human preference feedback: at each round, the user is shown the responses generated from a pair of candidate prompts and indicates which one they prefer.
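To make the feedback format concrete, the snippet below sketches one round of the POHF interaction loop under stated assumptions: `generate_response` stands in for a call to the black-box LLM and `ask_user` for the binary preference query; neither name comes from the paper.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One unit of POHF feedback: which of two prompts produced the preferred response."""
    prompt_a: str
    prompt_b: str
    a_preferred: bool  # True if the user preferred prompt_a's response

def collect_feedback(prompt_a: str, prompt_b: str,
                     generate_response, ask_user) -> PreferenceRecord:
    """Run one feedback round: generate both responses, ask for a binary preference."""
    response_a = generate_response(prompt_a)        # black-box LLM call
    response_b = generate_response(prompt_b)
    a_preferred = ask_user(response_a, response_b)  # user sees only the responses, returns a bool
    return PreferenceRecord(prompt_a, prompt_b, a_preferred)
```

The user never assigns a numeric score; all learning must come from these binary comparisons.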
The authors present the Automated Prompt Optimization with Human Feedback (APOHF) algorithm, drawing inspiration from dueling bandits, a variant of the multi-armed bandit problem in which feedback arrives as pairwise comparisons rather than numeric rewards. The APOHF algorithm:
- Utilizes embeddings from pre-trained LLMs as continuous representations of prompts.
- Trains a neural network (NN) to predict the performance of different prompts based on these embeddings.
- Implements a strategy, inspired by upper confidence bound (UCB) principles, for selecting a pair of prompts at each iteration, balancing exploration and exploitation to find high-performing prompts efficiently (a minimal code sketch follows this list).
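The following PyTorch sketch illustrates these steps under simplifying assumptions: prompt embeddings are assumed to be precomputed by a pre-trained LLM, the small score network and the Bradley-Terry-style logistic loss are generic choices, and the gradient-norm exploration bonus is a simplified stand-in for the paper's uncertainty term, not its exact formulation.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Small MLP mapping a prompt embedding to a scalar latent score."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def train_on_preferences(model, emb, prefs, epochs=200, lr=1e-2):
    """Fit the score net on pairwise preferences with a logistic (Bradley-Terry-style) loss.
    `prefs` is a list of (winner_idx, loser_idx) pairs indexing rows of `emb`."""
    if not prefs:
        return
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        winners = emb[[w for w, _ in prefs]]
        losers = emb[[l for _, l in prefs]]
        margin = model(winners) - model(losers)
        loss = nn.functional.softplus(-margin).mean()  # -log sigmoid(margin)
        loss.backward()
        opt.step()

def select_pair(model, emb, beta=1.0):
    """Pick one prompt greedily and a second one with an exploration bonus.
    The bonus is a per-prompt gradient-norm proxy for predictive uncertainty."""
    with torch.no_grad():
        scores = model(emb)
    first = int(scores.argmax())
    bonuses = []
    for i in range(emb.shape[0]):
        model.zero_grad()
        model(emb[i:i + 1]).sum().backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        bonuses.append(g.norm())
    ucb = scores + beta * torch.stack(bonuses)
    ucb[first] = float("-inf")  # force a distinct second prompt
    second = int(ucb.argmax())
    return first, second
```

A driver loop would alternate `select_pair`, the user comparison from the earlier sketch, and `train_on_preferences`, which mirrors how an APOHF-style method spends its small budget of human feedback.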
Empirical Evaluation
The APOHF algorithm was evaluated on three types of tasks to demonstrate its effectiveness:
- User Instruction Optimization:
  - The setup leveraged 30 instruction induction tasks.
  - Generated candidate prompts with ChatGPT from initial task descriptions.
  - Achieved higher validation accuracy than baseline methods.
- Text-to-Image Generative Models:
  - Optimized prompts so that the generative model produces images matching target scenes.
  - Evaluated on four scenes using DALLE-3.
  - Showed increasing alignment between generated images and predefined ground-truth images over iterations.
- Response Optimization with Human Feedback:
  - Adapted APOHF to refine the responses generated by an LLM rather than the prompts (see the sketch after this list).
  - Evaluated on the Anthropic Helpfulness and Harmlessness datasets.
  - Showed substantial improvement in response quality with only a small number of feedback rounds.
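The adaptation is largely a change of candidate pool; the sketch below runs the same dueling-bandit loop over candidate responses rather than prompts. Here `embed`, `select_pair`, `update_model`, and `ask_user` are assumed callables (for example, an embedding model and an APOHF-style selector like the one sketched earlier, with interfaces simplified); none of these names are taken from the paper's code.

```python
def optimize_response(candidate_responses, embed, select_pair, update_model,
                      ask_user, rounds=20):
    """Run `rounds` of pairwise human feedback over a fixed pool of candidate responses."""
    emb = embed(candidate_responses)        # one embedding per candidate response
    prefs = []                              # list of (winner_idx, loser_idx) pairs
    for _ in range(rounds):
        i, j = select_pair(emb, prefs)      # exploration/exploitation over responses
        if ask_user(candidate_responses[i], candidate_responses[j]):
            prefs.append((i, j))            # user preferred candidate i
        else:
            prefs.append((j, i))
        update_model(emb, prefs)            # refit the score network on all feedback so far
    # Simple final choice: return the candidate that won the most comparisons.
    wins = [sum(1 for w, _ in prefs if w == k) for k in range(len(candidate_responses))]
    return candidate_responses[max(range(len(wins)), key=wins.__getitem__)]
```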
Numerical Results
The numerical outcomes in the experiments highlight the efficacy of the APOHF algorithm:
- Instruction Optimization: APOHF consistently achieved higher validation accuracy than alternatives such as random search, linear dueling bandits, and Double Thompson Sampling (DoubleTS).
- Image Generation: Image similarity scores increased steadily as iterations progressed, indicating that the generated prompts improved (a hedged scoring sketch follows this list).
- Response Optimization: APOHF significantly outperformed DoubleTS, indicating strong potential for improving LLM responses with limited feedback.
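The image-generation results are reported as image similarity scores; the paper's exact metric is not reproduced here, but one common way to score alignment between a generated image and a ground-truth image is cosine similarity between CLIP image embeddings, shown below purely as a hypothetical illustration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def clip_image_similarity(path_a: str, path_b: str) -> float:
    """Cosine similarity between the CLIP embeddings of two images (higher = more similar)."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    images = [Image.open(path_a), Image.open(path_b)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize the embeddings
    return float(feats[0] @ feats[1])
```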
Implications and Future Directions
The implications of this research are twofold—practical and theoretical. Practically, APOHF enables users to optimize LLM prompts efficiently without the need for complex scoring systems, making LLMs more accessible and user-friendly. Theoretically, this paper extends the capabilities of bandit algorithms through the integration of neural network-based continuous function optimization, pushing the boundary of what can be achieved with bandit-inspired prompt optimization.
Future work could extend the framework to select multiple prompts simultaneously with ranking-based feedback, and further tune the algorithm for more specialized applications.
Conclusion
This research contributes significantly to the domain of prompt optimization for black-box LLMs, offering practical methods for real-world application and laying the groundwork for further advancements in this field. The APOHF algorithm’s reliance on human preference feedback rather than numeric scoring broadens its applicability, providing a robust and user-centric approach to optimizing prompts for LLMs.