Abstract
Universal Self-Adaptive Prompting (USP) is an approach designed to strengthen the zero-shot capabilities of LLMs. LLMs show remarkable zero-shot abilities, but zero-shot prompting typically lags behind few-shot in-context learning (ICL) because it lacks the guidance that labeled demonstrations provide, and real-world labels are often scarce. USP addresses this limitation by using the model's own outputs as pseudo-demonstrations for ICL. It requires only a small amount of unlabeled data, operates entirely at inference time, and applies across a wide variety of NLP tasks. Notably, USP distinguishes between three task types, Classification (CLS), Short-form Generation (SFG), and Long-form Generation (LFG), and applies a tailored selection mechanism to each to derive high-quality pseudo-demonstrations. Its performance was assessed across multiple tasks with the PaLM and PaLM 2 models.
Preliminaries
USP builds on ICL principles by using the model's own generated outputs as pseudo-demonstrations for zero-shot ICL, augmenting the test query with these demonstrations. The process involves two stages: an initial zero-shot prompting pass that produces candidate pseudo-demos, followed by a second pass that prepends the selected demos to the query to improve the prediction. The method is also closely related to self-consistency, in which an LLM decodes the same query multiple times to generate diverse predictions and takes the majority answer as the final output.
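A minimal sketch of this two-stage flow and of self-consistency voting is given below, assuming a hypothetical `llm_generate` callable (query string in, answer string out) and a black-box `score_fn`; the actual scoring functions are described in the next section, and this is an illustration rather than the paper's exact implementation.

```python
from collections import Counter


def self_consistency(llm_generate, query, num_samples=5):
    """Decode the same query several times and return the majority answer."""
    answers = [llm_generate(query) for _ in range(num_samples)]
    majority, _ = Counter(answers).most_common(1)[0]
    return majority


def usp_two_stage(llm_generate, score_fn, unlabeled_queries, test_query, k=3):
    """Stage 1: zero-shot answers on unlabeled queries become candidate
    pseudo-demos, ranked by a confidence score.
    Stage 2: the top-k pseudo-demos are prepended to the test query."""
    candidates = []
    for q in unlabeled_queries:
        answer = llm_generate(q)                 # plain zero-shot prediction
        candidates.append((score_fn(q, answer), q, answer))

    # Keep the k most confident (query, answer) pairs as pseudo-demonstrations.
    top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    demo_block = "\n\n".join(f"Q: {q}\nA: {a}" for _, q, a in top_k)

    # Stage 2: a few-shot-style prompt built entirely from model outputs.
    prompt = f"{demo_block}\n\nQ: {test_query}\nA:"
    return llm_generate(prompt)
```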
Universal Self-Adaptive Prompting
USP applies a task-specific heuristic for pseudo-demo selection, motivated by the need to adapt confidence metrics to different task objectives. Because the nature of correct responses differs across task types, USP introduces category-specific scoring functions. For CLS tasks, USP uses the negative entropy of the probability distribution over class labels to estimate confidence. SFG tasks, which admit multiple valid short answers, require several decoding rounds and an entropy-based score over the sampled answers, while LFG tasks rely on pairwise metric evaluations over multiple sampled responses, given the high variability of long-form outputs. Importantly, USP handles these differences while remaining efficient, requiring only a small subset of the test set for pseudo-demo generation.
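The following hedged sketches illustrate the three category-specific confidence scores described above; exact normalizations follow the paper only loosely, and the pairwise similarity for LFG is left as a caller-supplied `pairwise_sim` function (an overlap metric such as ROUGE is a natural choice).

```python
import math
from collections import Counter


def cls_score(label_probs):
    """CLS: negative entropy of the predicted distribution over class labels.
    A value closer to zero means a more peaked, confident prediction."""
    return sum(p * math.log(p) for p in label_probs if p > 0)


def sfg_score(sampled_answers):
    """SFG: decode several times and score by the negative entropy of the
    empirical answer distribution; repeated answers signal confidence."""
    counts = Counter(sampled_answers)
    total = sum(counts.values())
    return sum((c / total) * math.log(c / total) for c in counts.values())


def lfg_score(sampled_answers, pairwise_sim):
    """LFG: average pairwise similarity between decoded samples; mutually
    consistent long-form outputs score higher."""
    pairs = [(a, b) for i, a in enumerate(sampled_answers)
             for b in sampled_answers[i + 1:]]
    if not pairs:
        return 0.0
    return sum(pairwise_sim(a, b) for a, b in pairs) / len(pairs)
```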
Evaluation and Results
Comparative analysis against standard baselines shows that USP not only surpasses standard zero-shot prompting on more than 40 tasks but also competes favorably with few-shot baselines, demonstrating its efficacy in improving zero-shot generalization with modest amounts of unlabeled data.
Key findings include superior performance on generative tasks and consistent gains with larger or more capable LLMs. The USP score correlates positively with ground-truth performance, indicating that it effectively identifies high-quality pseudo-demos, despite occasional underperformance relative to zero-shot baselines. The paper argues that the magnitude of USP's benefit grows with the model's uncertainty under zero-shot conditions, where added guidance is most needed. Overall, the findings suggest that USP is a cost-effective way to enhance zero-shot learning across a wide range of NLP tasks.