Better Zero-Shot Reasoning with Self-Adaptive Prompting (2305.14106v1)

Published 23 May 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Modern LLMs have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans. This is made possible by their strong few and zero-shot abilities -- they can effectively learn from a handful of handcrafted, completed responses ("in-context examples"), or are prompted to reason spontaneously through specially designed triggers. Nonetheless, some limitations have been observed. First, performance in the few-shot setting is sensitive to the choice of examples, whose design requires significant human effort. Moreover, given the diverse downstream tasks of LLMs, it may be difficult or laborious to handcraft per-task labels. Second, while the zero-shot setting does not require handcrafting, its performance is limited due to the lack of guidance to the LLMs. To address these limitations, we propose Consistency-based Self-adaptive Prompting (COSP), a novel prompt design method for LLMs. Requiring neither handcrafted responses nor ground-truth labels, COSP selects and builds the set of examples from the LLM zero-shot outputs via carefully designed criteria that combine consistency, diversity and repetition. In the zero-shot setting for three different LLMs, we show that using only LLM predictions, COSP improves performance up to 15% compared to zero-shot baselines and matches or exceeds few-shot baselines for a range of reasoning tasks.

Better Zero-Shot Reasoning with Self-Adaptive Prompting

The paper "Better Zero-Shot Reasoning with Self-Adaptive Prompting" by Xingchen Wan et al. explores the enhancement of zero-shot reasoning capabilities of LLMs using a novel approach termed Self-Adaptive Prompting (COSP). The focus is on addressing the limitations inherent in zero-shot and few-shot reasoning setups and proposing a systematic methodology to improve performance without the reliance on ground-truth labels or extensive human annotation.

Key Issues in Current Approaches

The advent of large-scale LLMs has significantly advanced the state of the art in NLP. With techniques such as chain-of-thought (CoT) prompting, LLMs have demonstrated strong performance on tasks requiring step-by-step reasoning. However, existing methods have notable limitations:

  • Few-shot CoT, where models are prompted with examples, is highly sensitive to the choice of these examples. The example selection process is labor-intensive and requires domain-specific expertise.
  • Zero-shot CoT alleviates the need for labeled examples by using a trigger phrase (e.g., "Let's think step by step") to elicit reasoning, as sketched below. However, it often underperforms few-shot methods because the model receives no task-tailored guidance.
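
For concreteness, here is a minimal sketch of the zero-shot CoT prompting pattern the paper builds on, assuming a hypothetical call_llm text-completion helper (not an API described in the paper):

```python
def zero_shot_cot(question: str, call_llm) -> str:
    # Step 1: elicit a free-form rationale with the reasoning trigger.
    prompt = f"Q: {question}\nA: Let's think step by step."
    rationale = call_llm(prompt)
    # Step 2: extract the final answer conditioned on the generated rationale.
    answer = call_llm(f"{prompt} {rationale}\nTherefore, the answer is")
    return answer.strip()
```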

Proposed Method: COSP

COSP addresses these limitations by employing a two-stage algorithm that automatically selects high-quality in-context examples from the model's own zero-shot outputs (a minimal sketch of the full loop follows the list below):

  1. Stage 1 (Candidate Generation): This involves generating multiple reasoning paths for each test query via Zero-shot CoT and collecting a pool of candidate demonstrations. The outcomes are evaluated based on their consistency (using entropy as a proxy for model confidence) and the diversity of reasoning and responses.
  2. Stage 2 (Demonstration Utilization): The selected candidate examples from the first stage are used as in-context demonstrations for the LLM. The model is queried again, incorporating these demonstrations to improve the reasoning process.
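
A minimal sketch of this two-stage loop, assuming a hypothetical sample_cot(prompt) helper that returns one sampled (rationale, answer) pair and a candidate_score function like the one sketched under the contributions below; prompt formats, sample counts, and the voting details are illustrative rather than the paper's exact implementation:

```python
from collections import Counter

def cosp(questions, sample_cot, candidate_score, m=8, k=3):
    # Stage 1: sample m zero-shot CoT paths per test question and score each
    # (question, rationale, answer) candidate.
    pool, stage1_answers = [], {}
    for q in questions:
        samples = [sample_cot(f"Q: {q}\nA:") for _ in range(m)]  # (rationale, answer) pairs
        answers = [a for _, a in samples]
        stage1_answers[q] = answers
        pool += [(q, r, a, candidate_score(r, answers)) for r, a in samples]

    # Keep the k best-scoring candidates (lower = more confident, less repetitive)
    # and format them as in-context demonstrations.
    demos = sorted(pool, key=lambda c: c[3])[:k]
    prefix = "\n\n".join(f"Q: {q}\nA: {r} So the answer is {a}." for q, r, a, _ in demos)

    # Stage 2: re-query each question with the demonstrations prepended, then
    # majority-vote over the pooled Stage 1 and Stage 2 answers.
    preds = {}
    for q in questions:
        stage2 = [sample_cot(f"{prefix}\n\nQ: {q}\nA:")[1] for _ in range(m)]
        preds[q] = Counter(stage1_answers[q] + stage2).most_common(1)[0][0]
    return preds
```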

Novel Contributions

  • The use of outcome entropy to score and select candidate demonstrations is central to COSP. This metric gauges the reliability of the model's self-generated answers, favoring confident and consistent responses.
  • A penalty for repetitiveness encourages diversity among the selected examples, improving the robustness of in-context learning (a toy scoring sketch follows this list).
  • COSP leverages self-consistency to automatically curate effective demonstrations without human intervention or reliance on labeled data, thus significantly reducing the cost and effort involved in model guidance.
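
A toy illustration of how such a score might combine answer-level confidence with a repetition penalty, assuming a simple sentence-level repetition measure and an illustrative weight lam (the paper's exact formulation differs):

```python
import math
from collections import Counter

def candidate_score(rationale, all_answers, lam=0.2):
    # Lower is better: low normalized entropy of the sampled answers signals a
    # confident, consistent prediction; repeated sentences in the rationale are
    # penalized to keep the selected demonstrations diverse.
    counts = Counter(all_answers)
    total = len(all_answers)
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    norm_entropy = entropy / math.log(total) if total > 1 else 0.0

    sentences = [s.strip() for s in rationale.split(".") if s.strip()]
    repetition = (1.0 - len(set(sentences)) / len(sentences)) if sentences else 0.0

    return norm_entropy + lam * repetition
```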

Empirical Results

The paper validates COSP across three LLMs (PaLM-62B, PaLM-540B, and GPT-3) and a variety of logical and arithmetic reasoning tasks (e.g., MultiArith, GSM-8K, CSQA). The results demonstrate that:

  • COSP improves zero-shot performance by up to 15% compared to baseline methods.
  • Performance parity is achieved or exceeded relative to few-shot baselines using manually selected examples, highlighting the efficacy of the method.
  • The method is particularly advantageous for smaller models (e.g., PaLM-62B), where it significantly reduces the performance gap with larger, more resource-intensive models (e.g., PaLM-540B).

The paper provides an extensive comparison with other adaptive zero-shot techniques like Auto-CoT, showing that COSP's sophisticated selection criteria yield more reliable and higher-quality demonstrations, leading to consistent performance improvements.

Implications and Future Directions

COSP's ability to enhance zero-shot reasoning has implications both practically and theoretically:

  • Practical Implications: COSP reduces the dependency on human-crafted examples and annotations, making it feasible to leverage LLMs for a wider range of tasks in a cost-effective manner. This approach democratizes access to advanced reasoning capabilities by enabling the use of smaller models efficiently.
  • Theoretical Implications: The method underscores the importance of model introspection—using models' own uncertainty measures to guide their learning processes. This approach paves the way for future research in self-aware and adaptive AI systems.

Speculations on Future Developments

Looking forward, the principles underpinning COSP could be extended to other types of NLP tasks beyond logical and arithmetic reasoning. Additionally, combining COSP with continual learning paradigms where models dynamically adapt to new data during deployment could further enhance zero-shot learning capabilities. Moreover, extending COSP to interact with external tools and datasets might improve its application scope, making it more versatile in real-world scenarios.

In conclusion, COSP presents a compelling step forward in zero-shot reasoning for LLMs, effectively balancing the trade-off between model performance and the need for human supervision. The two-stage, self-adaptive framework detailed in this paper offers a robust, scalable, and efficient method to harness the full potential of LLMs in reasoning tasks.

Authors (5)
  1. Xingchen Wan (31 papers)
  2. Ruoxi Sun (58 papers)
  3. Hanjun Dai (63 papers)
  4. Tomas Pfister (89 papers)
  5. Sercan O. Arik (40 papers)
Citations (43)