
Boosted Prompt Ensembles for Large Language Models (2304.05970v1)

Published 12 Apr 2023 in cs.CL and cs.LG

Abstract: Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of LLM reasoning performance with no additional training. To further improve performance, we propose a prompt ensembling method for LLMs, which uses a small dataset to construct a set of few shot prompts that together comprise a "boosted prompt ensemble". The few shot examples for each prompt are chosen in a stepwise fashion to be "hard" examples on which the previous step's ensemble is uncertain. We show that this outperforms single-prompt output-space ensembles and bagged prompt-space ensembles on the GSM8k and AQuA datasets, among others. We propose both train-time and test-time versions of boosted prompting that use different levels of available annotation and conduct a detailed empirical study of our algorithm.

Boosted Prompt Ensembles for LLMs

The paper "Boosted Prompt Ensembles for LLMs" introduces a novel approach to enhance the performance of LLMs by employing a technique termed as "boosted prompting." This method specifically aims to improve reasoning performance through creating an ensemble of prompts that are strategically designed to cover a broader problem space.

Methodology

The paper builds on existing methods such as chain-of-thought (CoT) prompting and self-consistency, extending them to construct what the authors describe as "boosted prompt ensembles." The core of the technique is selecting the few-shot examples for each prompt from "hard" problems, those on which the previous stage's ensemble is uncertain. Prompts are added iteratively in a stagewise manner, inspired by classical boosting algorithms, to progressively improve performance on datasets including GSM8K and AQuA.
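To make the procedure concrete, here is a minimal sketch of the train-time construction loop. It assumes a `generate(prompt) -> completion` function wrapping the underlying LLM with sampling enabled; the `Q:`/`A:` prompt format, the numeric answer parsing, and the hyperparameter values are illustrative stand-ins, not the paper's exact implementation:

```python
import re
from collections import Counter
from typing import Callable, List, Tuple

def extract_answer(completion: str) -> str:
    """Pull the final number out of a chain-of-thought completion
    (a crude stand-in for the paper's dataset-specific answer parsing)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else ""

def vote(generate: Callable[[str], str], prompt: str, question: str,
         n_samples: int = 10) -> Counter:
    """Self-consistency: sample several completions and tally final answers."""
    return Counter(
        extract_answer(generate(f"{prompt}Q: {question}\nA:"))
        for _ in range(n_samples)
    )

def build_boosted_ensemble(
    generate: Callable[[str], str],      # hypothetical LLM sampling call
    train_set: List[Tuple[str, str]],    # (question, gold answer) pairs
    seed_prompt: str,
    n_rounds: int = 3,
    shots_per_prompt: int = 4,
) -> List[str]:
    """Stagewise construction of a boosted prompt ensemble (train-time)."""
    ensemble = [seed_prompt]
    for _ in range(n_rounds):
        # Score each training question by the current ensemble's agreement.
        scored = []
        for question, gold in train_set:
            votes = Counter()
            for prompt in ensemble:
                votes.update(vote(generate, prompt, question))
            top_share = votes.most_common(1)[0][1] / sum(votes.values())
            scored.append((top_share, question, gold))
        # "Hard" examples are the ones with the lowest vote agreement.
        scored.sort(key=lambda s: s[0])

        # Build the next few-shot prompt from hard questions where at least
        # one sampled chain of thought reaches the gold answer (labels are
        # available in the train-time variant).
        shots = []
        for _, question, gold in scored:
            if len(shots) == shots_per_prompt:
                break
            for _ in range(10):
                completion = generate(f"{ensemble[-1]}Q: {question}\nA:")
                if extract_answer(completion) == gold:
                    shots.append(f"Q: {question}\nA:{completion}")
                    break
        ensemble.append("\n\n".join(shots) + "\n\n")
    return ensemble
```

The boosting ingredient is the agreement score: questions on which the pooled self-consistency votes are split are treated as hard, and their correctly solved chains of thought become the few-shot examples of the next prompt.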

Key Findings

  • Performance: The method outperformed traditional single-prompt output-space ensembles and bagged prompt-space ensembles, with marked improvements on benchmarks such as GSM8K and AQuA (a minimal inference sketch follows this list).
  • Train-Time and Test-Time Variants: The paper proposes two algorithms: a train-time version, which uses labeled data to construct the prompts, and a test-time version, in which the model selects hard examples from its own outputs without annotations. The train-time version performed more robustly because it has access to explicit labels.
  • Adaptive Example Selection: By focusing on "hard" problems, those on which the current ensemble is uncertain, the approach resembles a form of curriculum learning: it selects high-impact examples with diverse reasoning paths, thereby strengthening the model's reasoning coverage.
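
As a complement to the construction sketch above, here is a hypothetical illustration of test-time inference with a finished ensemble: self-consistency votes are pooled across all prompts and the overall majority answer is returned (reusing the assumed `generate` and `extract_answer` helpers from the previous sketch):

```python
from collections import Counter
from typing import Callable, List

def ensemble_predict(generate: Callable[[str], str], ensemble: List[str],
                     question: str, samples_per_prompt: int = 5) -> str:
    """Answer a question by pooling self-consistency votes across every
    prompt in the boosted ensemble and taking the overall majority.
    `generate` and `extract_answer` are the hypothetical helpers from
    the construction sketch above."""
    votes = Counter()
    for prompt in ensemble:
        for _ in range(samples_per_prompt):
            votes[extract_answer(generate(f"{prompt}Q: {question}\nA:"))] += 1
    answer, _ = votes.most_common(1)[0]
    return answer
```

Pooling votes over the whole ensemble is what distinguishes this from a single-prompt output-space ensemble: each prompt contributes samples from a different region of the problem space.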

Implications

This work matters for the efficient use of LLMs on complex reasoning tasks. By reducing the manual effort involved in prompt engineering, the approach automates the construction of effective few-shot reasoning prompts, which also makes LLMs more useful for producing high-quality reasoning datasets.

Practical Relevance

Practically, this approach can yield significant improvements on tasks requiring nuanced reasoning or domain-specific knowledge by letting LLMs learn from hard examples over iterative stages, potentially reducing the overhead of manually curated datasets across diverse applications.

Future Directions

Looking forward, there is potential for exploring test-time boosting as a means of online adaptation, offering a promising avenue for models to dynamically adjust to changing problem distributions. Moreover, leveraging verifiers or debaters might further refine the accuracy of generated prompts, thereby maximizing model performance.

Overall, the paper provides a comprehensive framework for improving LLM reasoning through methodical ensemble construction, highlighting an innovative path forward in AI model optimization.

Authors (4)
  1. Silviu Pitis (14 papers)
  2. Michael R. Zhang (13 papers)
  3. Andrew Wang (42 papers)
  4. Jimmy Ba (55 papers)
Citations (34)