
An automatically discovered chain-of-thought prompt generalizes to novel models and datasets (2305.02897v2)

Published 4 May 2023 in cs.CL and cs.AI

Abstract: Emergent chain-of-thought (CoT) reasoning capabilities promise to improve the performance and explainability of LLMs. However, uncertainties remain about how reasoning strategies formulated for previous model generations generalize to new model generations and different datasets. In this small-scale study, we compare different reasoning strategies induced by zero-shot prompting across six recently released LLMs (davinci-002, davinci-003, GPT-3.5-turbo, GPT-4, Flan-T5-xxl and Cohere command-xlarge) on a mixture of six question-answering datasets, including datasets from scientific and medical domains. Our findings demonstrate that while some variations in effectiveness occur, gains from CoT reasoning strategies remain robust across different models and datasets. GPT-4 benefits most from current state-of-the-art reasoning strategies and achieves its best performance with a prompt previously found through automated discovery.

Authors (4)
  1. Konstantin Hebenstreit (4 papers)
  2. Robert Praas (3 papers)
  3. Louis P Kiesewetter (1 paper)
  4. Matthias Samwald (36 papers)
Citations (9)