Emergent Mind

Diversity of Thought Improves Reasoning Abilities of LLMs

(2310.07088)
Published Oct 11, 2023 in cs.CL and cs.AI

Abstract

Large language models (LLMs) are documented to struggle in settings that require complex reasoning. Nevertheless, instructing the model to break down the problem into smaller reasoning steps, or ensembling various generations through modifying decoding steps, boosts performance. However, these methods assume that the input prompt is fixed and expect the decoding strategies to introduce the diversity needed for ensembling. In this work, we discuss how one can create and leverage variations of the input prompt as a means of diversity of thought. We propose a method that automatically improves prompt diversity by soliciting feedback from the LLM to ideate approaches that are apt for the problem. We then ensemble the diverse prompts in our method DIV-SE (DIVerse reasoning path Self-Ensemble) across multiple inference calls, or use diverse approaches within a single inference call; we call the latter IDIV-SE (In-call DIVerse reasoning path Self-Ensemble). Apart from our approaches outperforming prior work, DIV-SE in particular advances state-of-the-art performance on the challenging planning and graph coloring benchmarks. Our results improve the Pareto frontier of the accuracy-cost trade-off.
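The ensembling idea in the abstract can be sketched in a few lines: issue one inference call per diverse prompt (the DIV-SE variant), then majority-vote over the extracted answers. This is a minimal illustration, not the authors' implementation; the approach names, the prompt template, and the `llm_call` stub are all hypothetical stand-ins for a real LLM API.

```python
from collections import Counter

def div_se(question, approaches, llm_call):
    """DIV-SE-style ensemble (sketch): one inference call per diverse
    prompt variation, then a majority vote over the returned answers."""
    answers = []
    for approach in approaches:
        # Each prompt frames the same question through a different approach.
        prompt = f"Using the approach '{approach}', solve:\n{question}\nAnswer:"
        answers.append(llm_call(prompt))
    # Majority vote: the most common answer across the diverse prompts wins.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Toy stand-in for an LLM: two of the three approaches reach the right answer.
def toy_llm(prompt):
    return "4" if ("algebra" in prompt or "counting" in prompt) else "5"

approaches = ["algebra", "counting", "guess-and-check"]
print(div_se("What is 2 + 2?", approaches, toy_llm))  # prints "4"
```

The IDIV-SE variant described in the abstract would instead pack the diverse approaches into a single prompt and make one inference call, trading some accuracy for lower cost.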


