
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? (2402.18272v1)

Published 28 Feb 2024 in cs.CL and cs.AI

Abstract: Recent progress in LLM discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, in which we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach across a wide range of reasoning tasks and backbone LLMs. We observe that multi-agent discussion outperforms a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during discussion.
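The multi-agent setup the abstract compares against can be sketched as a simple round-based discussion loop: several agents answer independently, then each revises its answer after seeing the others', and a final answer is chosen by majority vote. This is a minimal illustration of the general pattern, not the paper's exact framework; `ask_llm` is a hypothetical stand-in for any LLM call.

```python
# Minimal sketch of a multi-agent discussion loop. `ask_llm` is a
# hypothetical interface (prompt in, answer out); the round structure
# and prompts are illustrative, not the paper's actual framework.
from collections import Counter
from typing import Callable


def discuss(question: str,
            ask_llm: Callable[[str], str],
            n_agents: int = 3,
            n_rounds: int = 2) -> str:
    """Run a round-based discussion and return the majority answer."""
    # Round 0: each agent answers independently.
    answers = [ask_llm(f"Q: {question}\nA:") for _ in range(n_agents)]
    # Discussion rounds: each agent sees the other agents' answers
    # and produces a revised answer.
    for _ in range(n_rounds):
        revised = []
        for i in range(n_agents):
            peers = [a for j, a in enumerate(answers) if j != i]
            prompt = (f"Q: {question}\n"
                      f"Other agents answered: {peers}\n"
                      "Considering these, give your final answer:")
            revised.append(ask_llm(prompt))
        answers = revised
    # Aggregate the final round by majority vote.
    return Counter(answers).most_common(1)[0][0]
```

The single-agent baseline the paper highlights corresponds to one `ask_llm` call with a strong (e.g. demonstration-rich) prompt, skipping the discussion rounds entirely.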
