Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? (2402.18272v1)
Abstract: Recent progress in LLM-based multi-agent discussion suggests that such discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, proposing a novel group-discussion framework that enriches the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with a strong prompt can achieve almost the same performance as the best existing discussion approach across a wide range of reasoning tasks and backbone LLMs. We observe that multi-agent discussion outperforms a single agent only when no demonstrations are included in the prompt. Further study reveals the common interaction mechanisms of LLMs during discussion.
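The multi-agent discussion setup being evaluated can be illustrated with a minimal round-based loop: each agent first answers independently, then revises after seeing its peers' latest answers. This is a hedged sketch, not the paper's actual framework; `multi_agent_discussion`, `make_stub`, and the prompt wording are illustrative assumptions, with a stub standing in for a real LLM call.

```python
from typing import Callable, List

def multi_agent_discussion(
    question: str,
    agents: List[Callable[[str], str]],
    rounds: int = 2,
) -> List[str]:
    """Round-based discussion: agents answer, then revise after
    reading the other agents' most recent answers."""
    # Round 0: each agent answers the question independently.
    answers = [agent(question) for agent in agents]
    for _ in range(rounds - 1):
        new_answers = []
        for i, agent in enumerate(agents):
            # Show agent i the peers' answers and ask it to revise.
            peers = "\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (
                f"Question: {question}\n"
                f"Other agents answered:\n{peers}\n"
                "Considering their reasoning, give your final answer."
            )
            new_answers.append(agent(prompt))
        answers = new_answers
    return answers

# Stub "LLM" for demonstration only: always returns a fixed answer.
def make_stub(name: str) -> Callable[[str], str]:
    return lambda prompt: f"{name}: 42"

agents = [make_stub("A"), make_stub("B"), make_stub("C")]
final = multi_agent_discussion("What is 6 * 7?", agents, rounds=2)
print(final)  # each agent's answer after one revision round
```

The single-agent baseline the paper compares against corresponds to calling one agent once with a strong (e.g. demonstration-rich) prompt instead of running the revision rounds.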