LLM Harmony: Multi-Agent Communication for Problem Solving (2401.01312v1)
Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing but exhibit limitations, particularly in autonomously addressing novel challenges that require reasoning and problem-solving. Traditional techniques such as chain-of-thought prompting necessitate explicit human guidance. This paper introduces a multi-agent communication framework, inspired by the CAMEL model, to enhance LLMs' autonomous problem-solving capabilities. The framework employs multiple LLM agents, each with a distinct persona, engaged in role-playing communication, offering a nuanced and adaptable approach to diverse problem scenarios. Extensive experimentation demonstrates the framework's superior performance and adaptability, providing valuable insights into the collaborative potential of multiple agents in overcoming the limitations of individual models.
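The core loop of such a role-playing setup can be sketched in a few lines. The sketch below is not the paper's implementation: the "Solver" and "Critic" personas, the prompts, and the `call_llm` hook are illustrative assumptions, and the hook would need to be wired to an actual chat-completion backend.

```python
# Minimal sketch of two persona-bearing LLM agents solving a task via
# role-playing communication (CAMEL-style), under assumed persona names.
# NOTE: `call_llm` is a hypothetical placeholder, not an API from the paper;
# replace its body with a real chat-model call (hosted or local).

from typing import Dict, List


def call_llm(system_prompt: str, history: List[Dict[str, str]]) -> str:
    """Placeholder chat-completion call. Returns a canned reply so the
    sketch runs end-to-end without any external service."""
    persona = system_prompt.split(":")[0]
    return f"[{persona}] considering: {history[-1]['content'][:60]}"


class Agent:
    def __init__(self, persona: str):
        # Each agent carries a distinct persona via its system prompt.
        self.persona = persona
        self.system_prompt = f"{persona}: stay in character and work toward the task."

    def respond(self, history: List[Dict[str, str]]) -> str:
        return call_llm(self.system_prompt, history)


def role_play(task: str, rounds: int = 3) -> List[Dict[str, str]]:
    solver, critic = Agent("Solver"), Agent("Critic")
    history = [{"role": "user", "content": task}]
    for _ in range(rounds):
        # Solver proposes a step, Critic reviews it; both turns are appended
        # to a shared transcript so each agent sees the full conversation.
        history.append({"role": "assistant", "content": solver.respond(history)})
        history.append({"role": "user", "content": critic.respond(history)})
    return history


if __name__ == "__main__":
    for turn in role_play("If a train travels 120 km in 2 hours, what is its average speed?"):
        print(turn["role"], "->", turn["content"])
```

The design choice illustrated here is the shared transcript: because both agents condition on the full dialogue rather than only their own turns, the critic's feedback can redirect the solver without any explicit human intervention between rounds.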
- Mathqa: Towards interpretable math word problem solving with operation-based formalisms. arXiv preprint arXiv:1905.13319.
- Large language models and the perils of their hallucinations. Critical Care, 27(1):1–2.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
- Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712.
- Callison-Burch, C. (2009). Fast, cheap, and creative: Evaluating translation quality using amazon’s mechanical turk. In Proceedings of the 2009 conference on empirical methods in natural language processing, pages 286–295.
- Evaluation of text generation: A survey. arXiv preprint arXiv:2006.14799.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
- Neural symbolic reader: Scalable integration of distributed and symbolic representations for reading comprehension. In International Conference on Learning Representations.
- Can large language models be an alternative to human evaluations? arXiv preprint arXiv:2305.01937.
- Semantically-aligned equation generation for solving and reasoning math word problems. arXiv preprint arXiv:1811.00720.
- Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
- Lm vs lm: Detecting factual errors via cross examination. arXiv preprint arXiv:2305.13281.
- Cooperative ai: machines must learn to find common ground. Nature, 593(7857):33–36.
- Open problems in cooperative ai. arXiv preprint arXiv:2012.08630.
- Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325.
- Human-like summarization evaluation with chatgpt. arXiv preprint arXiv:2304.02554.
- Emergent linguistic phenomena in multi-agent communication games.
- What would jiminy cricket do? towards agents that behave morally. arXiv preprint arXiv:2110.13136.
- The perils of using mechanical turk to evaluate open-ended text generation. arXiv preprint arXiv:2109.06835.
- Kondrak, G. (2005). N-gram similarity and distance. In International symposium on string processing and information retrieval, pages 115–126. Springer.
- Negotiation and honesty in artificial intelligence methods for the board game of diplomacy. Nature Communications, 13(1):7214.
- Multi-agent communication meets natural language: Synergies between functional and structural language learning.
- Emergent translation in multi-agent communication.
- Camel: Communicative agents for "mind" exploration of large scale language model society. arXiv preprint arXiv:2303.17760.
- Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118.
- Program induction by rationale generation: Learning to solve and explain algebraic word problems. arXiv preprint arXiv:1705.04146.
- Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. arXiv preprint arXiv:2305.01210.
- Gpteval: Nlg evaluation using gpt-4 with better human alignment. arXiv preprint arXiv:2303.16634.
- Chameleon: Plug-and-play compositional reasoning with large language models. arXiv preprint arXiv:2304.09842.
- A synergistic core for human brain evolution and cognition. Nature Neuroscience, 25(6):771–782.
- Why we need new evaluation metrics for nlg. arXiv preprint arXiv:1707.06875.
- OpenAI (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
- Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
- Do the rewards justify the means? measuring trade-offs between rewards and ethical behavior in the machiavelli benchmark. In International Conference on Machine Learning, pages 26837–26867. PMLR.
- Are nlp models really able to solve simple math word problems? arXiv preprint arXiv:2103.07191.
- Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334.
- Communicative agents for software development. arXiv preprint arXiv:2307.07924.
- Explain yourself! leveraging language models for commonsense reasoning. arXiv preprint arXiv:1906.02361.
- Beyond segmentation: Road network generation with multi-modal llms. arXiv preprint arXiv:2310.09755.
- Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (ACL), Barcelona, Spain.
- Solving general arithmetic word problems. arXiv preprint arXiv:1608.01413.
- Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207.
- Neural theory-of-mind? on the limits of social intelligence in large lms. arXiv preprint arXiv:2210.13312.
- Socialiqa: Commonsense reasoning about social interactions. arXiv preprint arXiv:1904.09728.
- Self-critiquing models for assisting human evaluators. arXiv preprint arXiv:2206.05802.
- Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
- Minding language models’ (lack of) theory of mind: A plug-and-play multi-character belief tracker. arXiv preprint arXiv:2306.00924.
- Are large language models good evaluators for abstractive summarization? arXiv preprint arXiv:2305.13091.
- Susskind, L. E. (1985). Scorable games: A better way to teach negotiation. Negot. J., 1:205.
- Using simulations to teach negotiation: Pedagogical theory and practice. Teaching negotiation: Ideas and innovations, pages 285–310.
- Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937.
- Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239.
- Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Best practices for the human evaluation of automatically generated text. In Proceedings of the 12th International Conference on Natural Language Generation, pages 355–368.
- Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
- Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004):686–688.
- Large language models are diverse role-players for summarization evaluation. arXiv preprint arXiv:2303.15078.
- React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.
- Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv preprint arXiv:2306.05685.
- I cast detect thoughts: Learning to converse and guide with intents and theory-of-mind in dungeons and dragons. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11136–11155.
- Sumedh Rasal