Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning (2404.17780v1)

Published 27 Apr 2024 in cs.MA and cs.AI

Abstract: In recent years, multi-agent reinforcement learning algorithms have made significant advances in diverse gaming environments, leading to increased interest in broader applications of such techniques. To address the prevalent challenge of partial observability, communication-based algorithms have improved cooperative performance by sharing numerical embeddings between agents. However, how collaborative mechanisms form is still poorly understood, which makes designing a human-understandable communication mechanism a valuable problem to address. In this paper, we propose a novel multi-agent reinforcement learning algorithm that embeds large language models (LLMs) into agents, enabling them to generate human-understandable verbal communication. The framework consists of a message module and an action module. The message module generates verbal messages and sends them to other agents, improving information sharing among agents. To further improve the message module, we employ a teacher model that generates message labels from the global view, and we update the student model through supervised fine-tuning (SFT) on those labels. The action module receives messages from other agents and selects actions based on the current local observation and the received messages. Experiments on the Overcooked game demonstrate that our method significantly enhances the learning efficiency and performance of existing methods, while also providing an interpretable tool for humans to understand the process of multi-agent cooperation.
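The two-module loop and the teacher-student labeling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLMs are replaced by plain Python callables, and the names `message_prompt`, `VerbalAgent`, `communication_round`, `build_sft_pairs`, the toy functions, and the prompt formats are all hypothetical. The sketch only assembles (student prompt, teacher label) pairs; the actual SFT update of the student model and the training of the action module are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption (hypothetical): a "language model" is any callable mapping a prompt to text.
LanguageModel = Callable[[str], str]
# Assumption (hypothetical): the action module maps (local observation, received messages) to an action.
ActionPolicy = Callable[[str, List[str]], str]


def message_prompt(local_obs: str) -> str:
    """Hypothetical prompt format for the message module; it sees only the local view."""
    return f"Observation: {local_obs}\nMessage to teammates:"


@dataclass
class VerbalAgent:
    name: str
    student_lm: LanguageModel  # message module (the student model)
    policy: ActionPolicy       # action module

    def speak(self, local_obs: str) -> str:
        return self.student_lm(message_prompt(local_obs))

    def act(self, local_obs: str, inbox: List[str]) -> str:
        return self.policy(local_obs, inbox)


def communication_round(
    agents: List[VerbalAgent], local_obs: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    # Step 1: each agent's message module speaks from its own partial view.
    messages = {a.name: a.speak(local_obs[a.name]) for a in agents}
    # Step 2: each agent's action module conditions on its local view
    # plus the messages sent by the *other* agents.
    actions = {}
    for a in agents:
        inbox = [msg for sender, msg in messages.items() if sender != a.name]
        actions[a.name] = a.act(local_obs[a.name], inbox)
    return messages, actions


def build_sft_pairs(
    agents: List[VerbalAgent],
    local_obs: Dict[str, str],
    global_state: str,
    teacher_lm: LanguageModel,
) -> List[Tuple[str, str]]:
    # The teacher sees the global state and writes a target message per agent,
    # while the student's input stays local, so fine-tuning on these pairs would
    # push the student toward the teacher's globally informed messages.
    pairs = []
    for a in agents:
        label = teacher_lm(f"Global state: {global_state}\nMessage for {a.name}:")
        pairs.append((message_prompt(local_obs[a.name]), label))
    return pairs


# Toy usage with rule-based stand-ins (hypothetical, not trained models).
def toy_student(prompt: str) -> str:
    # Report the local observation verbatim.
    first_line = prompt.splitlines()[0]
    return "I see " + first_line[len("Observation: "):]


def toy_policy(local_obs: str, inbox: List[str]) -> str:
    # Toy rule: fetch an onion if a teammate mentions one, otherwise wait.
    return "fetch onion" if any("onion" in m for m in inbox) else "wait"


def toy_teacher(prompt: str) -> str:
    # A fixed "globally informed" label for every agent.
    return "Bring the onion to the empty pot"


agents = [VerbalAgent("alice", toy_student, toy_policy),
          VerbalAgent("bob", toy_student, toy_policy)]
obs = {"alice": "pot empty", "bob": "onion on counter"}
messages, actions = communication_round(agents, obs)
pairs = build_sft_pairs(agents, obs, "pot empty; onion on counter", toy_teacher)
print(messages)
print(actions)
```

In this toy run, alice chooses "fetch onion" only because bob's message mentions the onion she cannot see, which is the kind of partial-observability gap the message module is meant to close. Keeping the student's prompt local while the teacher's prompt is global is the design choice that makes the teacher labels useful at execution time, when no agent has the global view.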

Authors (11)
  1. Dapeng Li (32 papers)
  2. Hang Dong (65 papers)
  3. Lu Wang (329 papers)
  4. Bo Qiao (18 papers)
  5. Si Qin (24 papers)
  6. Qingwei Lin (81 papers)
  7. Dongmei Zhang (193 papers)
  8. Qi Zhang (784 papers)
  9. Zhiwei Xu (84 papers)
  10. Bin Zhang (227 papers)
  11. Guoliang Fan (23 papers)
Citations (2)