Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning (2312.12095v1)
Abstract: While decentralized training is attractive in multi-agent reinforcement learning (MARL) for its excellent scalability and robustness, its inherent coordination challenges in collaborative tasks result in numerous interactions for agents to learn good policies. To alleviate this problem, action advising methods make experienced agents share their knowledge about what to do, while less experienced agents strictly follow the received advice. However, this method of sharing and utilizing knowledge may hinder the team's exploration of better states, as agents can be unduly influenced by suboptimal or even adverse advice, especially in the early stages of learning. Inspired by the fact that humans can learn not only from the success but also from the failure of others, this paper proposes a novel knowledge sharing framework called Cautiously-Optimistic kNowledge Sharing (CONS). CONS enables each agent to share both positive and negative knowledge and cautiously assimilate knowledge from others, thereby enhancing the efficiency of early-stage exploration and the agents' robustness to adverse advice. Moreover, considering the continuous improvement of policies, agents value negative knowledge more in the early stages of learning and shift their focus to positive knowledge in the later stages. Our framework can be easily integrated into existing Q-learning based methods without introducing additional training costs. We evaluate CONS in several challenging multi-agent tasks and find it excels in environments where optimal behavioral patterns are difficult to discover, surpassing the baselines in terms of convergence rate and final performance.
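The sharing idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's actual algorithm: the function names (`knowledge_weights`, `share_knowledge`, `select_action`), the linear weighting schedule, the choice of "best action" as positive knowledge and "worst action" as negative knowledge, and the `scale` parameter are all hypothetical. The sketch only biases action selection and leaves the agent's own Q-values untouched, which is one plausible reading of "without introducing additional training costs".

```python
import numpy as np

def knowledge_weights(step, total_steps):
    """Hypothetical linear schedule (an assumption, not from the paper):
    negative knowledge dominates early, positive knowledge dominates late."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    w_pos = progress
    w_neg = 1.0 - progress
    return w_pos, w_neg

def share_knowledge(q_values):
    """An agent shares its currently best action (positive knowledge)
    and its currently worst action (negative knowledge) for one state."""
    q = np.asarray(q_values, dtype=float)
    return int(np.argmax(q)), int(np.argmin(q))

def select_action(own_q, peer_advice, step, total_steps, scale=1.0):
    """Greedy action after softly biasing the agent's own Q-values with
    peer advice. Advice shifts preferences but never overrides them
    outright, so the agent assimilates it cautiously."""
    w_pos, w_neg = knowledge_weights(step, total_steps)
    scores = np.asarray(own_q, dtype=float).copy()
    for best, worst in peer_advice:
        # Average the influence over all advising peers.
        scores[best] += scale * w_pos / len(peer_advice)
        scores[worst] -= scale * w_neg / len(peer_advice)
    return int(np.argmax(scores))

# Usage: one peer recommends action 0 and warns against action 1.
own_q = [0.2, 0.6, 0.5]
advice = [(0, 1)]
early_action = select_action(own_q, advice, step=0, total_steps=100)
late_action = select_action(own_q, advice, step=100, total_steps=100)
```

Under this toy schedule, the early-stage agent mainly avoids the action it was warned against while still choosing among the remaining actions by its own estimates; the late-stage agent leans toward the recommended action instead.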