
Measuring Policy Distance for Multi-Agent Reinforcement Learning (2401.11257v2)

Published 20 Jan 2024 in cs.MA and cs.AI

Abstract: Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL, but a general metric for quantifying policy differences among agents is still lacking. Such a metric would not only facilitate evaluating how diversity evolves in multi-agent systems, but also guide the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning conditional representations of agents' decisions, MAPD computes the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version that can quantify differences among agent policies along specified aspects. Building on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and specific behavioral tendencies. Moreover, compared with other parameter-sharing methods, MADPS exhibits superior performance.
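The abstract does not give MAPD's exact formulation, so the following is only a rough illustrative sketch of the general idea of a pairwise policy-distance metric: two discrete policies are compared by the average Jensen-Shannon distance between their action distributions over a shared batch of sampled states. The function names and setup here are hypothetical, not taken from the paper.

```python
import numpy as np

def action_probs(logits):
    # Softmax over action logits, one row per sampled state.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def avg_js_distance(probs_a, probs_b, eps=1e-12):
    """Average Jensen-Shannon distance between two agents' action
    distributions over the same batch of states (symmetric, zero
    iff the distributions coincide on every state)."""
    m = 0.5 * (probs_a + probs_b)
    kl = lambda p, q: np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)
    jsd = 0.5 * kl(probs_a, m) + 0.5 * kl(probs_b, m)
    return float(np.sqrt(np.clip(jsd, 0.0, None)).mean())

# Toy example: two agents' policies evaluated on 128 shared states, 4 actions.
rng = np.random.default_rng(0)
logits_a = rng.normal(size=(128, 4))
logits_b = rng.normal(size=(128, 4))

d_ab = avg_js_distance(action_probs(logits_a), action_probs(logits_b))
d_aa = avg_js_distance(action_probs(logits_a), action_probs(logits_a))
print(d_aa, d_ab)  # distance to self is 0; distinct policies give d_ab > 0
```

A metric like this only measures raw divergence in action choice; the customizable version described in the abstract would additionally restrict the comparison to specified behavioral aspects.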

Authors (5)
  1. Tianyi Hu (17 papers)
  2. Zhiqiang Pu (17 papers)
  3. Xiaolin Ai (7 papers)
  4. Tenghai Qiu (10 papers)
  5. Jianqiang Yi (9 papers)
Citations (1)