
Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning (2403.08936v2)

Published 13 Mar 2024 in cs.MA, cs.AI, and cs.RO

Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements; thus, naively imitating them will not achieve cooperation due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of individual agent behavior with demonstrations, and the second regulates incentives based on whether the behaviors lead to the desired outcome. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL's capability of leveraging joint demonstrations in the StarCraft scenario and converging effectively even with demonstrations from non-co-trained policies.
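
The two-discriminator mechanism described in the abstract lends itself to a compact illustration: a GAIL-style imitation bonus from a per-agent demonstration discriminator, modulated by a second discriminator that judges whether the behavior leads to the desired (cooperative) outcome. The sketch below is a minimal, hypothetical PyTorch rendering; the names (`Discriminator`, `shaped_reward`), network sizes, and the multiplicative gating rule are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """GAIL-style binary classifier over (observation, action) pairs."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Probability that the (obs, act) pair came from the target distribution.
        return torch.sigmoid(self.net(torch.cat([obs, act], dim=-1)))

def shaped_reward(demo_disc: Discriminator,
                  outcome_disc: Discriminator,
                  obs: torch.Tensor,
                  act: torch.Tensor,
                  env_reward: torch.Tensor) -> torch.Tensor:
    """Environment reward plus a demonstration-alignment bonus that is
    down-weighted when the behavior does not lead to the desired outcome.
    The multiplicative gate is an assumption for illustration."""
    d = demo_disc(obs, act).squeeze(-1).clamp(1e-6, 1 - 1e-6)
    imitation_bonus = -torch.log(1.0 - d)         # standard GAIL occupancy reward
    gate = outcome_disc(obs, act).squeeze(-1)     # in (0, 1): does this behavior help?
    return env_reward + gate * imitation_bonus
```

In a full training loop, the first discriminator would be trained adversarially against each agent's personalized demonstration buffer and the second against transitions labeled by task outcome, while the MARL policies optimize the shaped reward; that outer loop is omitted here for brevity.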

Authors (8)
  1. Peihong Yu (8 papers)
  2. Manav Mishra (4 papers)
  3. Alec Koppel (72 papers)
  4. Carl Busart (21 papers)
  5. Priya Narayan (1 paper)
  6. Dinesh Manocha (366 papers)
  7. Amrit Bedi (3 papers)
  8. Pratap Tokekar (96 papers)
Citations (2)