
From Centralized to Self-Supervised: Pursuing Realistic Multi-Agent Reinforcement Learning (2312.08662v1)

Published 14 Dec 2023 in cs.MA

Abstract: In real-world environments, autonomous agents rely on their egocentric observations. They must learn adaptive strategies to interact with others who possess mixed motivations, discernible only through visible cues. Several Multi-Agent Reinforcement Learning (MARL) methods adopt centralized approaches that involve either centralized training or reward-sharing, often violating the realistic ways in which living organisms, like animals or humans, process information and interact. MARL strategies that deploy decentralized training with intrinsic motivation offer a self-supervised alternative, enabling agents to develop flexible social strategies through the interaction of autonomous agents. However, by contrasting the self-supervised and centralized methods, we reveal that populations trained with reward-sharing methods surpass those using self-supervised methods in a mixed-motive environment. We link this superiority to the emergence of specialized roles and each agent's expertise in its role. Interestingly, this gap shrinks in pure-motive settings, emphasizing the need for evaluations in more complex, realistic (mixed-motive) environments. Our preliminary results suggest a performance gap between populations that could be closed by improving self-supervised methods, pushing MARL closer to real-world readiness.
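
To make the contrast in the abstract concrete, here is a minimal sketch (not the paper's implementation) of the two reward regimes being compared: a centralized reward-sharing signal that blends each agent's reward with the population mean, versus a self-supervised, curiosity-style intrinsic reward that depends only on the agent's own egocentric observations. The function names and the blending parameter `alpha` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Centralized reward-sharing ------------------------------------------
def reward_sharing(individual_rewards: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend each agent's reward with the population mean.

    alpha=0 recovers fully selfish rewards; alpha=1 is a fully shared
    (team-average) reward. The blend couples agents through a centralized
    signal that real organisms would not have access to.
    """
    return (1.0 - alpha) * individual_rewards + alpha * individual_rewards.mean()

# --- Self-supervised intrinsic motivation (curiosity-style) --------------
def curiosity_reward(predicted_next_obs: np.ndarray, next_obs: np.ndarray) -> float:
    """Prediction error of a learned forward model on the agent's own
    egocentric observation; no access to other agents' rewards is required.
    """
    return float(np.mean((predicted_next_obs - next_obs) ** 2))

# Toy usage: four agents with heterogeneous extrinsic rewards.
r = np.array([1.0, 0.0, 0.5, 2.0])
print(reward_sharing(r, alpha=0.5))   # coupled, centralized signal
obs, pred = rng.normal(size=8), rng.normal(size=8)
print(curiosity_reward(pred, obs))    # egocentric, self-supervised signal
```

Note that `alpha` interpolates between the fully selfish and fully shared cases, which is one way to read the spectrum between the decentralized and centralized regimes the paper compares.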
