From Centralized to Self-Supervised: Pursuing Realistic Multi-Agent Reinforcement Learning (2312.08662v1)
Abstract: In real-world environments, autonomous agents rely on their egocentric observations and must learn adaptive strategies to interact with others whose mixed motivations are discernible only through visible cues. Many Multi-Agent Reinforcement Learning (MARL) methods adopt centralized approaches, involving either centralized training or reward-sharing, that often violate the realistic ways in which living organisms, such as animals or humans, process information and interact. MARL strategies that deploy decentralized training with intrinsic motivation offer a self-supervised alternative, enabling agents to develop flexible social strategies through the interaction of autonomous agents. However, by contrasting self-supervised and centralized methods, we show that populations trained with reward-sharing surpass those trained with self-supervised methods in a mixed-motive environment. We link this superiority to the emergence of specialized roles and to each agent's expertise in its role. Interestingly, this gap shrinks in pure-motive settings, underscoring the need for evaluation in more complex, realistic (mixed-motive) environments. Our preliminary results reveal a population-performance gap that improved self-supervised methods could close, pushing MARL closer to real-world readiness.
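To make the self-supervised idea concrete, the sketch below illustrates one common form of intrinsic motivation: a curiosity-style reward given by the prediction error of a learned forward model, in the spirit of curiosity-driven exploration. This is a minimal, hypothetical illustration (a linear forward model, made-up dimensions and learning rate), not the paper's actual training setup.

```python
import numpy as np

# Hedged sketch: curiosity-style intrinsic reward from forward-model
# prediction error. All dimensions and hyperparameters are illustrative.
rng = np.random.default_rng(0)

obs_dim, act_dim = 4, 2
W = np.zeros((obs_dim + act_dim, obs_dim))  # linear forward model (toy stand-in for a network)

def intrinsic_reward(obs, act, next_obs, lr=0.1):
    """Return the forward model's prediction error, then take one SGD step."""
    global W
    x = np.concatenate([obs, act])
    pred = x @ W                          # predicted next observation
    err = next_obs - pred
    reward = float(np.mean(err ** 2))     # curiosity = surprise (prediction error)
    W += lr * np.outer(x, err)            # gradient step on the squared error
    return reward

# A transition seen repeatedly becomes predictable, so its intrinsic
# reward decays: the agent is pushed toward novel experience instead.
obs = rng.normal(size=obs_dim)
act = np.array([1.0, 0.0])
next_obs = obs + 0.5
rewards = [intrinsic_reward(obs, act, next_obs) for _ in range(50)]
```

Because the reward is computed entirely from the agent's own observations and model, no centralized trainer or shared reward is needed; that decentralization is what the abstract contrasts with reward-sharing approaches.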