Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning (2402.17978v2)
Abstract: Effective exploration is crucial to discovering optimal strategies for multi-agent reinforcement learning (MARL) in complex coordination tasks. Existing methods mainly rely on intrinsic rewards to enable committed exploration, or on role-based learning to decompose the joint action space, rather than directly conducting a collective search in the entire action-observation space. However, they often struggle to obtain the specific joint action sequences needed to reach successful states in long-horizon tasks. To address this limitation, we propose Imagine, Initialize, and Explore (IIE), a novel method for efficient multi-agent exploration in complex scenarios. IIE employs a transformer model to imagine how the agents reach a critical state, i.e., one in which the agents can influence each other's transition functions. The environment is then initialized at this state using a simulator before the exploration phase begins. We formulate imagination as a sequence modeling problem, in which the states, observations, prompts, actions, and rewards are predicted autoregressively. The prompt consists of the timestep-to-go, return-to-go, influence value, and a one-shot demonstration, which together specify the desired state and trajectory and guide action generation. By initializing agents at critical states, IIE significantly increases the likelihood of discovering potentially important under-explored regions. Despite its simplicity, empirical results demonstrate that our method outperforms multi-agent exploration baselines on the StarCraft Multi-Agent Challenge (SMAC) and SMACv2 environments. In particular, IIE improves performance on sparse-reward SMAC tasks and produces more effective curricula over the initialized states than other generative methods, such as CVAE-GAN and diffusion models.
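To make the sequence-modeling formulation concrete, below is a minimal sketch of a prompt-conditioned autoregressive imagination model. It is not the paper's implementation: the class and function names (`ImaginationModel`, `imagine_trajectory`) are illustrative, trajectory elements are assumed to be pre-embedded into a shared token space, and the one-shot demonstration is compressed to a scalar summary in the prompt for simplicity, whereas the paper conditions on a full demonstration trajectory.

```python
import torch
import torch.nn as nn

class ImaginationModel(nn.Module):
    """Sketch of a prompt-conditioned causal transformer, assuming states,
    observations, actions, and rewards are already embedded as tokens."""

    def __init__(self, token_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        # Prompt: [timestep-to-go, return-to-go, influence value, demo summary].
        # The scalar demo summary is a simplifying assumption of this sketch.
        self.prompt_enc = nn.Linear(4, token_dim)
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=n_heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(token_dim, token_dim)  # next-token embedding

    def forward(self, prompt, traj_tokens):
        # prompt: (B, 4); traj_tokens: (B, T, D). Prepend the prompt token so
        # every prediction is conditioned on the desired state and trajectory.
        x = torch.cat([self.prompt_enc(prompt).unsqueeze(1), traj_tokens], dim=1)
        T = x.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(x, mask=causal)
        return self.head(h[:, -1])  # prediction for the next trajectory token

@torch.no_grad()
def imagine_trajectory(model, prompt, start_token, horizon):
    """Autoregressively roll out imagined tokens; the final token stands in
    for the critical state at which the simulator would be initialized."""
    tokens = start_token.unsqueeze(1)  # (B, 1, D)
    for _ in range(horizon):
        nxt = model(prompt, tokens).unsqueeze(1)
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens

# Example: imagine a 10-step trajectory toward a critical state.
model = ImaginationModel()
prompt = torch.randn(8, 4)   # batch of prompts
start = torch.randn(8, 64)   # embedded initial states
trajectory = imagine_trajectory(model, prompt, start, horizon=10)
```

In the full method, the imagined critical state would be decoded from token space and passed to the simulator's reset interface before the exploration policy takes over; this sketch stops at token generation.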
- Is Conditional Generative Modeling all you need for Decision Making? In The Eleventh International Conference on Learning Representations.
- CVAE-GAN: Fine-grained image generation through asymmetric training. In Proceedings of the IEEE International Conference on Computer Vision, 2745–2754.
- Decision transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34: 15084–15097.
- LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 32.
- Go-Explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995.
- First return, then explore. Nature, 590(7847): 580–586.
- SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2212.07489.
- Memory-based trajectory-conditioned policies for learning from sparse rewards. Advances in Neural Information Processing Systems, 33: 4333–4345.
- Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI Fall Symposium Series.
- Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In International Conference on Machine Learning, 3040–3049. PMLR.
- MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer. In Proceedings of the 39th International Conference on Machine Learning, 10041–10052. PMLR.
- QMDP-Net: Deep learning for planning under partial observability. Advances in Neural Information Processing Systems, 30.
- Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190: 82–94.
- Multi-game decision transformers. Advances in Neural Information Processing Systems, 35: 27921–27936.
- Diffusion-LM improves controllable text generation. Advances in Neural Information Processing Systems, 35: 4328–4343.
- MAVEN: Multi-agent variational exploration. Advances in Neural Information Processing Systems, 32.
- A concise introduction to decentralized POMDPs. Springer.
- Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation. In International Conference on Learning Representations.
- Stabilizing transformers for reinforcement learning. In International Conference on Machine Learning, 7487–7498. PMLR.
- Improving language understanding by generative pre-training. Technical report, OpenAI.
- Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 33: 10199–10210.
- QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, 4295–4304. PMLR.
- A generalist agent. arXiv preprint arXiv:2205.06175.
- The StarCraft Multi-Agent Challenge. arXiv preprint arXiv:1902.04043.
- QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International Conference on Machine Learning, 5887–5896. PMLR.
- QPLEX: Duplex Dueling Multi-Agent Q-Learning. In International Conference on Learning Representations.
- Multi-agent reinforcement learning for active voltage control on power distribution networks. Advances in Neural Information Processing Systems, 34: 3271–3284.
- RODE: Learning roles to decompose multi-agent tasks. arXiv preprint arXiv:2010.01523.
- Influence-Based Multi-Agent Exploration. In International Conference on Learning Representations.
- The surprising effectiveness of PPO in cooperative multi-agent games. Advances in Neural Information Processing Systems, 35: 24611–24624.
- Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration. Advances in Neural Information Processing Systems, 34.
- SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving. In Conference on Robot Learning, 264–285. PMLR.
- Zeyang Liu
- Lipeng Wan
- Xinrui Yang
- Zhuoran Chen
- Xingyu Chen
- Xuguang Lan