MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure (2405.00902v1)

Published 1 May 2024 in cs.LG, cs.AI, and cs.MA

Abstract: Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to a Pareto-optimal Nash equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, where policy learning exhibits larger variance. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" that subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with the learned exploration policies, MESA achieves significantly better performance on sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and it generalizes to more challenging tasks at test time.
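To make the abstract's pipeline concrete, below is a minimal, hedged sketch of the idea as stated there: cluster high-rewarding joint (state, action) pairs from training tasks to approximate the subspace, attach one simple "exploration policy" per region, and mix their actions into an off-policy learner's data collection at test time. All names (identify_high_reward_subspace, ExplorationPolicy, mesa_collect, explore_prob) are illustrative assumptions, not the authors' implementation; the clustering here is plain k-means over concatenated state-action vectors.

```python
# Illustrative sketch of the MESA idea from the abstract (not the authors' code).
# Assumes continuous joint states/actions represented as numpy vectors.
import numpy as np


def identify_high_reward_subspace(transitions, reward_threshold, n_regions=4, seed=0):
    """Cluster high-rewarding joint (state, action) pairs from training tasks.

    transitions: iterable of (joint_state, joint_action, reward).
    Returns k-means centroids approximating the high-reward subspace.
    """
    rng = np.random.default_rng(seed)
    points = np.array([np.concatenate([s, a])
                       for s, a, r in transitions if r >= reward_threshold])
    centroids = points[rng.choice(len(points), n_regions, replace=False)]
    for _ in range(50):  # Lloyd's algorithm
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == k].mean(0) if np.any(labels == k)
                              else centroids[k] for k in range(n_regions)])
    return centroids


class ExplorationPolicy:
    """Toy exploration policy: steer the joint action toward one subspace region."""

    def __init__(self, centroid, state_dim):
        self.target_action = centroid[state_dim:]  # action slice of the centroid

    def act(self, joint_state, noise=0.1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        return self.target_action + noise * rng.standard_normal(self.target_action.shape)


def mesa_collect(env_step, joint_state, learner_policy, exploration_policies,
                 replay_buffer, explore_prob=0.5, rng=None):
    """One test-time collection step: with probability explore_prob, act with a
    randomly chosen exploration policy; otherwise use the learner's policy.
    Either way, the transition is stored for the off-policy MARL learner."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < explore_prob:
        policy = exploration_policies[rng.integers(len(exploration_policies))]
        joint_action = policy.act(joint_state, rng=rng)
    else:
        joint_action = learner_policy(joint_state)
    next_state, reward, done = env_step(joint_action)
    replay_buffer.append((joint_state, joint_action, reward, next_state, done))
    return next_state, done
```

Under these assumptions, a set of diverse exploration policies is obtained by instantiating one ExplorationPolicy per centroid; mesa_collect is then a drop-in data-collection step for any off-policy MARL algorithm that learns from the shared replay buffer.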

Authors (4)
  1. Zhicheng Zhang (76 papers)
  2. Yancheng Liang (8 papers)
  3. Yi Wu (171 papers)
  4. Fei Fang (103 papers)
Citations (2)