
Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning (2403.01112v2)

Published 2 Mar 2024 in cs.LG and cs.MA

Abstract: In cooperative multi-agent reinforcement learning (MARL), agents aim to achieve a common goal, such as defeating enemies or scoring a goal. Existing MARL algorithms are effective but still require significant learning time and often get trapped in local optima on complex tasks, failing to discover a goal-reaching policy. To address this, we introduce Efficient episodic Memory Utilization (EMU) for MARL, with two primary objectives: (a) accelerating reinforcement learning by leveraging semantically coherent memory from an episodic buffer, and (b) selectively promoting desirable transitions to prevent convergence to local optima. To achieve (a), EMU incorporates a trainable encoder/decoder structure alongside MARL, creating coherent memory embeddings that facilitate exploratory memory recall. To achieve (b), EMU introduces a novel reward structure, called the episodic incentive, based on the desirability of states. This reward improves the TD target in Q-learning and acts as an additional incentive for desirable transitions. We provide theoretical support for the proposed incentive and demonstrate the effectiveness of EMU compared to conventional episodic control. The proposed method is evaluated in StarCraft II and Google Research Football, and empirical results indicate further performance improvements over state-of-the-art methods.
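The abstract does not give the exact update rules, but a minimal sketch of the two mechanisms it describes, semantically coherent memory embeddings for exploratory recall and an episodic incentive added to the TD target, might look as follows. All names here (StateEncoder, EpisodicBuffer, radius, r_episodic) are illustrative assumptions, not the paper's actual API or hyperparameters.

```python
# Hedged sketch of EMU's two ideas as described in the abstract (PyTorch).
# Class/parameter names are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn


class StateEncoder(nn.Module):
    """Trainable encoder mapping global states to memory embeddings.

    In EMU this is trained jointly with MARL (alongside a decoder) so that
    nearby embeddings are semantically coherent; the decoder is omitted here.
    """

    def __init__(self, state_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class EpisodicBuffer:
    """Episodic memory keyed by state embeddings.

    Stores the best return observed from each remembered state and whether
    that state was desirable (i.e., led to the goal).
    """

    def __init__(self):
        self.keys, self.returns, self.desirable = [], [], []

    def add(self, key: torch.Tensor, episodic_return: float, reached_goal: bool):
        self.keys.append(key.detach())
        self.returns.append(episodic_return)
        self.desirable.append(reached_goal)

    def recall(self, query: torch.Tensor, radius: float = 0.5):
        """Exploratory recall: best stored return among all embeddings within
        `radius` of the query, rather than an exact-match lookup."""
        if not self.keys:
            return None
        dists = torch.norm(torch.stack(self.keys) - query.detach(), dim=-1)
        hits = (dists <= radius).nonzero(as_tuple=True)[0].tolist()
        if not hits:
            return None
        return max(self.returns[i] for i in hits)


def episodic_td_target(r, r_episodic, max_q_next, gamma=0.99, done=False):
    """One-step TD target augmented with an episodic incentive term.

    Here r_episodic is assumed positive only for transitions into desirable
    states, so it selectively promotes goal-reaching behavior.
    """
    return r + r_episodic + (0.0 if done else gamma * max_q_next)
```

Recalling by embedding neighborhood rather than exact state match is what makes the memory usable for exploration: semantically similar states can share remembered returns even if they were never visited exactly.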

Authors (3)
  1. Hyungho Na (3 papers)
  2. Yunkyeong Seo (1 paper)
  3. Il-Chul Moon (39 papers)
Citations (3)
