Multi-agent Deep Covering Skill Discovery (2210.03269v3)
Abstract: The use of skills (a.k.a., options) can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning settings, discovering collaborative options that coordinate the behavior of multiple agents and encourage them to visit under-explored regions of their joint state space has not been considered. To this end, we propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space. We also propose a novel framework for adopting these multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into several sub-tasks, each of which can be completed by a sub-group of the agents. Therefore, our framework first leverages an attention mechanism to find the collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn the multi-agent options for each sub-group to complete their sub-tasks first, and then to integrate them through a high-level policy as the solution to the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. Evaluations on multi-agent collaborative tasks show that the proposed algorithm can effectively capture agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options, in terms of both faster exploration and higher task rewards.
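The abstract's core idea, minimizing the expected cover time, builds on covering option discovery, where an option is added between the two states at the extremes of the Fiedler vector (the eigenvector of the graph Laplacian's second-smallest eigenvalue) of the state-transition graph. The paper itself learns a deep, multi-agent variant from samples; the snippet below is only a minimal tabular sketch of that underlying idea, assuming the joint state space is small enough to hold as an explicit adjacency matrix. Function names such as `covering_option_endpoints` are illustrative and not taken from the paper.

```python
import numpy as np

def fiedler_vector(adjacency: np.ndarray) -> np.ndarray:
    """Return the Fiedler vector of an undirected joint-state graph.

    The eigenvector for the second-smallest Laplacian eigenvalue; its
    extreme entries mark the states that most constrain cover time.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh returns eigenvalues in ascending order, eigenvectors as columns
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]

def covering_option_endpoints(adjacency: np.ndarray) -> tuple[int, int]:
    """Pick initiation/termination joint states for one covering option.

    Connecting the two extremes of the Fiedler vector (and the reverse
    direction) increases algebraic connectivity, which tightens the
    bound on the expected cover time of the joint state space.
    """
    f = fiedler_vector(adjacency)
    return int(np.argmax(f)), int(np.argmin(f))
```

In the multi-agent setting described in the abstract, this graph would be over the joint states of a collaborative sub-group rather than a single agent, and the option policies connecting the two endpoints would be learned by the hierarchical HA-MSAC procedure rather than computed in closed form.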
Authors: Jiayu Chen, Marina Haliem, Tian Lan, Vaneet Aggarwal