Physics-Informed Multi-Agent Reinforcement Learning for Distributed Multi-Robot Problems (2401.00212v2)
Abstract: The networked nature of multi-robot systems presents challenges in the context of multi-agent reinforcement learning. Centralized control policies do not scale with increasing numbers of robots, whereas independent control policies do not exploit the information provided by other robots, exhibiting poor performance in cooperative-competitive tasks. In this work, we propose a physics-informed reinforcement learning approach that learns distributed multi-robot control policies that are scalable and make use of all the information available to each robot. Our approach has three key characteristics. First, it imposes a port-Hamiltonian structure on the policy representation, respecting the energy-conservation properties of physical robot systems and the networked nature of robot team interactions. Second, it uses self-attention to obtain a sparse policy representation that can handle the time-varying information arriving at each robot from the interaction graph. Third, we present a soft actor-critic reinforcement learning algorithm parameterized by our self-attention port-Hamiltonian control policy, which accounts for the correlation among robots during training while removing the need for value-function factorization. Extensive simulations in different multi-robot scenarios demonstrate the success of the proposed approach, which surpasses previous multi-robot reinforcement learning solutions in scalability while achieving similar or superior performance (average cumulative reward up to 2x that of the state of the art, with robot teams up to 6x larger than at training time).
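To make the architecture described in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a per-robot policy that fuses the states of a robot and its current neighbors with self-attention and then computes the control through a port-Hamiltonian structure (skew-symmetric interconnection, positive semi-definite dissipation, learned energy function). All module names, dimensions, and the exact control law are assumptions for illustration; this is not the authors' implementation, and the soft actor-critic training loop that would wrap this policy is omitted.

```python
# Illustrative sketch only (not the paper's code): a distributed policy for robot i that
#   1) encodes the states of robot i and its current neighbors,
#   2) fuses them with multi-head self-attention (handles time-varying neighbor sets),
#   3) outputs a control u_i = (J_i - R_i) dH/dx_i, with J_i skew-symmetric and
#      R_i positive semi-definite enforced by construction.
import torch
import torch.nn as nn


class SelfAttentionPortHamiltonianPolicy(nn.Module):
    def __init__(self, state_dim: int, ctrl_dim: int, embed_dim: int = 64, heads: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.SiLU())
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        # Scalar energy (Hamiltonian) of robot i given the fused embedding.
        self.hamiltonian = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.SiLU(),
                                         nn.Linear(embed_dim, 1))
        # Unconstrained parameters reshaped into J (skew-symmetric) and R (PSD).
        self.j_net = nn.Linear(embed_dim, ctrl_dim * ctrl_dim)
        self.r_net = nn.Linear(embed_dim, ctrl_dim * ctrl_dim)
        # Maps the energy gradient from state space to control space (illustrative choice).
        self.grad_proj = nn.Linear(state_dim, ctrl_dim)
        self.ctrl_dim = ctrl_dim

    def forward(self, x_self: torch.Tensor, x_neighbors: torch.Tensor) -> torch.Tensor:
        # x_self: (B, state_dim); x_neighbors: (B, N_i, state_dim), N_i may vary per call.
        x_self = x_self.requires_grad_(True)
        tokens = torch.cat([x_self.unsqueeze(1), x_neighbors], dim=1)   # (B, 1+N_i, state_dim)
        emb = self.encoder(tokens)
        fused, _ = self.attn(emb, emb, emb)          # self-attention over robot + neighbors
        z = fused[:, 0]                              # fused embedding of robot i

        # Energy and its gradient w.r.t. robot i's own state via autograd.
        H = self.hamiltonian(z).sum()
        dHdx = torch.autograd.grad(H, x_self, create_graph=True)[0]     # (B, state_dim)
        dHdx = self.grad_proj(dHdx)                                     # (B, ctrl_dim)

        A = self.j_net(z).view(-1, self.ctrl_dim, self.ctrl_dim)
        J = A - A.transpose(1, 2)                    # skew-symmetric interconnection
        L = self.r_net(z).view(-1, self.ctrl_dim, self.ctrl_dim)
        R = L @ L.transpose(1, 2)                    # positive semi-definite dissipation

        # Port-Hamiltonian-style control: interconnection minus dissipation on dH/dx.
        u = torch.einsum('bij,bj->bi', J - R, dHdx)
        return u


# Usage: 3 neighbors visible to robot i at this time step.
policy = SelfAttentionPortHamiltonianPolicy(state_dim=4, ctrl_dim=2)
u = policy(torch.randn(8, 4), torch.randn(8, 3, 4))
print(u.shape)  # torch.Size([8, 2])
```

Because the attention layer accepts an arbitrary number of neighbor tokens, the same parameters can be deployed on teams larger than those seen during training, which is the scalability property highlighted in the abstract.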
- Eduardo Sebastián
- Thai Duong
- Nikolay Atanasov
- Eduardo Montijano
- Carlos Sagüés