SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially Observable Multi-Agent Path Finding (2307.02691v1)

Published 5 Jul 2023 in cs.RO, cs.AI, and cs.MA

Abstract: Multi-Agent Path Finding (MAPF) is a crucial component for many large-scale robotic systems, where agents must plan their collision-free paths to their given goal positions. Recently, multi-agent reinforcement learning has been introduced to solve the partially observable variant of MAPF by learning a decentralized single-agent policy in a centralized fashion based on each agent's partial observation. However, existing learning-based methods are ineffective in achieving complex multi-agent cooperation, especially in congested environments, due to the non-stationarity of this setting. To tackle this challenge, we propose a multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA), which employs novel heuristic-based attention mechanisms for both the actors and critics to encourage cooperation among agents. SACHA learns a neural network for each agent to selectively pay attention to the shortest path heuristic guidance from multiple agents within its field of view, thereby allowing for more scalable learning of cooperation. SACHA also extends the existing multi-agent actor-critic framework by introducing a novel critic centered on each agent to approximate $Q$-values. Compared to existing methods that use a fully observable critic, our agent-centered multi-agent actor-critic method results in more impartial credit assignment and better generalizability of the learned policy to MAPF instances with varying numbers of agents and types of environments. We also implement SACHA(C), which embeds a communication module in the agent's policy network to enable information exchange among agents. We evaluate both SACHA and SACHA(C) on a variety of MAPF instances and demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.
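The abstract's central mechanism is an attention module that lets each agent selectively weigh the shortest-path heuristic guidance of the agents within its field of view. Below is a minimal, illustrative sketch of one way such heuristic-based attention could be wired; it assumes PyTorch, and the class name, tensor shapes, and dimensions (obs_dim, heur_dim, embed_dim) are placeholders rather than the authors' actual architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HeuristicAttention(nn.Module):
        """Illustrative sketch (not the paper's code): scaled dot-product
        attention in which one agent weighs the shortest-path heuristic
        embeddings of the agents currently inside its field of view."""

        def __init__(self, obs_dim, heur_dim, embed_dim=64):
            super().__init__()
            self.query = nn.Linear(obs_dim, embed_dim)   # from the agent's own partial observation
            self.key = nn.Linear(heur_dim, embed_dim)    # from each visible agent's heuristic map
            self.value = nn.Linear(heur_dim, embed_dim)
            self.scale = embed_dim ** 0.5

        def forward(self, own_obs, heur_feats, mask):
            # own_obs:    (batch, obs_dim)              encoding of the agent's observation
            # heur_feats: (batch, n_agents, heur_dim)   flattened heuristic maps of nearby agents
            # mask:       (batch, n_agents)             1 if that agent is within the field of view
            q = self.query(own_obs).unsqueeze(1)                        # (batch, 1, embed)
            k = self.key(heur_feats)                                    # (batch, n, embed)
            v = self.value(heur_feats)                                  # (batch, n, embed)
            scores = torch.matmul(q, k.transpose(1, 2)) / self.scale    # (batch, 1, n)
            scores = scores.masked_fill(mask.unsqueeze(1) == 0, float('-inf'))
            attn = F.softmax(scores, dim=-1)                            # attention over visible agents
            return torch.matmul(attn, v).squeeze(1)                     # (batch, embed) fused guidance

Masking the scores with -inf before the softmax restricts attention to agents that are actually visible, which mirrors the partially observable setting the abstract describes; the fused guidance vector would then feed the actor (and, per agent, the agent-centered critic) downstream.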

Authors (2)
  1. Qiushi Lin (2 papers)
  2. Hang Ma (33 papers)
Citations (14)
