Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning (2309.13285v2)

Published 23 Sep 2023 in cs.RO, cs.AI, and cs.MA

Abstract: End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits: easy deployment, task generalization, and real-time execution capability. Prior end-to-end DRL-based methods have showcased the ability to deploy learned controllers onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, the addition of obstacles increases the number of possible interactions exponentially, thereby increasing the difficulty of training RL policies. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents a curriculum and a replay buffer of clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism to attend to neighbor robots and obstacle interactions, the first successful demonstration of this mechanism on policies for swarm behavior deployed on severely compute-constrained hardware. Ours is the first work to demonstrate that neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL transfer zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Video demonstrations are available on the project website: https://sites.google.com/view/obst-avoid-swarm-rl.
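
The attention mechanism over neighbor and obstacle interactions can be pictured as a small self-attention encoder in which each robot embeds the relative states of nearby entities and attends to them from its own state. The sketch below is a minimal illustration in PyTorch; all dimensions, names, and the single-query design are assumptions for illustration, not the paper's reported architecture.

```python
# Minimal sketch of attention over neighbor/obstacle observations.
# Dimensions and module names are illustrative assumptions, not the
# paper's reported architecture.
import torch
import torch.nn as nn

class NeighborObstacleAttention(nn.Module):
    def __init__(self, feat_dim: int = 6, embed_dim: int = 32, num_heads: int = 4):
        super().__init__()
        # Shared embedding for each entity's relative-state vector
        self.embed = nn.Linear(feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, ego: torch.Tensor, entities: torch.Tensor) -> torch.Tensor:
        # ego:      (batch, embed_dim)    embedding of the robot's own state
        # entities: (batch, n, feat_dim)  relative states of neighbors/obstacles
        tokens = self.embed(entities)                   # (batch, n, embed_dim)
        query = ego.unsqueeze(1)                        # (batch, 1, embed_dim)
        attended, _ = self.attn(query, tokens, tokens)  # attend over entities
        return attended.squeeze(1)                      # (batch, embed_dim)

# Usage: the attended feature would be concatenated with the ego state and
# fed to the policy and value heads.
enc = NeighborObstacleAttention()
ego = torch.zeros(8, 32)      # batch of 8 robots
ents = torch.zeros(8, 5, 6)   # 5 nearby neighbors/obstacles each
print(enc(ego, ents).shape)   # torch.Size([8, 32])
```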

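Similarly, the replay buffer of clipped collision episodes can be read as storing short windows of experience around collision events and mixing them back into later training batches. This is a hedged sketch under that reading; the class name, window length, capacity, and mixing ratio are all invented for illustration.

```python
# Illustrative sketch of a buffer holding clipped collision episodes.
# Window length, capacity, and the mixing ratio are assumptions made
# for illustration only.
import random
from collections import deque

class CollisionClipBuffer:
    def __init__(self, capacity: int = 512, window: int = 32):
        self.clips = deque(maxlen=capacity)  # oldest clips evicted when full
        self.window = window

    def add_episode(self, transitions: list, collision_steps: list) -> None:
        # Keep only a fixed-length window of transitions ending at each collision.
        for t in collision_steps:
            start = max(0, t - self.window)
            self.clips.append(transitions[start:t + 1])

    def sample(self, n_clips: int) -> list:
        # Resample stored collision clips to mix into the next training batch.
        k = min(n_clips, len(self.clips))
        return [step for clip in random.sample(list(self.clips), k) for step in clip]

# A trainer might, e.g., build each batch from 3/4 fresh rollout data and
# 1/4 replayed collision clips (an assumed ratio, not from the paper).
```
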
Authors (6)
  1. Zhehui Huang (10 papers)
  2. Zhaojing Yang (4 papers)
  3. Rahul Krupani (2 papers)
  4. Baskın Şenbaşlar (8 papers)
  5. Sumeet Batra (9 papers)
  6. Gaurav S. Sukhatme (88 papers)
Citations (3)
