Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones (2304.03443v2)
Abstract: Safe navigation of drones in the presence of adversarial physical attacks from multiple pursuers is a challenging task. This paper proposes a novel approach, asynchronous multi-stage deep reinforcement learning (AMS-DRL), to train adversarial neural networks that learn from the actions of multiple evolved pursuers and adapt quickly to their behavior, enabling the drone to avoid attacks and reach its target. Specifically, AMS-DRL evolves adversarial agents in a pursuit-evasion game in which the pursuers and the evader are trained asynchronously, in a bipartite-graph fashion, over multiple stages. A game-theoretic analysis shows that the approach guarantees convergence by ensuring a Nash equilibrium among the agents. Extensive simulations show that our method outperforms baselines, achieving higher navigation success rates. We also analyze how parameters such as the relative maximum speed affect navigation performance. Furthermore, physical experiments validate the effectiveness of the trained policies in real-time flight. A success-rate heatmap is introduced to elucidate how spatial geometry influences navigation outcomes. Project website: https://github.com/NTU-ICG/AMS-DRL-for-Pursuit-Evasion.
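The alternating stage structure described in the abstract lends itself to a compact illustration. Below is a minimal, self-contained Python sketch of the asynchronous multi-stage idea: one side's policy is frozen while the other trains against it, and the roles swap each stage. Everything here is an assumption made for illustration only: the toy 2D kinematics, the linear tanh policies, and the `improve` routine (simple random-search hill climbing standing in for the paper's deep RL updates). It is not the authors' implementation.

```python
# Hypothetical sketch of alternating multi-stage adversarial training,
# in the spirit of AMS-DRL. Toy 2D pursuit-evasion; random search stands
# in for deep RL policy updates. Not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def rollout(evader_w, pursuer_w, steps=60):
    """Play one episode; return the evader's reward (pursuers get the negative)."""
    evader = np.zeros(2)
    goal = np.array([10.0, 0.0])
    pursuers = [np.array([5.0, 3.0]), np.array([5.0, -3.0])]
    for _ in range(steps):
        # Linear policies: action = tanh(W @ relative-position feature).
        evader += 0.5 * np.tanh(evader_w @ (goal - evader))
        for p in pursuers:
            p += 0.4 * np.tanh(pursuer_w @ (evader - p))
            if np.linalg.norm(evader - p) < 0.5:
                return -1.0            # caught: evader loses
        if np.linalg.norm(goal - evader) < 0.5:
            return 1.0                 # target reached: evader wins
    return 0.0                         # timeout: draw

def improve(weights, score_fn, iters=200, sigma=0.3):
    """Placeholder for a DRL update: random-search hill climbing on the payoff."""
    best, best_score = weights, score_fn(weights)
    for _ in range(iters):
        cand = best + sigma * rng.standard_normal(best.shape)
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

evader_w = rng.standard_normal((2, 2))
pursuer_w = rng.standard_normal((2, 2))

# Asynchronous multi-stage loop: freeze one side, train the other, alternate.
for stage in range(4):
    if stage % 2 == 0:   # pursuer stage: evader frozen
        pursuer_w = improve(pursuer_w, lambda w: -rollout(evader_w, w))
    else:                # evader stage: pursuers frozen
        evader_w = improve(evader_w, lambda w: rollout(w, pursuer_w))
    print(f"stage {stage}: evader reward = {rollout(evader_w, pursuer_w):+.1f}")
```

In this toy loop, training can stop once neither side improves its payoff by deviating unilaterally, which is the Nash-equilibrium condition the abstract appeals to for convergence.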