Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multistage Reinforcement Learning Approach (2204.12181v3)

Published 26 Apr 2022 in cs.RO and cs.AI

Abstract: Equipping drones with target search capabilities is highly desirable for applications in disaster rescue and smart warehouse delivery systems. Multiple intelligent drones that can collaborate with each other and maneuver among obstacles are more effective at accomplishing tasks in a shorter amount of time. However, carrying out collaborative target search (CTS) without prior target information is extremely challenging, especially with a visual drone swarm. In this work, we propose a novel data-efficient deep reinforcement learning (DRL) approach called adaptive curriculum embedded multistage learning (ACEMSL) to address these challenges, namely 3-D sparse-reward space exploration with limited visual perception and the need for collaborative behavior. Specifically, we decompose the CTS task into several subtasks, including individual obstacle avoidance, target search, and inter-agent collaboration, and progressively train the agents with multistage learning. Meanwhile, an adaptive embedded curriculum (AEC) is designed, where the task difficulty level (TDL) is adaptively adjusted based on the success rate (SR) achieved in training. ACEMSL allows data-efficient training and individual-team reward allocation for the visual drone swarm. Furthermore, we deploy the trained model on a real visual drone swarm and perform CTS operations without fine-tuning. Extensive simulations and real-world flight tests validate the effectiveness and generalizability of ACEMSL. The project is available at https://github.com/NTU-UAVG/CTS-visual-drone-swarm.git.
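
The following is a minimal sketch of the adaptive embedded curriculum (AEC) idea described in the abstract: the task difficulty level (TDL) is raised or lowered according to the rolling success rate (SR) measured during training. The window size, the SR thresholds, and the environment interface in the usage comment are illustrative assumptions, not details taken from the paper.

from collections import deque

class AdaptiveCurriculum:
    """Adjust the task difficulty level (TDL) from the rolling success rate (SR)."""

    def __init__(self, num_levels=5, window=100, raise_sr=0.8, lower_sr=0.3):
        self.level = 0                        # current TDL, starting at the easiest stage
        self.num_levels = num_levels
        self.outcomes = deque(maxlen=window)  # rolling record of episode successes
        self.raise_sr = raise_sr              # SR above which the TDL is increased
        self.lower_sr = lower_sr              # SR below which the TDL is decreased

    def report(self, success):
        """Record one episode outcome and return the (possibly updated) TDL."""
        self.outcomes.append(1.0 if success else 0.0)
        if len(self.outcomes) == self.outcomes.maxlen:
            sr = sum(self.outcomes) / len(self.outcomes)
            if sr >= self.raise_sr and self.level < self.num_levels - 1:
                self.level += 1
                self.outcomes.clear()         # restart the SR window at the new level
            elif sr <= self.lower_sr and self.level > 0:
                self.level -= 1
                self.outcomes.clear()
        return self.level

# Hypothetical usage inside a training loop: the environment is reconfigured
# (e.g. obstacle density, search-area size) according to the returned TDL.
# curriculum = AdaptiveCurriculum()
# for episode in range(total_episodes):
#     success = run_episode(env, policy)               # assumed helper
#     env.set_difficulty(curriculum.report(success))   # assumed environment API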
