
D-VAT: End-to-End Visual Active Tracking for Micro Aerial Vehicles (2308.16874v2)

Published 31 Aug 2023 in cs.RO

Abstract: Visual active tracking is a growing research topic in robotics due to its key role in applications such as human assistance, disaster recovery, and surveillance. In contrast to passive tracking, active tracking approaches combine vision and control capabilities to detect and actively follow the target. Most work in this area focuses on ground robots, while the few contributions on aerial platforms still impose significant design constraints that limit their applicability. To overcome these limitations, in this paper we propose D-VAT, a novel end-to-end visual active tracking methodology based on deep reinforcement learning that is tailored to micro aerial vehicle platforms. The D-VAT agent computes the vehicle thrust and angular velocity commands needed to track the target by directly processing monocular camera measurements. We show that the proposed approach allows for precise and collision-free tracking, outperforming several state-of-the-art baselines in simulated environments that differ significantly from those encountered during training. Moreover, we demonstrate a smooth transition to a real-world quadrotor platform via mixed reality.
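The abstract describes an end-to-end policy that maps raw monocular camera frames directly to low-level MAV commands (collective thrust plus angular velocities). A minimal sketch of that input-output interface is shown below; this is not the authors' implementation, and the image resolution, network size, and command limits are hypothetical placeholders.

```python
import numpy as np

# Sketch of an end-to-end visual tracking policy: one forward pass from a
# monocular camera frame to [thrust, wx, wy, wz]. Weights are random here;
# in D-VAT such a policy would be trained with deep reinforcement learning.

IMG_SHAPE = (64, 64)   # grayscale frame, hypothetical resolution
HIDDEN = 128           # hidden layer width (placeholder)
THRUST_MAX = 20.0      # max collective thrust in N (placeholder limit)
RATE_MAX = 3.0         # max body angular rate in rad/s (placeholder limit)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (HIDDEN, IMG_SHAPE[0] * IMG_SHAPE[1]))
W2 = rng.normal(0.0, 0.01, (4, HIDDEN))  # 4 outputs: thrust + 3 rates

def policy(frame: np.ndarray) -> np.ndarray:
    """Map a camera frame to a command vector [thrust, wx, wy, wz]."""
    x = frame.reshape(-1) / 255.0              # normalize pixel intensities
    h = np.tanh(W1 @ x)                        # hidden features
    a = np.tanh(W2 @ h)                        # squashed actions in [-1, 1]
    thrust = (a[0] + 1.0) / 2.0 * THRUST_MAX   # rescale to [0, THRUST_MAX]
    rates = a[1:] * RATE_MAX                   # rescale to [-RATE_MAX, RATE_MAX]
    return np.concatenate([[thrust], rates])

frame = rng.integers(0, 256, IMG_SHAPE).astype(np.float32)
cmd = policy(frame)  # 4-dimensional command vector
```

The key design point conveyed by the abstract is that no intermediate state estimate or separate detector is required: perception and control are fused in a single learned mapping from pixels to thrust and angular-velocity setpoints.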

