Unifying Foundation Models with Quadrotor Control for Visual Tracking Beyond Object Categories (2310.04781v3)

Published 7 Oct 2023 in cs.RO

Abstract: Visual control enables quadrotors to adaptively navigate using real-time sensory data, bridging perception with action. Yet, challenges persist, including generalization across scenarios, maintaining reliability, and ensuring real-time responsiveness. This paper introduces a perception framework grounded in foundation models for universal object detection and tracking, moving beyond specific training categories. Integral to our approach is a multi-layered tracker integrated with the foundation detector, ensuring continuous target visibility, even when faced with motion blur, abrupt light shifts, and occlusions. Complementing this, we introduce a model-free controller tailored for resilient quadrotor visual tracking. Our system operates efficiently on limited hardware, relying solely on an onboard camera and an inertial measurement unit. Through extensive validation in diverse challenging indoor and outdoor environments, we demonstrate our system's effectiveness and adaptability. In conclusion, our research represents a step forward in quadrotor visual tracking, moving from task-specific methods to more versatile and adaptable operations.
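The multi-layered tracker described above keeps the target visible by trusting a short-term track while it stays consistent with fresh detections, and falling back to re-association (or declaring the track lost) after occlusions or detector dropouts. The following is a minimal sketch of that layered fallback logic, not the paper's actual implementation; the `Box` and `MultiLayerTracker` interfaces, the IoU gate, and the miss-count threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Axis-aligned bounding box (hypothetical detector output format)."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


class MultiLayerTracker:
    """Toy layered tracker: acquire, associate by IoU, coast through
    short detection gaps, then declare the track lost."""

    def __init__(self, iou_thresh: float = 0.3, max_misses: int = 5):
        self.target: Optional[Box] = None
        self.misses = 0
        self.iou_thresh = iou_thresh
        self.max_misses = max_misses

    def update(self, detections: List[Box]) -> Optional[Box]:
        if self.target is None:
            # Layer 0: acquire the first available detection.
            self.target = detections[0] if detections else None
            self.misses = 0
            return self.target
        # Layer 1: associate the current target with the best-overlapping
        # detection from the (foundation) detector.
        best = max(detections, key=lambda d: iou(self.target, d), default=None)
        if best is not None and iou(self.target, best) >= self.iou_thresh:
            self.target, self.misses = best, 0
        else:
            # Layer 2: no match this frame (e.g., motion blur or occlusion);
            # coast on the last estimate for a few frames, then drop the track.
            self.misses += 1
            if self.misses > self.max_misses:
                self.target = None
        return self.target
```

In a real pipeline the association layer would typically use a learned short-term tracker rather than raw IoU, but the control flow (associate, coast, re-acquire) is the part the abstract emphasizes.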
