Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker (2401.04377v2)
Abstract: Tracking an object's 6-DoF pose is crucial for many downstream robot tasks and real-world applications. In this paper, we investigate the real-world task of vision-guided aerial manipulation built on category-level 6-DoF pose tracking. Aerial conditions inevitably introduce special challenges, such as rapid viewpoint changes in pitch and roll and large inter-frame differences. To address these challenges, we first introduce a robust category-level 6-DoF pose tracker (Robust6DoF). The tracker leverages shape and temporal prior knowledge to select optimal inter-frame keypoint pairs, generated under prior structural adaptive supervision in a coarse-to-fine manner. Notably, Robust6DoF employs a Spatial-Temporal Augmentation module that handles inter-frame differences and intra-class shape variations through both temporal dynamic filtering and shape-similarity filtering. We further present a Pose-Aware Discrete Servo strategy (PAD-Servo), a decoupled approach to carrying out the final aerial vision guidance task; it comprises two servo action policies tailored to the structural properties of aerial robotic manipulation. Extensive experiments on four well-known public benchmarks demonstrate the superiority of Robust6DoF, and real-world tests verify that Robust6DoF together with PAD-Servo can be readily deployed in aerial robotic applications.
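For intuition, here is a minimal sketch (not the authors' implementation) of the two ingredients the abstract names: propagating a pose from inter-frame keypoint pairs, and splitting the resulting pose error into decoupled rotational and translational servo commands in the spirit of PAD-Servo's two-policy design. All function names and gains are illustrative assumptions, and the closed-form least-squares (Kabsch) alignment stands in for Robust6DoF's learned keypoint generation and filtering modules.

```python
import numpy as np

def relative_pose_from_keypoints(kp_prev, kp_curr):
    """Least-squares (Kabsch) rigid transform mapping kp_prev to kp_curr.
    kp_prev, kp_curr: (N, 3) matched 3D keypoints from consecutive frames.
    Illustrative stand-in for the tracker's inter-frame alignment."""
    c_prev, c_curr = kp_prev.mean(axis=0), kp_curr.mean(axis=0)
    H = (kp_prev - c_prev).T @ (kp_curr - c_curr)    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_curr - R @ c_prev
    return R, t

def track_step(pose_prev, kp_prev, kp_curr):
    """Propagate a 4x4 object-to-camera pose by the inter-frame motion."""
    R, t = relative_pose_from_keypoints(kp_prev, kp_curr)
    delta = np.eye(4)
    delta[:3, :3], delta[:3, 3] = R, t
    return delta @ pose_prev

def decoupled_servo_commands(pose_err, k_rot=0.8, k_trans=0.5):
    """Map a 4x4 pose error to separate angular / linear velocity commands,
    mimicking a decoupled two-policy servo; gains are made up for the sketch."""
    R_err, t_err = pose_err[:3, :3], pose_err[:3, 3]
    angle = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-8:
        omega = np.zeros(3)                          # no rotation error
    else:
        axis = np.array([R_err[2, 1] - R_err[1, 2],
                         R_err[0, 2] - R_err[2, 0],
                         R_err[1, 0] - R_err[0, 1]]) / (2.0 * np.sin(angle))
        omega = k_rot * angle * axis                 # axis-angle P-control
    return omega, k_trans * t_err                    # (angular, linear)
```

In a tracking-and-servo loop one would call `track_step` per frame, form `pose_err = goal_pose @ np.linalg.inv(pose_est)`, and feed the two returned commands to the vehicle's attitude and position controllers separately, which is the kind of decoupling that suits the structural properties of an aerial manipulator.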
- T. Zhu, R. Wu, J. Hang, X. Lin, and Y. Sun, “Toward human-like grasp: Functional grasp by dexterous robotic hand via object-hand semantic representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 4394–4408, 2023.
- B. Huang, J. Li, J. Chen, G. Wang, J. Zhao, and T. Xu, “Anti-uav410: A thermal infrared benchmark and customized scheme for tracking drones in the wild,” IEEE Trans. Pattern Anal. Mach. Intell., 2023.
- Z. Cao, Z. Huang, L. Pan, S. Zhang, Z. Liu, and C. Fu, “Towards real-world visual tracking with temporal contexts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 12, pp. 15834–15849, 2023.
- Y. Ma, J. He, D. Yang, T. Zhang, and F. Wu, “Adaptive part mining for robust visual tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 11443–11457, 2023.
- C. Wang, R. Martín-Martín, D. Xu, J. Lv, C. Lu, L. Fei-Fei, S. Savarese, and Y. Zhu, “6-pack: Category-level 6d pose tracker with anchor-based keypoints,” in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 10059–10066.
- Y. Weng, H. Wang, Q. Zhou, Y. Qin, Y. Duan, Q. Fan, B. Chen, H. Su, and L. J. Guibas, “Captra: Category-level pose tracking for rigid and articulated objects from point clouds,” in Proc. IEEE Int. Conf. Comput. Vis., 2021, pp. 13209–13218.
- J. Sun, Y. Wang, M. Feng, D. Wang, J. Zhao, C. Stachniss, and X. Chen, “Ick-track: A category-level 6-dof pose tracker using inter-frame consistent keypoints for aerial manipulation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2022, pp. 1556–1563.
- Y. Lin, J. Tremblay, S. Tyree, P. A. Vela, and S. Birchfield, “Keypoint-based category-level object pose tracking from an rgb sequence with uncertainty estimation,” in Proc. IEEE Int. Conf. Robot. Automat., 2022, pp. 1258–1264.
- S. Yu, D.-H. Zhai, Y. Xia, D. Li, and S. Zhao, “Cattrack: Single-stage category-level 6d object pose tracking via convolution and vision transformer,” IEEE Trans. Multimedia, 2023.
- H. Wang, S. Sridhar, J. Huang, J. Valentin, S. Song, and L. J. Guibas, “Normalized object coordinate space for category-level 6d object pose and size estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2642–2651.
- D. Chen, J. Li, Z. Wang, and K. Xu, “Learning canonical shape space for category-level 6d object pose and size estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 11973–11982.
- M. Tian, M. H. Ang, and G. H. Lee, “Shape prior deformation for categorical 6d object pose and size estimation,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 530–546.
- K. Chen and Q. Dou, “Sgpa: Structure-guided prior adaptation for category-level 6d object pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2021, pp. 2773–2782.
- Y. Chen, L. Lan, X. Liu, G. Zeng, C. Shang, Z. Miao, H. Wang, Y. Wang, and Q. Shen, “Adaptive stiffness visual servoing for unmanned aerial manipulators with prescribed performance,” IEEE Trans. Ind. Electron., 2024.
- G. He, Y. Jangir, J. Geng, M. Mousaei, D. Bai, and S. Scherer, “Image-based visual servo control for aerial manipulation using a fully-actuated uav,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2023, pp. 5042–5049.
- O. A. Hay, M. Chehadeh, A. Ayyad, M. Wahbah, M. A. Humais, I. Boiko, L. Seneviratne, and Y. Zweiri, “Noise-tolerant identification and tuning approach using deep neural networks for visual servoing applications,” IEEE Trans. Robot., 2023.
- Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes,” arXiv preprint arXiv:1711.00199, 2017.
- B. Wen, C. Mitash, B. Ren, and K. E. Bekris, “se(3)-tracknet: Data-driven 6d pose tracking by calibrating image residuals in synthetic domains,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2020, pp. 10367–10373.
- Y. Ze and X. Wang, “Category-level 6d object pose estimation in the wild: A semi-supervised learning approach and a new dataset,” Proc. Adv. Neural Inf. Process. Syst., vol. 35, pp. 27469–27483, 2022.
- X. Xue, Y. Li, X. Yin, C. Shang, T. Peng, and Q. Shen, “Semantic-aware real-time correlation tracking framework for uav videos,” IEEE Trans. Cybern., vol. 52, no. 4, pp. 2418–2429, 2020.
- Z. Huang, C. Fu, Y. Li, F. Lin, and P. Lu, “Learning aberrance repressed correlation filters for real-time uav tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 2891–2900.
- J. Ye, C. Fu, F. Lin, F. Ding, S. An, and G. Lu, “Multi-regularized correlation filter for uav tracking and self-localization,” IEEE Trans. Ind. Electron., vol. 69, no. 6, pp. 6004–6014, 2021.
- Y. Li, C. Fu, F. Ding, Z. Huang, and G. Lu, “Autotrack: Towards high-performance visual tracking for uav with automatic spatio-temporal regularization,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 11923–11932.
- Z. Cao, Z. Huang, L. Pan, S. Zhang, Z. Liu, and C. Fu, “Tctrack: Temporal contexts for aerial tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 14778–14788.
- T. Cao, W. Zhang, Y. Fu, S. Zheng, F. Luo, and C. Xiao, “Dgecn++: A depth-guided edge convolutional network for end-to-end 6d pose estimation via attention mechanism,” IEEE Trans. Circuits Syst. Video Technol., 2023.
- H. Jiang, Z. Dang, S. Gu, J. Xie, M. Salzmann, and J. Yang, “Center-based decoupled point-cloud registration for 6d object pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 3427–3437.
- H. Zhao, S. Wei, D. Shi, W. Tan, Z. Li, Y. Ren, X. Wei, Y. Yang, and S. Pu, “Learning symmetry-aware geometry correspondences for 6d object pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 14045–14054.
- G. Zhou, N. Gothoskar, L. Wang, J. B. Tenenbaum, D. Gutfreund, M. Lázaro-Gredilla, D. George, and V. K. Mansinghka, “3d neural embedding likelihood: Probabilistic inverse graphics for robust 6d pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 21625–21636.
- R. Chen, I. Liu, E. Yang, J. Tao, X. Zhang, Q. Ran, Z. Liu, J. Xu, and H. Su, “Activezero++: Mixed domain learning stereo and confidence-based depth completion with zero annotation,” IEEE Trans. Pattern Anal. Mach. Intell., 2023.
- G. Wang, F. Manhardt, X. Liu, X. Ji, and F. Tombari, “Occlusion-aware self-supervised monocular 6d object pose estimation,” IEEE Trans. Pattern Anal. Mach. Intell., 2021.
- D. Wang, G. Zhou, Y. Yan, H. Chen, and Q. Chen, “Geopose: Dense reconstruction guided 6d object pose estimation with geometric consistency,” IEEE Trans. Multimedia, vol. 24, pp. 4394–4408, 2021.
- I. Shugurov, S. Zakharov, and S. Ilic, “Dpodv2: Dense correspondence-based 6 dof pose estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7417–7435, 2021.
- L. Zou, Z. Huang, N. Gu, and G. Wang, “Gpt-cope: A graph-guided point transformer for category-level object pose estimation,” IEEE Trans. Circuits Syst. Video Technol., 2023.
- L. Zou, Z. Huang, N. Gu, and G. Wang, “6d-vit: Category-level 6d object pose estimation via transformer-based instance representation learning,” IEEE Trans. Image Process., vol. 31, pp. 6907–6921, 2022.
- J. Lin, Z. Wei, Y. Zhang, and K. Jia, “Vi-net: Boosting category-level 6d object pose estimation via learning decoupled rotations on the spherical representations,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 14001–14011.
- J. Lin, Z. Wei, Z. Li, S. Xu, K. Jia, and Y. Li, “Dualposenet: Category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency,” in Proc. IEEE Int. Conf. Comput. Vis., 2021, pp. 3560–3569.
- W. Chen, X. Jia, H. J. Chang, J. Duan, L. Shen, and A. Leonardis, “Fs-net: Fast shape-based network for category-level 6d object pose estimation with decoupled rotation mechanism,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 1581–1590.
- L. Liu, H. Xue, W. Xu, H. Fu, and C. Lu, “Toward real-world category-level articulation pose estimation,” IEEE Trans. Image Process., vol. 31, pp. 1072–1083, 2022.
- H. Lin, Z. Liu, C. Cheang, Y. Fu, G. Guo, and X. Xue, “Sar-net: Shape alignment and recovery network for category-level 6d object pose and size estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 6707–6717.
- R. Wang, X. Wang, T. Li, R. Yang, M. Wan, and W. Liu, “Query6dof: Learning sparse queries as implicit shape prior for category-level 6dof pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 14055–14064.
- S. Yu, D.-H. Zhai, Y. Guan, and Y. Xia, “Category-level 6-d object pose estimation with shape deformation for robotic grasp detection,” IEEE Trans. Neural Netw. Learn. Syst., 2023.
- T. Lee, B.-U. Lee, I. Shin, J. Choe, U. Shin, I. S. Kweon, and K.-J. Yoon, “Uda-cope: Unsupervised domain adaptation for category-level object pose estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 14891–14900.
- T. Lee, J. Tremblay, V. Blukis, B. Wen, B.-U. Lee, I. Shin, S. Birchfield, I. S. Kweon, and K.-J. Yoon, “Tta-cope: Test-time adaptation for category-level object pose estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2023, pp. 21285–21295.
- L. Zheng, C. Wang, Y. Sun, E. Dasgupta, H. Chen, A. Leonardis, W. Zhang, and H. J. Chang, “Hs-pose: Hybrid scope feature extraction for category-level object pose estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2023, pp. 17163–17173.
- Z. Liu, Q. Wang, D. Liu, and J. Tan, “Pa-pose: Partial point cloud fusion based on reliable alignment for 6d pose tracking,” Pattern Recognit., p. 110151, 2023.
- L. Wang, S. Yan, J. Zhen, Y. Liu, M. Zhang, G. Zhang, and X. Zhou, “Deep active contours for real-time 6-dof object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 14034–14044.
- B. Wen, J. Tremblay, V. Blukis, S. Tyree, T. Müller, A. Evans, D. Fox, J. Kautz, and S. Birchfield, “Bundlesdf: Neural 6-dof tracking and 3d reconstruction of unknown objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2023, pp. 606–617.
- X. Deng, A. Mousavian, Y. Xiang, F. Xia, T. Bretl, and D. Fox, “Poserbpf: A rao–blackwellized particle filter for 6-d object pose tracking,” IEEE Trans. Robot., vol. 37, no. 5, pp. 1328–1342, 2021.
- A. Santamaria-Navarro, P. Grosch, V. Lippiello, J. Solà, and J. Andrade-Cetto, “Uncalibrated visual servo for unmanned aerial manipulation,” IEEE/ASME Trans. Mechatron., vol. 22, no. 4, pp. 1610–1621, 2017.
- S. Kim, H. Seo, S. Choi, and H. J. Kim, “Vision-guided aerial manipulation using a multirotor with a robotic arm,” IEEE/ASME Trans. Mechatron., vol. 21, no. 4, pp. 1912–1923, 2016.
- C. Gabellieri, Y. S. Sarkisov, A. Coelho, L. Pallottino, K. Kondak, and M. J. Kim, “Compliance control of a cable-suspended aerial manipulator using hierarchical control framework,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2020, pp. 7196–7202.
- G. Zhang, Y. He, B. Dai, F. Gu, J. Han, and G. Liu, “Robust control of an aerial manipulator based on a variable inertia parameters model,” IEEE Trans. Ind. Electron., vol. 67, no. 11, pp. 9515–9525, 2019.
- C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese, “Densefusion: 6d object pose estimation by iterative dense fusion,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 3343–3352.
- S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma, “Linformer: Self-attention with linear complexity,” arXiv preprint arXiv:2006.04768, 2020.
- E. R. Chan, C. Z. Lin, M. A. Chan, K. Nagano, B. Pan, S. De Mello, O. Gallo, L. J. Guibas, J. Tremblay, S. Khamis et al., “Efficient geometry-aware 3d generative adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 16123–16133.
- J. Sun, Z. Shen, Y. Wang, H. Bao, and X. Zhou, “Loftr: Detector-free local feature matching with transformers,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8922–8931.
- M. Tyszkiewicz, P. Fua, and E. Trulls, “Disk: Learning local features with policy gradient,” Proc. Adv. Neural Inf. Process. Syst., vol. 33, pp. 14254–14265, 2020.
- E. Malis, F. Chaumette, and S. Boudet, “2 1/2 d visual servoing,” IEEE Trans. Robot. Autom., vol. 15, no. 2, pp. 238–250, 1999.
- J. Wang, K. Chen, and Q. Dou, “Category-level 6d object pose estimation via cascaded relation and recurrent reconstruction networks,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2021, pp. 4807–4814.
- M. Z. Irshad, T. Kollar, M. Laskey, K. Stone, and Z. Kira, “Centersnap: Single-shot multi-object 3d shape reconstruction and categorical 6d pose and size estimation,” in Proc. IEEE Int. Conf. Robot. Automat., 2022, pp. 10632–10640.
- M. Z. Irshad, S. Zakharov, R. Ambrus, T. Kollar, Z. Kira, and A. Gaidon, “Shapo: Implicit representations for multi-object shape, appearance, and pose optimization,” in Proc. Eur. Conf. Comput. Vis., 2022, pp. 275–292.
- J. Liu, Y. Chen, X. Ye, and X. Qi, “Prior-free category-level pose estimation with implicit space transformation,” arXiv preprint arXiv:2303.13479, 2023.
- Y. Di, R. Zhang, Z. Lou, F. Manhardt, X. Ji, N. Navab, and F. Tombari, “Gpv-pose: Category-level object pose estimation via geometry-guided point-wise voting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 6781–6791.
- M. Runz, M. Buffier, and L. Agapito, “Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects,” in Proc. IEEE Int. Symp. Mixed Augmented Reality, 2018, pp. 10–20.
- J. Issac, M. Wüthrich, C. G. Cifuentes, J. Bohg, S. Trimpe, and S. Schaal, “Depth-based object tracking using a robust gaussian filter,” in Proc. IEEE Int. Conf. Robot. Automat., 2016, pp. 608–615.
- M. Wüthrich, P. Pastor, M. Kalakrishnan, J. Bohg, and S. Schaal, “Probabilistic object tracking using a range camera,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2013, pp. 3195–3202.
- H. Yang, J. Shi, and L. Carlone, “Teaser: Fast and certifiable point cloud registration,” IEEE Trans. Robot., vol. 37, no. 2, pp. 314–333, 2021.
- B. Wen and K. Bekris, “Bundletrack: 6d pose tracking for novel objects without instance or category-level 3d models,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2021, pp. 8067–8074.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017.