FlowTrack: Point-level Flow Network for 3D Single Object Tracking (2407.01959v1)
Abstract: 3D single object tracking (SOT) is a crucial task in the fields of mobile robotics and autonomous driving. Traditional motion-based approaches track the target by estimating its relative movement between two consecutive frames. However, they typically overlook the target's local motion information and fail to exploit information from historical frames effectively. To overcome these limitations, we propose a point-level flow method with multi-frame information for the 3D SOT task, called FlowTrack. Specifically, by estimating the flow for each point in the target, our method captures the target's local motion details, thereby improving tracking performance. At the same time, to handle scenes with sparse points, we present a learnable target feature as a bridge to efficiently integrate target information from past frames. Moreover, we design a novel Instance Flow Head that transforms dense point-level flow into instance-level motion, effectively aggregating local motion information into global target motion. Finally, our method achieves competitive performance, with improvements of 5.9% on the KITTI dataset and 2.9% on NuScenes. The code will be made publicly available soon.
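To make the aggregation step concrete, below is a minimal PyTorch sketch of the idea behind the Instance Flow Head: pooling dense point-level flow vectors into a single instance-level motion. The paper's code is not yet released, so every name here (`InstanceFlowHead`, `score_mlp`, the 4-DoF output of translation plus yaw) is an illustrative assumption, not the authors' implementation; the confidence-weighted pooling is one plausible way to realize the described local-to-global aggregation.

```python
# Hypothetical sketch of an instance flow head: dense per-point flow -> one
# instance-level motion (dx, dy, dz, dyaw). Not the paper's released code.
import torch
import torch.nn as nn


class InstanceFlowHead(nn.Module):
    """Aggregate point-level flow (B, N, 3) into one instance-level motion."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Per-point confidence: how much each point's flow contributes.
        self.score_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 1)
        )
        # Regress the yaw change from pooled features; translation comes
        # from the confidence-weighted mean of the per-point flow vectors.
        self.yaw_mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, point_feats: torch.Tensor, point_flow: torch.Tensor):
        # point_feats: (B, N, C) per-point features; point_flow: (B, N, 3).
        w = torch.softmax(
            self.score_mlp(torch.cat([point_feats, point_flow], dim=-1)),
            dim=1,
        )  # (B, N, 1), normalized over the N points
        translation = (w * point_flow).sum(dim=1)      # (B, 3)
        pooled = (w * point_feats).sum(dim=1)          # (B, C)
        yaw = self.yaw_mlp(pooled)                     # (B, 1)
        return torch.cat([translation, yaw], dim=-1)  # (B, 4): dx, dy, dz, dyaw


# Usage with dummy inputs (a real pipeline would get these from a point-cloud
# backbone and flow estimator operating on consecutive frames):
feats, flow = torch.randn(2, 256, 128), torch.randn(2, 256, 3)
motion = InstanceFlowHead()(feats, flow)  # apply to the previous box to track
```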