SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking (2402.16249v1)

Published 26 Feb 2024 in cs.CV

Abstract: 3D single object tracking (SOT) is an important and challenging task for autonomous driving and mobile robotics. Most existing methods perform tracking between two consecutive frames while ignoring the motion patterns of the target over a series of frames, which causes performance degradation in scenes with sparse points. To break through this limitation, we introduce a sequence-to-sequence tracking paradigm and a tracker named SeqTrack3D that captures target motion across continuous frames. Unlike previous methods, which primarily adopted one of three strategies (matching two consecutive point clouds, predicting relative motion, or utilizing sequential point clouds to address feature degradation), SeqTrack3D combines both historical point clouds and bounding box sequences. This design ensures robust tracking by leveraging location priors from historical boxes, even in scenes with sparse points. Extensive experiments on large-scale datasets show that SeqTrack3D achieves new state-of-the-art performance, improving by 6.00% on NuScenes and 14.13% on the Waymo dataset. The code will be made public at https://github.com/aron-lin/seqtrack3d.
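
To make the sequence-to-sequence idea concrete, the sketch below shows one plausible way a tracker could jointly encode historical point clouds and historical bounding boxes and regress the current-frame box. The class name, tensor layout, and transformer backbone are assumptions for illustration only, not the authors' actual architecture; see the linked repository for the real implementation.

```python
# Hypothetical sketch of a sequence-to-sequence 3D SOT forward pass.
# Shapes and module choices are assumptions, not SeqTrack3D's design.
import torch
import torch.nn as nn


class Seq2SeqTrackerSketch(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        # Per-point embedding: (x, y, z) + frame index -> d_model
        self.point_embed = nn.Linear(3 + 1, d_model)
        # Per-box embedding: (x, y, z, w, l, h, yaw) + frame index -> d_model
        self.box_embed = nn.Linear(7 + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Regress the current-frame box from a pooled sequence feature
        self.box_head = nn.Linear(d_model, 7)

    def forward(self, point_seq, box_seq):
        # point_seq: (B, T, N, 4) historical point clouds with frame index
        # box_seq:   (B, T, 8)    historical boxes with frame index
        B, T, N, _ = point_seq.shape
        pts = self.point_embed(point_seq).reshape(B, T * N, -1)
        boxes = self.box_embed(box_seq)           # (B, T, d_model)
        tokens = torch.cat([pts, boxes], dim=1)   # joint token sequence
        feat = self.encoder(tokens).mean(dim=1)   # pooled sequence feature
        return self.box_head(feat)                # predicted current box


# Toy usage: batch of 2, 4 historical frames, 128 points per frame
model = Seq2SeqTrackerSketch()
points = torch.randn(2, 4, 128, 4)
boxes = torch.randn(2, 4, 8)
print(model(points, boxes).shape)  # torch.Size([2, 7])
```

The point of the sketch is the input interface: because the box sequence is fed alongside the point clouds, the model retains a location prior even when the current frame contains very few target points.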

Authors (4)
  1. Yu Lin (50 papers)
  2. Zhiheng Li (67 papers)
  3. Yubo Cui (14 papers)
  4. Zheng Fang (104 papers)
Citations (1)
