Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region (2401.13285v1)

Published 24 Jan 2024 in cs.CV

Abstract: Single object tracking in LiDAR point clouds is an essential part of environmental perception. Small objects are inevitable in real-world scenarios and pose a significant barrier to accurate localization. However, existing methods concentrate on exploring universal architectures for common categories and overlook the challenges posed by small objects, which have long been thorny due to the relative deficiency of foreground points and a low tolerance for disturbances. To this end, we propose a Siamese network-based method for small object tracking in LiDAR point clouds, composed of a target-awareness prototype mining (TAPM) module and a regional grid subdivision (RGS) module. The TAPM module adopts the reconstruction mechanism of a masked decoder to learn a prototype in feature space, highlighting the presence of foreground points and thereby facilitating the subsequent localization of small objects. Although this prototype can accentuate the small object of interest, positioning deviations in the feature maps still lead to high tracking errors. To alleviate this issue, the RGS module recovers fine-grained features of the search region using ViT and pixel-shuffle layers. In addition, beyond the standard settings, we design a scaling experiment to evaluate the robustness of different trackers on small objects. Extensive experiments on KITTI and nuScenes demonstrate that our method effectively improves tracking performance on small targets without degrading performance on normal-sized objects.
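
To make the RGS idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of how a coarse search-region feature map could be refined by a ViT-style transformer encoder and then upsampled with a pixel-shuffle layer. The `RGSBlock` name, shapes, channel counts, and hyperparameters are all illustrative assumptions.

```python
# Hypothetical sketch of an RGS-style block (not the paper's code):
# a transformer encoder layer stands in for the ViT backbone, and
# pixel shuffle recovers a finer-grained search-region feature map.
import torch
import torch.nn as nn

class RGSBlock(nn.Module):
    def __init__(self, dim=128, heads=4, upscale=2):
        super().__init__()
        # Global self-attention over the flattened feature grid.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True)
        # Expand channels so pixel shuffle can trade them for resolution:
        # (B, dim * upscale**2, H, W) -> (B, dim, upscale*H, upscale*W).
        self.expand = nn.Conv2d(dim, dim * upscale ** 2, kernel_size=1)
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, feat):                      # feat: (B, dim, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, dim)
        tokens = self.encoder(tokens)             # attend across the grid
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.shuffle(self.expand(feat))    # (B, dim, 2H, 2W)

# Usage with an assumed 16x16, 128-channel search-region feature map.
coarse = torch.randn(1, 128, 16, 16)
fine = RGSBlock()(coarse)
print(fine.shape)  # torch.Size([1, 128, 32, 32])
```

Pixel shuffle rearranges channels into spatial positions rather than interpolating, which is why it is a natural choice for recovering a fine-grained grid from a coarse feature map, as the paper proposes for small targets.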
