Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? (2403.02818v1)
Abstract: Current state-of-the-art (SOTA) 3D object detection methods often require a large amount of 3D bounding box annotations for training. However, collecting such large-scale densely-supervised datasets is notoriously costly. To reduce the cumbersome data annotation process, we propose a novel sparsely-annotated framework in which only one 3D object per scene is annotated. Such a sparse annotation strategy significantly reduces the heavy annotation burden, but the resulting inexact and incomplete supervision may severely degrade detection performance. To address this issue, we develop the SS3D++ method, which alternately improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using the sparse annotations as seeds, we progressively generate confident fully-annotated scenes by designing a missing-annotated instance mining module and a reliable background mining module. Our method produces competitive results compared with SOTA weakly-supervised methods that use the same or even higher annotation cost. Moreover, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. Additional unlabeled training scenes can further boost performance. The code will be available at https://github.com/gaocq/SS3D2.
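The abstract describes an alternating scheme: train the detector on the current annotations, then use the improved detector to mine confident missing instances and reliable background, and repeat. The sketch below illustrates that loop structure only; it is a minimal sketch based on the abstract, not the authors' released code, and all names (`Detector`, `mine_missing_instances`, `mine_reliable_background`, the confidence threshold `thr`) are hypothetical placeholders.

```python
# Minimal sketch of the alternating training / scene-generation loop described
# in the abstract. All classes, functions, and thresholds are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scene:
    points: list                                        # raw LiDAR points of one scene
    seed_boxes: list                                     # the single annotated 3D box (sparse seed)
    mined_boxes: list = field(default_factory=list)      # confident mined pseudo-annotations
    reliable_bg: list = field(default_factory=list)      # regions mined as reliable background


class Detector:
    """Stand-in for any off-the-shelf 3D detector trained on point clouds."""

    def train_on(self, scenes: List[Scene]) -> None:
        # Train only on seed + mined boxes, restricting negatives to reliable
        # background so unlabeled objects are not penalized as background.
        pass

    def predict(self, scene: Scene) -> List[Dict]:
        # Return candidate boxes with confidence scores.
        return []


def mine_missing_instances(preds: List[Dict], thr: float) -> List[Dict]:
    """Keep only high-confidence predictions as new pseudo-annotations."""
    return [p for p in preds if p.get("score", 0.0) >= thr]


def mine_reliable_background(scene: Scene, preds: List[Dict]) -> list:
    """Mark regions far from any seed or predicted box as reliable background."""
    return []  # placeholder for geometric reasoning over the point cloud


def alternate_training(scenes: List[Scene], rounds: int = 5, thr: float = 0.9) -> Detector:
    detector = Detector()
    for _ in range(rounds):
        # Step 1: improve the detector with the current (partially) annotated scenes.
        detector.train_on(scenes)
        # Step 2: use the improved detector to grow confident fully-annotated scenes.
        for scene in scenes:
            preds = detector.predict(scene)
            scene.mined_boxes = mine_missing_instances(preds, thr)
            scene.reliable_bg = mine_reliable_background(scene, preds)
    return detector
```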
Authors: Chenqiang Gao, Chuandong Liu, Jun Shu, Fangcen Liu, Jiang Liu, Luyu Yang, Xinbo Gao, Deyu Meng