
Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection (2403.15317v2)

Published 22 Mar 2024 in cs.CV and cs.AI

Abstract: Training high-accuracy 3D detectors requires massive numbers of labeled 3D annotations with 7 degrees of freedom, which are laborious and time-consuming to produce. Point annotations have therefore been proposed as a practical alternative for 3D detection: they are more accessible and less expensive, yet still provide strong spatial information for object localization. In this paper, we empirically find that simply adapting Point-DETR to its 3D form is non-trivial and encounters two main bottlenecks: 1) it fails to encode a strong 3D prior into the model, and 2) it generates low-quality pseudo labels in distant regions due to the extreme sparsity of LiDAR points. To overcome these challenges, we introduce Point-DETR3D, a teacher-student framework for weakly semi-supervised 3D detection, designed to fully capitalize on point-wise supervision within a constrained instance-wise annotation budget. Unlike Point-DETR, which encodes 3D positional information solely through a point encoder, we propose an explicit positional query initialization strategy to enhance the positional prior. Considering the low quality of the pseudo labels that the teacher model produces in distant regions, we enhance the detector's perception by incorporating dense imagery data through a novel Cross-Modal Deformable RoI Fusion (D-RoI). Moreover, an innovative point-guided self-supervised learning technique is proposed to fully exploit point priors, even in student models. Extensive experiments on the representative nuScenes dataset demonstrate that Point-DETR3D obtains significant improvements over previous works. Notably, with only 5% of labeled data, Point-DETR3D achieves over 90% of the performance of its fully supervised counterpart.
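
To make the idea of an explicit positional query initialization more concrete, the following is a minimal sketch of one way annotated 3D points could be turned into fixed-length query vectors. The abstract does not specify the encoding, so the sinusoidal scheme, the function names `sinusoidal_embedding` and `init_queries`, and the parameters `num_freqs` and `temperature` are all assumptions for illustration and should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence


def sinusoidal_embedding(
    point: Sequence[float],
    num_freqs: int = 4,          # hypothetical: frequencies per coordinate
    temperature: float = 10000.0,  # hypothetical: base of the frequency schedule
) -> List[float]:
    """Encode one annotated point (x, y, z) as a sinusoidal vector.

    Assumption: a standard Transformer-style sin/cos encoding applied
    independently to each coordinate. The paper may use a different scheme.
    """
    embedding: List[float] = []
    for coord in point:
        for k in range(num_freqs):
            # Frequencies decay geometrically from 1 down towards 1/temperature.
            freq = temperature ** (-k / num_freqs)
            embedding.append(math.sin(coord * freq))
            embedding.append(math.cos(coord * freq))
    return embedding


def init_queries(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build one positional query per point annotation."""
    return [sinusoidal_embedding(p) for p in points]


# Two made-up point annotations in metres (illustrative values only).
points = [(12.5, -3.0, 0.8), (48.0, 7.5, 1.1)]
queries = init_queries(points)
print(len(queries), len(queries[0]))  # 2 24
```

Each point yields a vector of length 3 × `num_freqs` × 2, so every query carries its annotated location directly rather than relying on a learned encoder alone; in a real detector these vectors would typically be projected to the model's hidden size before entering the decoder.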

