Improving Distant 3D Object Detection Using 2D Box Supervision (2403.09230v1)
Abstract: Improving the detection of distant 3D objects is an important yet challenging task. For camera-based 3D perception, the annotation of 3D bounding boxes relies heavily on LiDAR for accurate depth information. As a result, the annotation range is often limited by the sparsity of LiDAR points on distant objects, which hampers the capability of existing detectors in long-range scenarios. We address this challenge by using only 2D box supervision for distant objects, since 2D boxes are easy to annotate. We propose LR3D, a framework that learns to recover the missing depth of distant objects. LR3D adopts an implicit projection head that learns to generate the mapping between 2D boxes and depth using the 3D supervision on close objects. This mapping allows the depth of distant objects to be estimated conditioned on their 2D boxes, making long-range 3D detection with 2D supervision feasible. Experiments show that, without distant 3D annotations, LR3D allows camera-based methods to detect distant objects (over 200m) with accuracy comparable to full 3D supervision. Our framework is general and could benefit a wide range of 3D detection methods.
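The idea of learning a box-to-depth mapping on close objects and reusing it for distant ones can be illustrated with a toy sketch. This is not LR3D's implicit projection head: it assumes a plain pinhole camera and a single object class of fixed physical height, so that depth is inversely proportional to 2D box height. The function names, the focal length, the object height, and all sample values are hypothetical and chosen only for illustration.

```python
def fit_inverse_height_model(box_heights_px, depths_m):
    """Least-squares fit of depth = k / h on close objects that have 3D labels.

    Under the assumed pinhole model, k stands in for
    focal_length_px * object_height_m.
    """
    # Minimise sum((d_i - k * x_i)^2) with x_i = 1 / h_i,
    # which gives k = sum(d_i * x_i) / sum(x_i^2).
    inv_heights = [1.0 / h for h in box_heights_px]
    numerator = sum(d * x for d, x in zip(depths_m, inv_heights))
    denominator = sum(x * x for x in inv_heights)
    return numerator / denominator


def predict_depth(k, box_height_px):
    """Estimate depth for an object that only has a 2D box."""
    return k / box_height_px


# Hypothetical camera and object (assumptions, not values from the paper).
focal_px = 1000.0
car_height_m = 1.5

# "Close" objects: depth is known from 3D annotation, box height from the image.
close_depths_m = [10.0, 20.0, 40.0]
close_heights_px = [focal_px * car_height_m / d for d in close_depths_m]

k = fit_inverse_height_model(close_heights_px, close_depths_m)

# "Distant" object: only a 2D box (6 px tall) is available.
far_depth_m = predict_depth(k, 6.0)
print(f"fitted k = {k:.1f}, estimated distant depth = {far_depth_m:.1f} m")
```

With these synthetic values the fit recovers k close to 1500, so a 6-pixel-tall box maps to a depth of roughly 250 m. A learned head, as described in the abstract, would replace this fixed inverse-height rule with a mapping trained from the 3D supervision on close objects.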
Authors: Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez