Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing (2310.11346v3)
Abstract: Detecting objects in 3D space using multiple cameras, known as Multi-Camera 3D Object Detection (MC3D-Det), has gained prominence with the advent of bird's-eye view (BEV) approaches. However, these methods often struggle in unfamiliar testing environments because their training data lacks diversity in viewpoints and environments. To address this, we propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections. Our framework, anchored in perspective debiasing, facilitates the learning of features that are resilient to domain shifts. In our approach, we render diverse view maps from BEV features and rectify the perspective bias of these maps, leveraging implicit foreground volumes to bridge the camera and BEV planes. This two-step process promotes the learning of perspective- and context-independent features, which are crucial for accurate object detection across varying viewpoints, camera parameters, and environmental conditions. Notably, our model-agnostic approach preserves the original network structure and incurs no additional inference cost, facilitating seamless integration across various models and simplifying deployment. Furthermore, we show that our approach achieves satisfactory results on real data when trained only on virtual datasets, eliminating the need for real scene annotations. Experimental results on both Domain Generalization (DG) and Unsupervised Domain Adaptation (UDA) clearly demonstrate its effectiveness. The code is available at https://github.com/EnVision-Research/Generalizable-BEV.
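To make the idea of bridging the BEV and camera planes more concrete, the sketch below projects a per-voxel foreground score onto a camera image plane with a standard pinhole model, then repeats the projection for a shifted camera to obtain a second view. It is only an illustration under stated assumptions and is not the rendering procedure of the paper: the function name `render_view_map`, its arguments, the max-splatting rule, and the toy camera values are all hypothetical.

```python
import numpy as np

def render_view_map(volume, voxel_centers, K, cam_from_ego, hw):
    """Splat per-voxel foreground scores onto a camera plane (hypothetical helper).

    volume:        (N,) foreground scores in [0, 1], one per voxel
    voxel_centers: (N, 3) voxel centers in the ego frame, in metres
    K:             (3, 3) pinhole intrinsics
    cam_from_ego:  (4, 4) rigid transform from ego frame to camera frame
    hw:            (height, width) of the output map in pixels
    """
    h, w = hw
    # Homogeneous ego-frame points -> camera frame.
    ones = np.ones((len(voxel_centers), 1))
    pts = np.concatenate([voxel_centers, ones], axis=1)
    cam = (cam_from_ego @ pts.T).T[:, :3]
    in_front = cam[:, 2] > 1e-3  # discard voxels behind the camera
    # Pinhole projection: divide by depth after applying the intrinsics.
    uvw = (K @ cam[in_front].T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    view = np.zeros((h, w), dtype=float)
    # Keep the strongest foreground score that lands on each pixel.
    np.maximum.at(view, (v[inside], u[inside]), volume[in_front][inside])
    return view

# Toy setup (assumed values): the ego frame coincides with the camera frame.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
centers = np.array([[0.0, 0.0, 10.0],   # straight ahead, 10 m away
                    [1.0, 0.0, 10.0],   # 1 m to the right
                    [0.0, 0.0, -5.0]])  # behind the camera, never rendered
scores = np.array([0.9, 0.4, 1.0])

view = render_view_map(scores, centers, K, np.eye(4), (48, 64))

# A second viewpoint: the camera is moved 1 m to the right.
shifted = np.eye(4)
shifted[0, 3] = -1.0
shifted_view = render_view_map(scores, centers, K, shifted, (48, 64))
```

The same volume yields different 2D maps depending on the camera pose and intrinsics, which is the kind of viewpoint dependence the abstract refers to when it speaks of rendering diverse view maps and rectifying their perspective bias.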
Authors: Hao Lu, Yunpeng Zhang, Qing Lian, Dalong Du, Yingcong Chen