CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection (2403.03721v2)
Abstract: Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution. To reduce such domain gaps and thus to make 3DOD models more generalizable, we introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which (i) leverages visual semantic cues from an image modality (i.e., camera images) as an effective semantic bridge to close the domain gap in the cross-modal Bird's Eye View (BEV) representations. Further, (ii) we also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features, which disrupt the discrimination of whether a feature instance comes from a source or an unseen target domain. Overall, our CMDA framework guides the 3DOD model to generate highly informative and domain-adaptive features for novel data distributions. In our extensive experiments with large-scale benchmarks, such as nuScenes, Waymo, and KITTI, those mentioned above provide significant performance gains for UDA tasks, achieving state-of-the-art performance.
- Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1090–1099.
- nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 11621–11631.
- Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 1907–1915.
- Focal Sparse Convolutional Networks for 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Voxel r-cnn: Towards high performance voxel-based 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 1201–1209.
- Domain-adversarial training of neural networks. The journal of machine learning research, 17(1): 2096–2030.
- Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
- Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 17556–17566.
- Joint 3d proposal generation and object detection from view aggregation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1–8. IEEE.
- Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 12697–12705.
- Bevfusion: A simple and robust lidar-camera fusion framework. Advances in Neural Information Processing Systems, 35: 10421–10434.
- Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2774–2781. IEEE.
- Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, 194–210. Springer.
- 3D object detection for autonomous driving: A survey. Pattern Recognition, 130: 108796.
- Ora3d: Overlap region aware multi-view 3d object detection. arXiv preprint arXiv:2207.00865.
- Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10529–10538.
- Pointrcnn: 3d object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 770–779.
- Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2446–2454.
- Visualizing High-Dimensional Data Using t-629 SNE. Journal of Machine Learning Research, 9(2579-2605): 630.
- Pointpainting: Sequential fusion for 3d object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 4604–4612.
- Pointaugmenting: Cross-modal augmentation for 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11794–11803.
- Train in germany, test in the usa: Making 3d object detectors generalize. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11713–11723.
- Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. In Conference on Robot Learning, 180–191. PMLR.
- Lidar distillation: Bridging the beam-induced domain gap for 3d object detection. arXiv preprint arXiv:2203.14956.
- PolarMix: A General Data Augmentation Technique for LiDAR Point Clouds. arXiv preprint arXiv:2208.00223.
- Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 10687–10698.
- Spg: Unsupervised domain adaptation for 3d object detection via semantic point generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 15446–15456.
- Second: Sparsely embedded convolutional detection. Sensors, 18(10): 3337.
- Hdnet: Exploiting hd maps for 3d object detection. In Conference on Robot Learning, 146–155. PMLR.
- St3d: Self-training for unsupervised domain adaptation on 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10368–10378.
- ST3D++: Denoised Self-Training for Unsupervised Domain Adaptation on 3D Object Detection. IEEE transactions on pattern analysis and machine intelligence, 45(5): 6354–6371.
- 3dssd: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 11040–11048.
- Std: Sparse-to-dense 3d object detector for point cloud. In Proceedings of the IEEE/CVF international conference on computer vision, 1951–1960.
- Learning transferable features for point cloud detection via 3d contrastive co-training. Advances in Neural Information Processing Systems, 34: 21493–21504.
- Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 11784–11793.
- CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 908–917.
- Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 4490–4499.
- Single image 3D object detection and pose estimation for grasping. In 2014 IEEE International Conference on Robotics and Automation (ICRA), 3936–3943. IEEE.
- Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In Proceedings of the European conference on computer vision (ECCV), 289–305.
- Gyusam Chang (4 papers)
- Wonseok Roh (4 papers)
- Sujin Jang (6 papers)
- Dongwook Lee (32 papers)
- Daehyun Ji (4 papers)
- Gyeongrok Oh (7 papers)
- Jinsun Park (11 papers)
- Jinkyu Kim (51 papers)
- Sangpil Kim (35 papers)