DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection (2312.15742v1)
Abstract: Vehicle-to-Everything (V2X) collaborative perception has recently gained significant attention due to its capability to enhance scene understanding by integrating information from various agents, e.g., vehicles and infrastructure. However, current works often treat the information from each agent equally, ignoring the inherent domain gap caused by the different LiDAR sensors each agent uses, which leads to suboptimal performance. In this paper, we propose DI-V2X, which aims to learn Domain-Invariant representations through a new distillation framework to mitigate the domain discrepancy in the context of V2X 3D object detection. DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module. Specifically, DMA builds a domain-mixing 3D instance bank for the teacher and student models during training, resulting in aligned data representations. Next, PDD encourages the student models from different domains to gradually learn a domain-invariant feature representation towards the teacher, where the overlapping regions between agents are employed as guidance to facilitate the distillation process. Furthermore, DAF closes the domain gap between the students by incorporating calibration-aware domain-adaptive attention. Extensive experiments on the challenging DAIR-V2X and V2XSet benchmark datasets demonstrate that DI-V2X achieves remarkable performance, outperforming all previous V2X models. Code is available at https://github.com/Serenos/DI-V2X
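To make the DMA and PDD ideas more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a GT-sampling-style augmentation in which ground-truth object point clouds cropped from both the vehicle-side and infrastructure-side LiDAR scans are pooled into one shared instance bank (so both domains are augmented with the same mixed instances), and a distillation loss on BEV features masked by the agents' overlapping region. All function names, array shapes, and the exact loss form are illustrative assumptions.

```python
import numpy as np


def build_instance_bank(vehicle_instances, infra_instances):
    """Pool ground-truth object point clouds from both domains.

    Hypothetical sketch of DMA's domain-mixing 3D instance bank:
    instances cropped from vehicle- and infrastructure-side scans
    are pooled so both agents sample from the same mixed bank.
    Each entry is an (M_i, 4) array of x, y, z, intensity points.
    """
    return list(vehicle_instances) + list(infra_instances)


def domain_mixing_augment(points, bank, num_paste=2, rng=None):
    """Paste randomly drawn bank instances into a LiDAR scan.

    `points` is an (N, 4) array; pasted instances come from either
    domain, which aligns the data distributions seen in training.
    (Real GT-sampling would also check for box collisions.)
    """
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.choice(len(bank), size=min(num_paste, len(bank)), replace=False)
    pasted = [bank[i] for i in idx]
    return np.concatenate([points] + pasted, axis=0)


def overlap_masked_distill(student_feat, teacher_feat, overlap_mask):
    """PDD-style loss sketch: mean squared error between student and
    teacher BEV feature maps, restricted to the overlapping region
    between agents (mask of 0/1 weights, broadcast over channels)."""
    diff = (student_feat - teacher_feat) ** 2
    return (diff * overlap_mask).sum() / max(overlap_mask.sum(), 1.0)
```

In this sketch the teacher sees the fused, augmented data from both domains, so distilling student features toward it only inside the overlap region pushes both students toward a shared, domain-invariant representation where their observations actually coincide.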
- Domain Adaptation in LiDAR Semantic Segmentation by Aligning Class Distributions. arXiv:2010.12239.
- F-Cooper: Feature Based Cooperative Perception for Autonomous Vehicle Edge Computing System Using 3D Point Clouds. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing.
- Cooper: Cooperative Perception for Connected Autonomous Vehicles Based on 3D Point Clouds. In 2019 IEEE 39th International Conference on Distributed Computing Systems.
- PointMixup: Augmentation for Point Clouds. In ECCV.
- Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps. In NeurIPS.
- PointPillars: Fast Encoders for Object Detection From Point Clouds. In CVPR.
- Domain Transfer for Semantic Segmentation of LiDAR Data using Deep Neural Networks. In IROS.
- Regularization Strategy for Point Cloud via Rigidly Mixed Sample. In CVPR.
- Learning Distilled Collaboration Graph for Multi-Agent Perception. In NeurIPS.
- When2com: Multi-Agent Perception via Communication Graph Grouping. In CVPR.
- Who2com: Collaborative Perception via Learnable Handshake Communication. In ICRA.
- Robust Collaborative 3D Object Detection in Presence of Pose Errors. In ICRA.
- Mix3D: Out-of-Context Data Augmentation for 3D Scenes. In 2021 International Conference on 3D Vision (3DV).
- Instant Domain Augmentation for LiDAR Semantic Segmentation. In CVPR.
- A Survey on Deep Domain Adaptation for LiDAR Perception. In 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops).
- V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction. In ECCV.
- Model-Agnostic Multi-Agent Perception Framework. In ICRA.
- Bridging the Domain Gap for Multi-Agent Perception. In ICRA.
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer. In ECCV.
- OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication. In ICRA.
- SECOND: Sparsely Embedded Convolutional Detection. Sensors, 18(10): 3337.
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection. In CVPR.
- V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting. In CVPR.
- mixup: Beyond Empirical Risk Minimization. In ICLR.