
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection (2312.15742v1)

Published 25 Dec 2023 in cs.CV and cs.AI

Abstract: Vehicle-to-Everything (V2X) collaborative perception has recently gained significant attention due to its capability to enhance scene understanding by integrating information from various agents, e.g., vehicles and infrastructure. However, current works often treat the information from each agent equally, ignoring the inherent domain gap caused by the use of different LiDAR sensors by each agent, thus leading to suboptimal performance. In this paper, we propose DI-V2X, which aims to learn Domain-Invariant representations through a new distillation framework to mitigate the domain discrepancy in the context of V2X 3D object detection. DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module. Specifically, DMA builds a domain-mixing 3D instance bank for the teacher and student models during training, resulting in aligned data representation. Next, PDD encourages the student models from different domains to gradually learn a domain-invariant feature representation towards the teacher, where the overlapping regions between agents are employed as guidance to facilitate the distillation process. Furthermore, DAF closes the domain gap between the students by incorporating calibration-aware domain-adaptive attention. Extensive experiments on the challenging DAIR-V2X and V2XSet benchmark datasets demonstrate that DI-V2X achieves remarkable performance, outperforming all previous V2X models. Code is available at https://github.com/Serenos/DI-V2X
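The abstract says that PDD uses the overlapping regions between agents as guidance for distillation. One way to picture that idea is a feature-matching loss evaluated only on overlapping cells. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name `overlap_masked_distill_loss`, the bird's-eye-view feature layout `(C, H, W)`, the boolean overlap mask, and the choice of mean squared error are all hypothetical. The released code at the URL above is the authoritative reference.

```python
import numpy as np


def overlap_masked_distill_loss(student_feat, teacher_feat, overlap_mask):
    """Mean squared error between student and teacher features,
    averaged only over grid cells marked as overlapping.

    Assumed (hypothetical) shapes:
      student_feat, teacher_feat: (C, H, W) float arrays
      overlap_mask:               (H, W) boolean array
    """
    mask = overlap_mask.astype(student_feat.dtype)
    # Squared error per channel, then averaged over channels -> (H, W).
    per_cell = ((student_feat - teacher_feat) ** 2).mean(axis=0)
    num_overlap = mask.sum()
    if num_overlap == 0:
        # No overlapping cells: nothing to distill on in this sample.
        return 0.0
    return float((per_cell * mask).sum() / num_overlap)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 8, 8))
    student = teacher + 0.1 * rng.normal(size=(4, 8, 8))
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:6, 2:6] = True  # pretend the agents overlap in the centre
    print(overlap_masked_distill_loss(student, teacher, mask))
```

Restricting the loss to the mask means that cells seen by only one agent contribute nothing, so in this sketch the student is pulled towards the teacher only where both views cover the scene.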

