What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception (2403.10068v1)

Published 15 Mar 2024 in cs.CV and cs.MA

Abstract: Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources. This paper investigates intermediate collaboration for MAP with a specific focus on exploring "good" properties of collaborative view (i.e., post-collaboration feature) and its underlying relationship to individual views (i.e., pre-collaboration features), which were treated as an opaque procedure by most existing works. We propose a novel framework named CMiMC (Contrastive Mutual Information Maximization for Collaborative Perception) for intermediate collaboration. The core philosophy of CMiMC is to preserve discriminative information of individual views in the collaborative view by maximizing mutual information between pre- and post-collaboration features while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks. In particular, we define multi-view mutual information (MVMI) for intermediate collaboration that evaluates correlations between collaborative views and individual views on both global and local scales. We establish CMiMNet based on multi-view contrastive learning to realize estimation and maximization of MVMI, which assists the training of a collaboration encoder for voxel-level feature fusion. We evaluate CMiMC on V2X-Sim 1.0, and it improves the SOTA average precision by 3.08% and 4.44% at 0.5 and 0.7 IoU (Intersection-over-Union) thresholds, respectively. In addition, CMiMC can reduce communication volume to 1/32 while achieving performance comparable to SOTA. Code and Appendix are released at https://github.com/77SWF/CMiMC.
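The core idea of the abstract is to maximize mutual information between pre- and post-collaboration features via multi-view contrastive learning. The sketch below is a minimal NumPy illustration of an InfoNCE-style contrastive bound of this kind, where matched (pre, post) feature pairs are positives and all other pairs in the batch are negatives; the function name, shapes, and temperature are illustrative assumptions, not the paper's actual CMiMNet or MVMI estimator.

```python
import numpy as np

def info_nce_loss(post, pre, temperature=0.1):
    """InfoNCE-style contrastive loss between post-collaboration features
    `post` and pre-collaboration features `pre`, both of shape (B, D).
    Row i of `post` and row i of `pre` form the positive pair; all other
    rows in the batch act as negatives. (Illustrative sketch only.)"""
    # L2-normalize so similarities are cosine similarities
    post = post / np.linalg.norm(post, axis=1, keepdims=True)
    pre = pre / np.linalg.norm(pre, axis=1, keepdims=True)

    # (B, B) similarity matrix; positives lie on the diagonal
    logits = post @ pre.T / temperature

    # log-softmax over each row, computed stably
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # negative mean log-probability of the positive pairs
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pushes each collaborative feature toward its own individual view and away from the views of other samples, which is how contrastive objectives of this family lower-bound and maximize mutual information.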

Authors (7)
  1. Wanfang Su
  2. Lixing Chen
  3. Yang Bai
  4. Xi Lin
  5. Gaolei Li
  6. Zhe Qu
  7. Pan Zhou
Citations (3)
