Pragmatic Communication in Multi-Agent Collaborative Perception (2401.12694v1)

Published 23 Jan 2024 in cs.CV

Abstract: Collaborative perception allows each agent to enhance its perceptual abilities by exchanging messages with others, which inherently creates a trade-off between perception ability and communication cost. Previous works transmit complete, full-frame, high-dimensional feature maps among agents, incurring substantial communication costs. To improve communication efficiency, we propose transmitting only the information needed for the collaborator's downstream task. This pragmatic communication strategy focuses on three key aspects: i) pragmatic message selection, which selects the task-critical parts of the complete data, yielding spatially and temporally sparse feature vectors; ii) pragmatic message representation, which approximates high-dimensional feature vectors with a task-adaptive dictionary, enabling communication via integer indices; and iii) pragmatic collaborator selection, which identifies beneficial collaborators and prunes unnecessary communication links. Following this strategy, we first formulate a mathematical optimization framework for the perception-communication trade-off and then propose PragComm, a multi-agent collaborative perception system with two key components: i) single-agent detection and tracking and ii) pragmatic collaboration. PragComm promotes pragmatic communication and adapts to a wide range of communication conditions. We evaluate PragComm on collaborative 3D object detection and tracking, using both a real-world dataset, V2V4Real, and two simulation datasets, OPV2V and V2X-SIM2.0. PragComm consistently outperforms previous methods while using more than 32.7K times less communication volume on OPV2V. Code is available at github.com/PhyllisH/PragComm.
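The "pragmatic message representation" idea in the abstract, replacing high-dimensional feature vectors with indices into a shared dictionary so that agents exchange small integers instead of raw floats, can be illustrated with a minimal nearest-neighbor codebook sketch. This is an assumption-laden illustration, not the paper's implementation: the codebook size, feature dimension, and the `encode`/`decode` helpers are all hypothetical, and PragComm learns its dictionary end-to-end rather than sampling it at random.

```python
import numpy as np

# Illustrative sketch of dictionary-based message representation:
# each agent approximates a feature vector by its nearest entry in a
# codebook shared by all agents, then transmits only the integer index.
# All names and sizes below are hypothetical, chosen for illustration.

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 64))  # 256 shared entries, 64-dim features


def encode(features: np.ndarray) -> np.ndarray:
    """Map each row of `features` (N, 64) to the index of its nearest codeword."""
    # Squared Euclidean distance from every feature to every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # (N,) integer indices; 1 byte each for <=256 entries


def decode(indices: np.ndarray) -> np.ndarray:
    """Reconstruct approximate feature vectors from the transmitted indices."""
    return codebook[indices]


features = rng.standard_normal((10, 64))
idx = encode(features)    # what would actually cross the channel
approx = decode(idx)      # receiver-side approximate reconstruction
# Ten 1-byte indices replace ten 64-float32 vectors: a 256x byte reduction
# for this toy configuration, before any sparsification of the feature map.
```

Combining this with the other two components described in the abstract, sparse selection (send indices only for task-critical spatial locations) and collaborator pruning (skip agents whose messages would not help), is what drives the overall communication-volume reduction reported on OPV2V.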

