RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception (2403.10145v2)

Published 15 Mar 2024 in cs.CV and cs.RO

Abstract: The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and blind spots. Orienting high-quality roadside perception, we need Roadside Cooperative Perception (RCooper) to achieve practical area-coverage roadside perception for restricted traffic areas. RCooper has its own domain-specific challenges, but further exploration is hindered due to the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to bloom the research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, including two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks prove the effectiveness of roadside cooperative perception and demonstrate the direction of further research. Codes and dataset can be accessed at: https://github.com/AIR-THU/DAIR-RCooper.


Summary

  • The paper introduces a novel dataset with 50,000 images and 30,000 point clouds, providing detailed annotations for cooperative perception tasks.
  • It employs multi-sensor configurations of cameras and LiDARs to overcome the limitations of single-infrastructure systems in handling occlusions and blind spots.
  • Benchmark evaluations demonstrate that cooperative perception enhances detection and tracking performance in corridor scenes, while intersection scenarios indicate areas for further research.

Overview of "RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception"

The paper "RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception" introduces a critical advancement in datasets geared towards roadside cooperative perception (RCooper) systems. Roadside perception is becoming foundational to autonomous driving (AD) and traffic management. The authors highlight the limitations of current single-infrastructure sensor systems, namely their limited sensing range and blind spots, and advocate for RCooper to enable comprehensive and effective area coverage, particularly in confined traffic areas.

The release of the RCooper dataset marks a significant contribution towards improving practical roadside cooperative perception. Comprising 50,000 images and 30,000 point clouds, this extensive dataset captures two primary traffic scenarios: intersections and corridors. The benchmarks set forth provide a solid foundation for further research into this domain, focusing on detection and tracking tasks. The dataset was meticulously curated with manual annotations, providing high-quality data for the exploration and development of real-world applications. The dataset includes ten semantic classes with 3D bounding boxes and trajectory annotations, offering a structure ripe for advanced AI research in perception tasks.
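To make the annotation structure concrete, the sketch below models a 3D bounding box with a semantic class and a track ID, so that a trajectory is simply the boxes sharing an ID across frames. The field names (`Box3D`, `Frame`, `track_id`, etc.) are illustrative assumptions, not the actual RCooper schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Box3D:
    """A 3D bounding box annotation: center, extents, heading, class, and track ID.
    Field names are hypothetical, not the RCooper file format."""
    x: float
    y: float
    z: float            # box center (m)
    length: float
    width: float
    height: float       # box extents (m)
    yaw: float          # heading angle (rad)
    category: str       # one of the dataset's semantic classes
    track_id: int       # links boxes into a trajectory across frames

@dataclass
class Frame:
    """One annotated sample captured by a single roadside unit."""
    timestamp: float
    agent_id: int       # which roadside infrastructure captured it
    boxes: List[Box3D] = field(default_factory=list)

def trajectory(frames: List[Frame], track_id: int) -> List[Box3D]:
    """A trajectory is the time-ordered sequence of boxes with a given track ID."""
    return [b for f in frames for b in f.boxes if b.track_id == track_id]
```

Under this view, detection benchmarks consume per-frame boxes, while tracking benchmarks additionally evaluate the recovered trajectories.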

Technical Contributions

The paper details three major challenges associated with the RCooper framework: data heterogeneity, cooperative representation enhancement, and the need for improved perception performance. The cooperative perception task involves integrating data from various roadside infrastructures to enhance the overall perception accuracy, taking full advantage of the extended and overlapping sensory ranges made possible by multiple cooperating sensors.
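A prerequisite for any such integration is expressing every agent's detections in a shared coordinate frame using each infrastructure's calibrated pose. The following is a minimal 2D (SE(2)) sketch of that transform, assuming a pose given as (x, y, yaw); the function name `to_world` is illustrative.

```python
import math

def to_world(pose, box_xy, box_yaw):
    """Transform a detection from an agent's local frame into the shared
    world frame, given the agent's calibrated pose (x, y, yaw).
    A 2D sketch; real pipelines use full SE(3) extrinsics."""
    px, py, pyaw = pose
    bx, by = box_xy
    # Rotate the local position by the agent's heading, then translate.
    wx = px + math.cos(pyaw) * bx - math.sin(pyaw) * by
    wy = py + math.sin(pyaw) * bx + math.cos(pyaw) * by
    # Headings compose additively in 2D.
    return (wx, wy), (box_yaw + pyaw) % (2 * math.pi)
```

Once all detections share a frame, early, intermediate, or late fusion can operate on the pooled raw data, features, or outputs respectively.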

A substantial contribution of RCooper lies in its sensor installations, which combine cameras and LiDAR systems tailored to different roadside settings: corridor scenes use a pair of cameras with a group of multiline LiDARs to cover extended areas, while intersection scenes use a hybrid setup to handle complexities such as occlusion.

Benchmark Evaluations and Results

The authors conducted comprehensive benchmark experiments using existing state-of-the-art (SOTA) methods for perception tasks. For corridor scenes, results showed that cooperative methods surpass single-infrastructure approaches, with early and intermediate feature fusion methods demonstrating favorable outcomes. Intersection scenes, however, posed additional challenges due to the data heterogeneity arising from the various LiDAR types used. In these scenes, even advanced fusion methods struggled, and simpler strategies such as late fusion often yielded better results, highlighting the need for method specialization to improve performance under heterogeneous data conditions.
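The late-fusion baseline referenced here can be sketched as pooling per-agent detections (already in a shared frame) and suppressing duplicates of the same physical object. This toy version uses a center-distance threshold rather than 3D IoU, and the function name `late_fuse` is an assumption, not the benchmark's API.

```python
def late_fuse(detections, dist_thresh=2.0):
    """Late fusion sketch: pool detections from all agents and keep the
    highest-scoring one among detections closer than dist_thresh (m).
    Each detection is a tuple (x, y, score) in the shared frame."""
    pooled = sorted((d for agent in detections for d in agent),
                    key=lambda d: d[2], reverse=True)
    kept = []
    for x, y, s in pooled:
        # Keep only if no higher-scoring kept detection is nearby.
        if all((x - kx) ** 2 + (y - ky) ** 2 > dist_thresh ** 2
               for kx, ky, _ in kept):
            kept.append((x, y, s))
    return kept
```

Because it operates purely on outputs, late fusion is insensitive to differences in the agents' sensors, which is one plausible reason it held up better under the heterogeneous intersection LiDARs.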

Additionally, the authors implemented tracking-by-detection benchmarks, revealing that cooperative perception can improve tracking accuracy, though the gain depends on robust initial detection. The challenges observed, particularly within intersection environments, underscore areas for continued research, such as cooperative representations that handle data variance robustly.
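Tracking-by-detection, the paradigm used in these benchmarks, associates each frame's detections with existing tracks and spawns new tracks for the unmatched remainder. The greedy nearest-neighbour sketch below illustrates the idea under simplified assumptions (2D positions, a fixed distance gate); published baselines typically use 3D IoU and Hungarian matching instead.

```python
def associate(tracks, detections, max_dist=3.0):
    """Greedy association sketch for tracking-by-detection.
    tracks: {track_id: last known (x, y)}; detections: list of (x, y).
    Returns (matches {track_id: detection index}, indices starting new tracks)."""
    matches, used = {}, set()
    for tid, (tx, ty) in tracks.items():
        best, best_d = None, max_dist
        for i, (dx, dy) in enumerate(detections):
            d = ((tx - dx) ** 2 + (ty - dy) ** 2) ** 0.5
            if i not in used and d < best_d:
                best, best_d = i, d
        if best is not None:
            matches[tid] = best
            used.add(best)
    # Detections unmatched by any track seed new tracks.
    new = [i for i in range(len(detections)) if i not in used]
    return matches, new
```

The sketch makes the dependency on detection quality visible: a missed or badly localized detection immediately breaks a track, which is why tracking gains track closely with detection gains.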

Implications and Future Directions

The introduction of the RCooper dataset provides a fertile ground for AI research to expand the horizons of roadside perception capabilities. By addressing data heterogeneity and enhancing cooperative perception strategies using this dataset, researchers can improve AD systems and intelligent traffic management. Furthermore, the dataset offers opportunities to delve into novel methodologies for effective perception in real-world scenarios.

Future advancements in RCooper could branch into developing end-to-end cooperative perception solutions that integrate spatial and temporal elements more fluidly. Overcoming practical challenges such as calibration tolerance, sensor fusion integrity, and real-time data processing under variable conditions could further extend the utility and applicability of cooperative roadside systems.

In summary, the RCooper dataset provides an unprecedented opportunity to address the needs of comprehensive perception capabilities necessary for future AD and traffic management systems. Its emergence sets a new standard for real-world large-scale datasets, paving the way for richer, more resilient cooperative perception frameworks.
