Collaborative Dynamic 3D Scene Graphs for Automated Driving (2309.06635v3)

Published 12 Sep 2023 in cs.RO

Abstract: Maps have played an indispensable role in enabling safe and automated driving. Although there have been many advances on different fronts ranging from SLAM to semantics, building an actionable hierarchical semantic representation of urban dynamic scenes and processing information from multiple agents are still challenging problems. In this work, we present Collaborative URBan Scene Graphs (CURB-SG) that enable higher-order reasoning and efficient querying for many functions of automated driving. CURB-SG leverages panoptic LiDAR data from multiple agents to build large-scale maps using an effective graph-based collaborative SLAM approach that detects inter-agent loop closures. To semantically decompose the obtained 3D map, we build a lane graph from the paths of ego agents and their panoptic observations of other vehicles. Based on the connectivity of the lane graph, we segregate the environment into intersecting and non-intersecting road areas. Subsequently, we construct a multi-layered scene graph that includes lane information, the position of static landmarks and their assignment to certain map sections, other vehicles observed by the ego agents, and the pose graph from SLAM including 3D panoptic point clouds. We extensively evaluate CURB-SG in urban scenarios using a photorealistic simulator. We release our code at http://curb.cs.uni-freiburg.de.


Summary

  • The paper proposes a centralized collaborative SLAM framework that aggregates multi-agent LiDAR data and optimizes global pose graphs.
  • It constructs a hierarchical, semantic-rich 3D scene graph that segments and categorizes both static and dynamic urban elements.
  • Empirical results in CARLA simulations demonstrate reduced localization errors and enhanced mapping precision through multi-agent cooperation.

An Analytical Overview of Collaborative Dynamic 3D Scene Graphs for Automated Driving

The paper presents an advanced methodology for constructing Collaborative Dynamic 3D Scene Graphs (CURB-SG) specifically tailored for automated driving applications. This work situates itself at the intersection of simultaneous localization and mapping (SLAM), high-definition (HD) semantic mapping, and automated driving (AD), addressing the inherent challenges in representing dynamic urban environments. The authors propose a centralized collaborative SLAM approach that leverages multi-agent LiDAR data, enriched with semantic segmentation, to construct and maintain an evolving hierarchical scene graph.

Technical Contributions and Methodological Innovations

  1. Collaborative SLAM Framework: At the core of this research is a centralized SLAM approach that efficiently aggregates observations from multiple agents. Agents independently process LiDAR data to estimate odometry and detect static and dynamic elements, transmitting keyframe packages to a central server. The server, in turn, detects intra- and inter-agent loop closures to optimize a global pose graph. This method enables fast and frequent map updates over large-scale environments by dynamically restructuring the pose graph through edge contraction (see the first sketch after this list).
  2. Semantic Scene Graph Construction: CURB-SG introduces a multi-layered scene graph structure. Using the trajectories of the ego agents and their observations of other vehicles, a lane graph is constructed, which facilitates segmenting the environment into intersecting and non-intersecting road areas. The hierarchical scene graph further decomposes these areas according to the static and dynamic entities they contain, significantly improving spatial and semantic querying (see the second sketch after this list).
  3. Panoptic Data Integration: The approach is predicated on integrating panoptic segmentation data to enrich the semantic representation within the SLAM framework. Allocating distinct voxel resolutions to different semantic classes provides higher granularity where necessary, improving the precision of map localization and the robustness to sensor noise (see the third sketch after this list).
  4. Interconnection of SLAM and Graph-based Representations: The tightly coupled integration of scene graphs with SLAM-derived pose graphs is a distinctive feature of this work. It combines the strengths of metric and semantic mapping with topological abstractions, unifying spatial data for downstream automated driving functions such as perception, planning, and control (see the fourth sketch after this list).
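
To make the first contribution concrete, the following minimal Python sketch shows how a central server might maintain a single pose graph over keyframes streamed by several agents, link them with intra- and inter-agent loop-closure edges, and compact the graph by contracting the endpoints of redundant edges. The class, method names, and use of networkx are illustrative assumptions, not the authors' released implementation, which additionally performs global pose-graph optimization.

```python
# Minimal sketch of a centralized collaborative pose graph (illustrative only).
# Each agent streams keyframes; the server links consecutive keyframes with
# odometry edges, adds loop-closure edges within and across agents, and
# occasionally contracts edges to keep the global graph compact.
import networkx as nx

class CentralPoseGraph:
    def __init__(self):
        self.graph = nx.Graph()

    def add_keyframe(self, agent_id, kf_id, pose):
        node = (agent_id, kf_id)
        self.graph.add_node(node, pose=pose)
        if kf_id > 0:  # odometry edge to the agent's previous keyframe
            self.graph.add_edge((agent_id, kf_id - 1), node, kind="odometry")
        return node

    def add_loop_closure(self, node_a, node_b, relative_pose):
        # Intra- or inter-agent loop closure found by place recognition.
        self.graph.add_edge(node_a, node_b, kind="loop", rel=relative_pose)

    def contract(self, keep, drop):
        # Edge contraction: merge a redundant keyframe into a neighbor so the
        # global graph stays small as the map grows.
        self.graph = nx.contracted_nodes(self.graph, keep, drop, self_loops=False)

# Usage: two agents, one inter-agent loop closure, then compaction.
pg = CentralPoseGraph()
a0 = pg.add_keyframe("agent_a", 0, pose=(0.0, 0.0, 0.0))
a1 = pg.add_keyframe("agent_a", 1, pose=(5.0, 0.0, 0.0))
b0 = pg.add_keyframe("agent_b", 0, pose=(5.2, 0.1, 0.0))
pg.add_loop_closure(a1, b0, relative_pose=(0.2, 0.1, 0.0))
pg.contract(keep=a1, drop=b0)
print(pg.graph.number_of_nodes(), pg.graph.number_of_edges())
```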
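
The lane-graph construction and road segregation of the second contribution can be approximated by snapping observed centerline waypoints into a graph and flagging nodes where more than two lane segments meet as intersection candidates. This degree-based rule and the helper functions below are simplified assumptions, not the paper's actual connectivity analysis.

```python
# Minimal sketch of connectivity-based road segregation (illustrative only):
# lane centerlines derived from ego trajectories and tracked vehicles are
# merged into one graph, and nodes where more than two segments meet are
# treated as intersection seeds.
import networkx as nx

def build_lane_graph(polylines, resolution=1.0):
    """Snap waypoints of each polyline onto a grid and connect successive ones."""
    g = nx.Graph()
    for line in polylines:
        prev = None
        for x, y in line:
            node = (round(x / resolution), round(y / resolution))
            g.add_node(node)
            if prev is not None and prev != node:
                g.add_edge(prev, node)
            prev = node
    return g

def split_intersections(g):
    """Nodes with degree > 2 indicate merging/branching lanes -> intersections."""
    intersection = {n for n in g.nodes if g.degree(n) > 2}
    return intersection, set(g.nodes) - intersection

# Usage: two straight lanes crossing at the origin.
lane_a = [(-10.0, 0.0), (-5.0, 0.0), (0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
lane_b = [(0.0, -10.0), (0.0, -5.0), (0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]
graph = build_lane_graph([lane_a, lane_b], resolution=5.0)
inter, roads = split_intersections(graph)
print(sorted(inter))  # expected: [(0, 0)]
print(len(roads))     # remaining non-intersection waypoints
```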
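
For the panoptic integration of the third contribution, class-dependent voxelization can be sketched with plain NumPy as follows. The per-class voxel sizes and class names are assumptions chosen for illustration rather than the parameters used in the paper.

```python
# Minimal sketch of class-dependent voxel downsampling (illustrative only).
# Points carrying panoptic labels are voxelized more finely for thin classes
# (e.g. poles, traffic signs) and more coarsely for large surfaces (e.g. road).
# The voxel sizes below are assumptions, not the values used in the paper.
import numpy as np

VOXEL_SIZE = {"pole": 0.05, "traffic_sign": 0.05, "building": 0.2, "road": 0.5}

def voxel_downsample(points, voxel_size):
    """Keep one point per occupied voxel (first point encountered)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

def downsample_panoptic(points, labels):
    """Apply a per-class voxel size to a labeled point cloud."""
    out = []
    for cls, size in VOXEL_SIZE.items():
        mask = labels == cls
        if np.any(mask):
            out.append(voxel_downsample(points[mask], size))
    return np.vstack(out) if out else np.empty((0, 3))

# Usage: a toy labeled cloud with two classes.
pts = np.random.default_rng(0).uniform(0.0, 2.0, size=(1000, 3))
lbl = np.array(["road"] * 500 + ["pole"] * 500)
print(downsample_panoptic(pts, lbl).shape)
```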
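
Finally, the layered organization highlighted in the fourth contribution (lane graph, static landmarks assigned to map sections, observed vehicles, and the SLAM pose graph with panoptic point clouds) can be pictured as a container with one field per layer that supports hierarchical queries. The dataclasses and field names below are hypothetical, not the released data structures.

```python
# Minimal sketch of a multi-layered urban scene graph (illustrative only).
# Layers follow the abstract's description; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Landmark:
    label: str          # e.g. "traffic_sign", "pole"
    position: tuple     # (x, y, z) in the map frame
    section_id: int     # road partition the landmark is assigned to

@dataclass
class VehicleObservation:
    track_id: int
    pose: tuple
    observed_by: str    # ego agent that observed this vehicle

@dataclass
class SceneGraph:
    lane_graph: dict = field(default_factory=dict)   # section_id -> waypoints
    landmarks: list = field(default_factory=list)    # Landmark entries
    vehicles: list = field(default_factory=list)     # VehicleObservation entries
    pose_graph: dict = field(default_factory=dict)   # keyframe -> (pose, point cloud)

    def landmarks_in_section(self, section_id):
        # Hierarchical query: everything assigned to one road partition.
        return [lm for lm in self.landmarks if lm.section_id == section_id]

# Usage: one landmark and one observed vehicle attached to section 3.
sg = SceneGraph()
sg.landmarks.append(Landmark("traffic_sign", (12.0, 4.5, 2.1), section_id=3))
sg.vehicles.append(VehicleObservation(track_id=7, pose=(15.0, 5.0, 0.0), observed_by="agent_a"))
print(sg.landmarks_in_section(3))
```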

Quantitative Insights and Empirical Analysis

The experimental evaluation in the CARLA simulator demonstrates that CURB-SG accurately constructs scene graphs across multiple urban scenarios. Localization and mapping errors decrease as more agents participate, and collaboration also improves exploration efficiency, supporting the system's efficacy in cooperative settings.

The constructed lane graphs, evaluated via metrics such as TOPO, GEO, and APLS, show improved recall and graph IoU when observations of other vehicles are incorporated, affirming the utility of integrating observed dynamic entities into the mapping process. The partitioning of the environment into meaningful urban structures, verified against ground-truth data, achieves high precision and recall, underscoring the viability of the proposed method for real-world applications.

Implications and Future Directions

The CURB-SG framework offers theoretical and practical implications for the field of automated driving. By leveraging collaborative and hierarchical representations, it bridges the gap between rich environmental semantics and real-time mapping demands. The framework's potential to integrate diverse data sources supports future scalability and adaptability to complex urban landscapes.

Looking forward, further developments could address decentralized implementations to improve real-time performance under resource constraints. Extending the framework to include pedestrian dynamics and additional topological features like road boundaries could provide a more comprehensive urban schema. The transition from simulation to real-world deployment remains a compelling challenge, necessitating robust handling of variable data fidelity and real-world uncertainties.

In summary, the CURB-SG approach presents a significant advancement in managing semantic-rich urban environments for automated driving. Its innovative blend of collaborative SLAM and semantic scene graph construction offers a robust platform for realizing high-level autonomous vehicle functionalities.
