OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments (2403.09412v2)

Published 14 Mar 2024 in cs.CV, cs.AI, and cs.RO

Abstract: Environment representations endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary maps, powered by Vision-Language Models (VLMs), possess inherent advantages, including zero-shot learning and support for open-set classes. However, existing open-vocabulary maps are primarily designed for small-scale environments, such as desktops or rooms, and are typically geared towards limited-area tasks involving robotic indoor navigation or in-place manipulation. They face challenges in generalizing directly to outdoor environments characterized by numerous objects and complex tasks, owing to limitations in both understanding level and map structure. In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments. OpenGraph first extracts instances and their captions from visual images, then encodes the captions to enhance textual reasoning. Next, it achieves 3D incremental object-centric mapping with feature embedding by projecting images onto LiDAR point clouds. Finally, the environment is segmented based on lane-graph connectivity to construct a hierarchical graph. Validation results on the public SemanticKITTI dataset demonstrate that OpenGraph achieves the highest segmentation and query accuracy. The source code of OpenGraph is publicly available at https://github.com/BIT-DYN/OpenGraph.
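
The abstract outlines a three-stage pipeline: extract 2D instances and captions from images and encode the captions, project image-derived instance features onto LiDAR point clouds for incremental object-centric 3D mapping, and segment the scene by lane-graph connectivity into a hierarchical graph. The sketch below illustrates only the middle stage, incremental image-to-LiDAR instance fusion, under simplifying assumptions; all names (Object3D, project_points, fuse_frame) and thresholds are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of incremental object-centric fusion in the spirit of the
# abstract. All class/function names and thresholds are hypothetical.
import numpy as np

class Object3D:
    """One object instance: accumulated LiDAR points + a fused caption embedding."""
    def __init__(self, points, embedding):
        self.points = points            # (N, 3) world-frame points
        self.embedding = embedding      # (D,) unit-norm caption feature
        self.n_obs = 1

    def centroid(self):
        return self.points.mean(axis=0)

    def merge(self, points, embedding):
        self.points = np.vstack([self.points, points])
        # Running average of embeddings, re-normalized (one common fusion choice).
        self.embedding = self.embedding * self.n_obs + embedding
        self.embedding /= np.linalg.norm(self.embedding)
        self.n_obs += 1

def project_points(points_world, T_world_to_cam, K, image_hw):
    """Project LiDAR points into the image; return int pixel coords + validity mask."""
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_world_to_cam @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    h, w = image_hw
    valid = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                     & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv.astype(int), valid

def fuse_frame(objects, points_world, masks, caption_embs,
               T_world_to_cam, K, image_hw,
               sim_thresh=0.75, dist_thresh=2.0):
    """Assign LiDAR points to 2D instance masks, then merge each instance into an
    existing 3D object with a similar caption embedding and nearby centroid, or
    spawn a new object if no match is found."""
    h, w = image_hw
    uv, valid = project_points(points_world, T_world_to_cam, K, image_hw)
    for mask, emb in zip(masks, caption_embs):       # one 2D instance at a time
        hit = valid & mask[uv[:, 1].clip(0, h - 1), uv[:, 0].clip(0, w - 1)]
        if hit.sum() < 5:                            # too few supporting points
            continue
        inst_pts = points_world[hit]
        cen = inst_pts.mean(axis=0)
        best, best_sim = None, sim_thresh
        for obj in objects:
            sim = float(obj.embedding @ emb)         # cosine sim (unit-norm vectors)
            if sim > best_sim and np.linalg.norm(obj.centroid() - cen) < dist_thresh:
                best, best_sim = obj, sim
        if best is not None:
            best.merge(inst_pts, emb)
        else:
            objects.append(Object3D(inst_pts, emb))
    return objects
```

Open-vocabulary retrieval over such a map then reduces to encoding a natural-language query with the same text encoder and ranking objects by the dot product of the query embedding with each obj.embedding; the lane-graph segmentation that builds the upper levels of the hierarchy is omitted here.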

Authors (7)
  1. Yinan Deng (7 papers)
  2. Jiahui Wang (46 papers)
  3. Jingyu Zhao (14 papers)
  4. Xinyu Tian (22 papers)
  5. Guangyan Chen (5 papers)
  6. Yi Yang (856 papers)
  7. Yufeng Yue (28 papers)
Citations (5)
