
SEEK: Semantic Reasoning for Object Goal Navigation in Real World Inspection Tasks (2405.09822v2)

Published 16 May 2024 in cs.RO

Abstract: This paper addresses the problem of object-goal navigation in autonomous inspections in real-world environments. Object-goal navigation is crucial to enable effective inspections in various settings, often requiring the robot to identify the target object within a large search space. Current object inspection methods fall short of human efficiency because they typically cannot bootstrap prior and common-sense knowledge as humans do. In this paper, we introduce a framework that enables robots to use semantic knowledge from prior spatial configurations of the environment and semantic common-sense knowledge. We propose SEEK (Semantic Reasoning for Object Inspection Tasks), which combines semantic prior knowledge with the robot's observations to search for and navigate toward target objects more efficiently. SEEK maintains two representations: a Dynamic Scene Graph (DSG) and a Relational Semantic Network (RSN). The RSN is a compact and practical model that estimates the probability of finding the target object across spatial elements in the DSG. We propose a novel probabilistic planning framework to search for the object using relational semantic knowledge. Our simulation analyses demonstrate that SEEK outperforms the classical planning and LLM-based methods examined in this study in terms of efficiency for object-goal inspection tasks. We validated our approach on a physical legged robot in urban environments, showcasing its practicality and effectiveness in real-world inspection scenarios.
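
The abstract describes scoring spatial elements by the probability of finding the target and combining that prior with the robot's observations. The following is a minimal Python sketch of that general idea only, not the authors' RSN or planner: the room names, prior values, detection probability, and the greedy probability-per-cost rule are all illustrative assumptions.

```python
# Hypothetical sketch: Bayesian belief over rooms plus a greedy search rule.
# None of these names or numbers come from the SEEK paper.

def update_after_miss(beliefs, searched, detect_prob=0.9):
    """Bayes update after searching `searched` without finding the target.

    detect_prob is an assumed probability of seeing the target
    when it really is in the searched room.
    """
    posterior = dict(beliefs)
    posterior[searched] = beliefs[searched] * (1.0 - detect_prob)
    total = sum(posterior.values())
    return {room: p / total for room, p in posterior.items()}


def next_room(beliefs, costs):
    """Pick the room with the highest probability per unit travel cost."""
    return max(beliefs, key=lambda room: beliefs[room] / costs[room])


# Assumed prior over rooms (e.g., from common-sense knowledge) and travel costs.
beliefs = {"kitchen": 0.6, "office": 0.3, "hallway": 0.1}
costs = {"kitchen": 1.0, "office": 1.0, "hallway": 1.0}

first = next_room(beliefs, costs)            # room to search first
beliefs_after = update_after_miss(beliefs, first)
second = next_room(beliefs_after, costs)     # room to search after a miss
```

In this toy setup, a failed search lowers the belief for the searched room and shifts probability mass to the remaining rooms, so the next choice changes accordingly.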

