Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration (2505.23019v2)

Published 29 May 2025 in cs.RO

Abstract: Object-Goal Navigation (OGN) remains challenging in real-world, multi-floor environments and under open-vocabulary object descriptions. We observe that most episodes in widely used benchmarks such as HM3D and MP3D involve multi-floor buildings, with many requiring explicit floor transitions. However, existing methods are often limited to single-floor settings or predefined object categories. To address these limitations, we tackle two key challenges: (1) efficient cross-level planning and (2) zero-shot object-goal navigation (ZS-OGN), where agents must interpret novel object descriptions without prior exposure. We propose ASCENT, a framework that combines a Multi-Floor Spatial Abstraction module for hierarchical semantic mapping and a Coarse-to-Fine Frontier Reasoning module leveraging LLMs for context-aware exploration, without requiring additional training on new object semantics or locomotion data. Our method outperforms state-of-the-art ZS-OGN approaches on HM3D and MP3D benchmarks while enabling efficient multi-floor navigation. We further validate its practicality through real-world deployment on a quadruped robot, achieving successful object exploration across unseen floors.

Authors (9)
  1. Zeying Gong (5 papers)
  2. Rong Li (70 papers)
  3. Tianshuai Hu (5 papers)
  4. Ronghe Qiu (7 papers)
  5. Lingdong Kong (49 papers)
  6. Lingfeng Zhang (24 papers)
  7. Yiyi Ding (1 paper)
  8. Leying Zhang (9 papers)
  9. Junwei Liang (47 papers)

Summary

  • The paper introduces ASCENT, a novel zero-shot Object-Goal Navigation framework for multi-floor environments that uses LLMs for interpreting open-vocabulary object descriptions without prior training.
  • ASCENT employs a Multi-Floor Spatial Abstraction module for hierarchical semantic mapping and a Coarse-to-Fine Frontier Reasoning module with LLM-driven contextual guidance to enhance exploration efficiency.
  • Evaluated in the Habitat simulator (HM3D, MP3D) and in real-world tests, ASCENT significantly outperformed state-of-the-art zero-shot methods in success rate and path efficiency.

Overview of "Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration"

The paper "Stairway to Success: Zero-Shot Floor-Aware Object-Goal Navigation via LLM-Driven Coarse-to-Fine Exploration" presents a novel framework named ASCENT, which addresses challenges in zero-shot Object-Goal Navigation (OGN) in multi-floor environments using a coarse-to-fine exploration approach. The research focuses on two main areas: cross-level planning and leveraging LLMs for interpreting open-vocabulary object descriptions without prior exposure.

Approach and Methodology

ASCENT introduces a multi-component framework that combines a Multi-Floor Spatial Abstraction module and a Coarse-to-Fine Frontier Reasoning module. The Multi-Floor Spatial Abstraction module is responsible for hierarchical semantic mapping using a novel strategy that builds distinct floor-specific representations and models inter-floor connectivity. This is critical for enabling efficient path planning across different floors and ensuring robust navigation even in the absence of explicit vertical localization data.
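To make the structure concrete, below is a minimal sketch of what a floor-aware hierarchical map could look like: per-floor semantic grids plus an inter-floor connectivity graph whose edges correspond to detected stair or transition regions. This is an illustrative assumption of the data layout, not the authors' implementation; all class and field names are hypothetical.

```python
# Illustrative floor-aware map structure (assumed, not the paper's code):
# one semantic grid per floor plus stair links modeling inter-floor connectivity.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class FloorMap:
    """Single-floor representation: occupancy grid plus sparse semantic labels."""
    occupancy: np.ndarray                                                 # (H, W): 0 free, 1 occupied, -1 unknown
    semantics: Dict[Tuple[int, int], str] = field(default_factory=dict)   # cell -> object label
    frontiers: List[Tuple[int, int]] = field(default_factory=list)        # boundary cells of unexplored space


@dataclass
class MultiFloorMap:
    """Hierarchical map: one FloorMap per level and stair links between levels."""
    floors: Dict[int, FloorMap] = field(default_factory=dict)
    # (floor_a, floor_b) -> (stair entry cell on floor_a, stair entry cell on floor_b)
    stairs: Dict[Tuple[int, int], Tuple[Tuple[int, int], Tuple[int, int]]] = field(default_factory=dict)

    def add_stair_link(self, fa: int, fb: int,
                       cell_a: Tuple[int, int], cell_b: Tuple[int, int]) -> None:
        """Record a detected inter-floor transition (e.g., a staircase) in both directions."""
        self.stairs[(fa, fb)] = (cell_a, cell_b)
        self.stairs[(fb, fa)] = (cell_b, cell_a)

    def reachable_floors(self, current: int) -> List[int]:
        """Floors directly reachable from the current one via known stair links."""
        return [b for (a, b) in self.stairs if a == current]
```

With such a structure, cross-level planning reduces to choosing a target floor, routing to the corresponding stair entry cell on the current floor, and continuing planning on the destination floor's grid.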

The Coarse-to-Fine Frontier Reasoning module improves exploration efficiency by constructing a value map that combines semantic similarity to the goal with exploration cost. This value map is used to identify high-priority frontiers, which are then refined through LLM-driven contextual reasoning that guides the agent toward the most promising regions, as sketched below. The framework's reliance on LLMs instead of task-specific training or object detectors marks a shift towards more generalized and adaptable navigation systems.
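The following sketch illustrates one plausible reading of this coarse-to-fine pipeline: a coarse stage that ranks frontiers by semantic similarity minus travel cost, and a fine stage that asks an LLM to choose among the top candidates given scene context. The interfaces (`similarity`, `travel_cost`, `describe`, `llm`) are assumed placeholders, not the paper's actual API.

```python
# Assumed coarse-to-fine frontier selection; function names are hypothetical.
from typing import Callable, List, Sequence, Tuple

Frontier = Tuple[int, int]  # a grid cell on the current floor


def coarse_scores(frontiers: Sequence[Frontier],
                  similarity: Callable[[Frontier], float],
                  travel_cost: Callable[[Frontier], float],
                  cost_weight: float = 0.5) -> List[Tuple[Frontier, float]]:
    """Coarse stage: reward semantic similarity to the goal, penalize exploration cost."""
    return sorted(((f, similarity(f) - cost_weight * travel_cost(f)) for f in frontiers),
                  key=lambda fs: fs[1], reverse=True)


def fine_select(ranked: List[Tuple[Frontier, float]],
                goal: str,
                describe: Callable[[Frontier], str],
                llm: Callable[[str], str],
                top_k: int = 3) -> Frontier:
    """Fine stage: ask an LLM to pick among the top-k frontiers using scene context."""
    candidates = ranked[:top_k]
    prompt = (f"You are guiding a robot to find a '{goal}'. "
              "Reply with the index of the most promising frontier.\n" +
              "\n".join(f"{i}: {describe(f)} (coarse score {s:.2f})"
                        for i, (f, s) in enumerate(candidates)))
    answer = llm(prompt)
    try:
        idx = int(answer.strip().split()[0])
    except (ValueError, IndexError):
        idx = 0  # fall back to the best coarse candidate if the reply is unparsable
    return candidates[min(max(idx, 0), len(candidates) - 1)][0]
```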

Experimental Results and Conclusions

The ASCENT framework was evaluated using the Habitat simulator on HM3D and MP3D datasets. These datasets were chosen due to their complexity and the prevalence of multi-floor scenarios, which provide a rigorous testing ground for OGN methods. The experimental results reveal that ASCENT significantly outperforms state-of-the-art zero-shot OGN methods, achieving improvements in Success Rate (SR) and Success weighted by Path Length (SPL) on both datasets. Specifically, ASCENT achieved a 65.4% SR and a 33.5% SPL on HM3D, showcasing its efficacy.
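For reference, SR and SPL are the standard object-goal navigation metrics: SR is the fraction of successful episodes, and SPL weights each success by the ratio of shortest-path length to the length actually traveled. A minimal computation, using the commonly accepted definitions rather than anything specific to this paper, is shown below.

```python
# Standard embodied-navigation metrics (common definitions, not paper-specific code).
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """SR: fraction of episodes in which the agent reached the goal."""
    return sum(successes) / len(successes)


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """SPL: success weighted by shortest-path length over max(actual, shortest) path length."""
    return sum(s * (l / max(p, l))
               for s, l, p in zip(successes, shortest, taken)) / len(successes)
```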

Additionally, the research team tested the ASCENT framework in real-world settings using a quadruped robot. The real-world tests validated the framework's practicality and adaptability, as the robot successfully navigated complex environments without explicit pre-training on specific object categories or floor layouts.

Implications and Future Directions

The research offers substantial contributions to the field of embodied AI and robotics, particularly in the domain of autonomous navigation in complex and unstructured environments. By utilizing LLMs for context-aware planning, ASCENT demonstrates a scalable approach that can potentially be extended to other navigation and perception tasks. This approach allows for seamless adaptation to new environments, which is crucial for real-world applications such as domestic robotics and automated surveillance.

Further work could explore the integration of more advanced multi-modal sensory data to enhance navigation robustness in even more complex scenarios. Additionally, refining the framework's ability to dynamically adapt to unexpected environmental changes or additional object categories could expand its practical applications.

In conclusion, this paper represents a significant step forward in addressing the challenges of zero-shot, floor-aware object-goal navigation in multi-floor environments, leveraging the capacities of LLMs to enhance the flexibility and effectiveness of robotic systems.
