TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments (2402.04061v3)

Published 6 Feb 2024 in cs.RO and cs.LG

Abstract: Autonomous robots exploring unknown environments face a significant challenge: navigating effectively without prior maps and with limited external feedback. This challenge intensifies in sparse-reward environments, where traditional exploration techniques often fail. In this paper, we present TopoNav, a novel topological navigation framework that integrates active mapping, hierarchical reinforcement learning, and intrinsic motivation to enable efficient goal-oriented exploration and navigation in sparse-reward settings. TopoNav dynamically constructs a topological map of the environment, capturing key locations and pathways. A two-level hierarchical policy architecture, comprising a high-level graph traversal policy and low-level motion control policies, enables effective navigation and obstacle avoidance while maintaining focus on the overall goal. Additionally, TopoNav incorporates intrinsic motivation to guide exploration toward relevant regions and frontier nodes in the topological map, addressing the challenges of sparse extrinsic rewards. We evaluate TopoNav in both simulated and real-world off-road environments using a Clearpath Jackal robot, across three challenging navigation scenarios: goal-reaching, feature-based navigation, and navigation in complex terrains. We observe an increase in exploration coverage of 7-20%, an increase in success rates of 9-19%, and reductions in navigation times of 15-36% across various scenarios, compared to state-of-the-art methods.

This paper introduces TopoNav, a framework designed for autonomous robot navigation and exploration, particularly in unknown environments where rewards are sparse. The core problem addressed is enabling robots to efficiently build maps and navigate towards goals without prior environmental knowledge and with limited feedback, a scenario where traditional SLAM, planning, and many RL methods struggle.
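As a rough illustration of the data structure at the heart of the approach, a topological map of this kind can be kept as a lightweight graph. The sketch below is hypothetical, not the authors' implementation; node IDs and positions are illustrative.

```python
class TopoMap:
    """Minimal topological map: nodes are landmark IDs with positions,
    edges are navigable links between them (illustrative sketch only)."""

    def __init__(self):
        self.nodes = {}   # node_id -> (x, y) position
        self.edges = {}   # node_id -> set of neighboring node_ids

    def add_node(self, node_id, position):
        """Register a newly detected landmark as a map node."""
        self.nodes[node_id] = position
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        """Record a traversed path as an undirected edge."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, node_id):
        """Nodes reachable from node_id along known paths."""
        return self.edges.get(node_id, set())
```

The high-level policy can then plan over `neighbors()` calls instead of raw geometry, which is what makes the graph abstraction cheap to search.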

TopoNav combines several key techniques:

  1. Active Topological Mapping: Instead of relying solely on geometric maps, TopoNav dynamically constructs a topological map $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Nodes $v \in \mathcal{V}$ represent significant places or landmarks (like trees or objects detected by the robot), and edges $e \in \mathcal{E}$ represent navigable paths between them. This map is built incrementally as the robot explores.
  2. Hierarchical Reinforcement Learning (HRL): Based on the h-DQN architecture, TopoNav uses a two-level policy structure:
    • Meta-controller ($\pi_h$): Operates at a higher level of abstraction. It selects the next subgoal (a node $g$ in the topological map) based on the current state $s$ (the robot's sensory input and map $\mathcal{M}$) and the overall goal $G_g$. It uses a Q-network $Q^\mu(s, g)$.
    • Sub-controller ($\pi_l$): Operates at a lower level. Given the current state $s$ and the subgoal $g$ selected by the meta-controller, it learns a policy that executes primitive actions $a \in \mathcal{A}_l$ (like movement commands) to reach that subgoal. It uses a set of Q-networks $Q^g(s, a)$.
  3. Attention-based Feature Detection: A ResNet-50 CNN combined with a Convolutional Block Attention Module (CBAM) processes RGB images to identify potential landmarks such as objects or trees. These detected features are used to create nodes (subgoals) in the topological map.
  4. Dynamic Subgoal Generation and Selection: When features are detected, they are compared to existing map nodes. If novel, they become candidate subgoals. If multiple landmarks are detected at similar distances, a strategic selection process (Algorithm 1) prioritizes landmarks using a weighted score combining:
    • Novelty: $N(l) = e^{-\lambda \cdot \text{visits}(l)}$, encouraging exploration of less-visited areas.
    • Goal-Directedness: $GD(l) = \frac{\mathbf{v}_l \cdot \mathbf{v}_g}{\|\mathbf{v}_l\| \cdot \|\mathbf{v}_g\|}$, favoring landmarks aligned with the direction of the final goal. If no landmarks are detected nearby, a new subgoal is generated along the robot's current trajectory to ensure progress.
  5. Intrinsic Motivation Reward Structure: To combat sparse extrinsic rewards (given only for reaching the final goal, $R_{goal}$, or milestones, $R_{milestone}$), TopoNav uses a dense reward signal composed of:
    • Intrinsic rewards: encouraging exploration (visiting novel states, $r_{in}$), discovering new subgoals ($r_{sg}$), reaching frontier nodes ($r_{fe}$), increasing explored area ($r_{ep}$), and reducing uncertainty ($r_{ue}$).
    • Penalties: discouraging inefficient behavior such as revisiting states ($r_p$), selecting overly similar subgoals ($r_{sd}$, based on the Euclidean distance between feature vectors), long periods without exploration ($r_{te}$), and hitting obstacles ($r_{ob}$). The total reward combines the extrinsic, intrinsic, and penalty terms using weighting factors $\alpha, \beta, \gamma$.
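The subgoal-selection scoring in item 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the weights `w_n` and `w_gd` and the decay rate `lam` are assumed values (the paper does not fix them here), and `v_l`, `v_g` stand for 2-D direction vectors toward a candidate landmark and the goal.

```python
import math

def novelty(visits: int, lam: float = 0.5) -> float:
    """Novelty score N(l) = exp(-lam * visits(l)); decays as a landmark is revisited."""
    return math.exp(-lam * visits)

def goal_directedness(v_l, v_g) -> float:
    """Cosine similarity between the landmark direction and the goal direction."""
    dot = sum(a * b for a, b in zip(v_l, v_g))
    norm = math.hypot(*v_l) * math.hypot(*v_g)
    return dot / norm if norm > 0 else 0.0

def select_subgoal(candidates, v_g, w_n=0.5, w_gd=0.5):
    """Pick the candidate with the highest weighted score.

    candidates: list of (landmark_id, visits, v_l) tuples.
    Returns the winning landmark_id.
    """
    def score(c):
        _, visits, v_l = c
        return w_n * novelty(visits) + w_gd * goal_directedness(v_l, v_g)
    return max(candidates, key=score)[0]
```

With equal weights, a fresh landmark pointing toward the goal beats both a frequently visited one and a novel one pointing away from the goal, which matches the stated intent of balancing exploration against goal progress.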

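The reward composition described in item 5 amounts to a weighted sum of the three term groups. A minimal sketch follows, assuming illustrative default weights and hypothetical component names; the paper only specifies that the factors alpha, beta, gamma weight the extrinsic, intrinsic, and penalty terms.

```python
def total_reward(r_extrinsic, intrinsic_terms, penalty_terms,
                 alpha=1.0, beta=0.1, gamma=0.1):
    """Weighted combination of extrinsic reward, intrinsic bonuses, and penalties.

    intrinsic_terms / penalty_terms are dicts of named components, e.g.
    {"novel_state": 0.5, "new_subgoal": 1.0} -- the names here are illustrative.
    Penalties are passed as positive magnitudes and subtracted.
    """
    r_int = sum(intrinsic_terms.values())
    r_pen = sum(penalty_terms.values())
    return alpha * r_extrinsic + beta * r_int - gamma * r_pen
```

Because the intrinsic terms are dense (they fire on novelty, frontiers, and coverage growth), this sum gives the agent a learning signal even when the extrinsic reward is zero for long stretches.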
Implementation and Evaluation:

  • TopoNav was implemented using PyTorch and trained in simulation (Unity 3D with ROS Noetic, Clearpath Husky) before real-world deployment.
  • Real-world tests used a Clearpath Jackal UGV with LiDAR and camera. Obstacle avoidance used the Dynamic Window Approach (DWA).
  • Experiments were conducted in simulation and three real-world outdoor off-road scenarios: goal reaching (open space), feature-based navigation (natural features), and complex terrain (obstacles and landmarks).
  • Performance was measured by success rate, navigation time, trajectory length, and exploration coverage.
  • Baselines included PlaceNav (Suomela et al., 2023), TopoMap (Savinov et al., 2018), ViNG (Pauly et al., 2021), and LTVN [2203.082].

Results:

  • TopoNav significantly outperformed baselines across all metrics and scenarios. Compared to the best baseline (LTVN), TopoNav showed improvements of 9-19% in success rates, 15-36% reduction in navigation times, and 7-20% increase in exploration coverage.
  • Specifically, success rates were 98% (Scenario 1), 94% (Scenario 2), and 92% (Scenario 3).
  • The CBAM attention module improved landmark detection accuracy by 8% compared to not using attention.
  • Ablation studies confirmed that removing the topological map, hierarchical structure, or attention module degraded performance, demonstrating the contribution of each component.

Conclusion:

TopoNav presents an effective framework for robot navigation in challenging sparse-reward environments by integrating dynamic topological mapping with HRL and intrinsic motivation. While successful, limitations include potential scalability issues in very large environments, suggesting future work on map compression and multi-robot coordination.

Authors (6)
  1. Jumman Hossain (10 papers)
  2. Abu-Zaher Faridee (5 papers)
  3. Nirmalya Roy (25 papers)
  4. Jade Freeman (9 papers)
  5. Timothy Gregory (5 papers)
  6. Theron T. Trout (2 papers)
Citations (1)