This paper introduces TopoNav, a framework designed for autonomous robot navigation and exploration, particularly in unknown environments where rewards are sparse. The core problem addressed is enabling robots to efficiently build maps and navigate towards goals without prior environmental knowledge and with limited feedback, a scenario where traditional SLAM, planning, and many RL methods struggle.
TopoNav combines several key techniques:
- Active Topological Mapping: Instead of relying solely on geometric maps, TopoNav dynamically constructs a topological map. Nodes represent significant places or landmarks (such as trees or objects detected by the robot), and edges represent navigable paths between them. The map is built incrementally as the robot explores.
- Hierarchical Reinforcement Learning (HRL): Based on the H-DQN architecture, TopoNav uses a two-level policy structure:
- Meta-controller: Operates at the higher level of abstraction. It selects the next subgoal (a node in the topological map) based on the current state (the robot's sensory input and the map) and the overall goal, using its own Q-network.
- Sub-controller: Operates at the lower level. Given the current state and the subgoal selected by the meta-controller, it learns a policy that executes primitive actions (such as movement commands) to reach that subgoal, using a set of Q-networks (a minimal sketch of this two-level loop appears after this list).
- Attention-based Feature Detection: A ResNet-50 CNN combined with a Convolutional Block Attention Module (CBAM) processes RGB images to identify potential landmarks (e.g., trees or other salient objects). These detected features are used to create nodes (subgoals) in the topological map (a feature-extractor sketch appears after this list).
- Dynamic Subgoal Generation and Selection: When features are detected, they are compared to existing map nodes; if novel, they become candidate subgoals. If multiple landmarks are detected at similar distances, a strategic selection process (Algorithm 1) prioritizes landmarks using a weighted score (see the scoring sketch after this list) that combines:
- Novelty, which encourages exploration of less-visited areas.
- Goal-directedness, which favors landmarks aligned with the direction of the final goal. If no landmarks are detected nearby, a new subgoal is generated along the robot's current trajectory to ensure progress.
- Intrinsic Motivation Reward Structure: To combat sparse extrinsic rewards (which are only given for reaching the final goal or intermediate milestones), TopoNav uses a dense reward signal composed of:
- Intrinsic Rewards: Bonuses for visiting novel states, discovering new subgoals, reaching frontier nodes, increasing the explored area, and reducing map uncertainty.
- Penalties: Discouraging inefficient behavior such as revisiting states, selecting subgoals too similar to existing ones (measured by the Euclidean distance between their feature vectors), going long stretches without exploration, and colliding with obstacles. The total reward combines the extrinsic, intrinsic, and penalty terms using weighting factors (see the reward-composition sketch after this list).
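To make the two-level, h-DQN-style control concrete, below is a minimal sketch of how a meta-controller Q-network (choosing subgoal nodes) and a goal-conditioned sub-controller Q-network (choosing primitive actions) could interact. The network sizes, the `env`/`topo_map` interfaces, and the use of a single goal-conditioned sub-controller (rather than the paper's set of per-subgoal Q-networks) are illustrative assumptions, not the paper's exact implementation.

```python
import random
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small MLP Q-network (layer sizes are placeholders, not the paper's)."""

    def __init__(self, in_dim, n_outputs):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_outputs))

    def forward(self, x):
        return self.net(x)


def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy selection over a 1-D tensor of Q-values."""
    if random.random() < epsilon:
        return random.randrange(q_values.numel())
    return int(q_values.argmax())


def run_episode(env, topo_map, meta_q, sub_q, epsilon=0.1,
                max_meta_steps=50, max_sub_steps=100):
    """One episode of two-level control (env and topo_map interfaces are hypothetical)."""
    state, done = env.reset(), False
    for _ in range(max_meta_steps):
        # Meta-controller: score the current topological-map nodes and pick the next subgoal.
        q_nodes = meta_q(torch.as_tensor(state, dtype=torch.float32))
        idx = epsilon_greedy(q_nodes[: len(topo_map.nodes)], epsilon)
        subgoal = topo_map.nodes[idx]
        # Sub-controller: issue primitive actions until the subgoal (or a step limit) is reached.
        for _ in range(max_sub_steps):
            x = torch.as_tensor(list(state) + list(subgoal), dtype=torch.float32)
            action = epsilon_greedy(sub_q(x), epsilon)
            state, reward, done, _ = env.step(action)
            if done or topo_map.reached(state, subgoal):
                break
        if done:
            break
```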
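Since the paper reports a ResNet-50 backbone with CBAM attention for landmark detection (and a PyTorch implementation), here is a minimal PyTorch sketch of how such a feature extractor could be wired. The reduction ratio, spatial kernel size, and the decision to apply attention only to the final feature map are assumptions; only the overall ResNet-50 + CBAM structure comes from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models


class ChannelAttention(nn.Module):
    """CBAM-style channel attention: shared MLP over average- and max-pooled descriptors."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        return x * scale.view(b, c, 1, 1)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: conv over channel-wise average and max maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        attn = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(attn))


class LandmarkFeatureExtractor(nn.Module):
    """ResNet-50 backbone with CBAM on its final feature map (placement is an assumption)."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        self.cbam = nn.Sequential(ChannelAttention(2048), SpatialAttention())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, rgb):
        feat = self.cbam(self.backbone(rgb))
        return self.pool(feat).flatten(1)  # one descriptor per image, usable as a map-node feature
```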
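The weighted subgoal-scoring step (Algorithm 1) can be sketched as follows. The inverse-visit-count novelty term, the cosine-alignment goal-directedness term, the weights, and the fixed look-ahead distance for the fallback subgoal are assumptions for illustration; the summary does not give the exact formulas.

```python
import numpy as np


def select_subgoal(robot_pos, heading, goal_pos, landmarks, visit_counts,
                   w_novelty=0.6, w_goal=0.4, lookahead=2.0):
    """Pick the next subgoal among detected landmarks.

    landmarks: list of dicts with keys 'id' and 'pos' (2-D positions) -- hypothetical format.
    visit_counts: dict mapping landmark id -> number of prior visits.
    Weights and the lookahead distance are placeholders, not the paper's values.
    """
    if not landmarks:
        # Fallback: no landmark nearby, so place a subgoal ahead along the current heading.
        return robot_pos + lookahead * np.array([np.cos(heading), np.sin(heading)])

    goal_dir = goal_pos - robot_pos
    goal_dir = goal_dir / (np.linalg.norm(goal_dir) + 1e-8)

    best_pos, best_score = None, -np.inf
    for lm in landmarks:
        # Novelty: less-visited landmarks score higher.
        novelty = 1.0 / (1.0 + visit_counts.get(lm["id"], 0))
        # Goal-directedness: alignment of the landmark direction with the goal direction.
        to_lm = lm["pos"] - robot_pos
        to_lm = to_lm / (np.linalg.norm(to_lm) + 1e-8)
        goal_directedness = float(np.dot(to_lm, goal_dir))
        score = w_novelty * novelty + w_goal * goal_directedness
        if score > best_score:
            best_pos, best_score = lm["pos"], score
    return best_pos
```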
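Finally, the dense reward described above amounts to a weighted combination of extrinsic, intrinsic, and penalty terms. A minimal sketch follows, with placeholder weights and term names, plus the Euclidean feature-distance check that could back the similar-subgoal penalty; none of the numeric values are from the paper.

```python
import numpy as np


def subgoal_similarity(feat_a, feat_b):
    """Euclidean distance between two landmark feature vectors (small distance = similar subgoals)."""
    return float(np.linalg.norm(np.asarray(feat_a) - np.asarray(feat_b)))


def total_reward(extrinsic, intrinsic_terms, penalty_terms, w_ext=1.0, w_int=0.5, w_pen=0.5):
    """Weighted sum of extrinsic reward, intrinsic bonuses, and penalties.

    Weights and term names are placeholders; the paper defines its own weighting factors.
    """
    return (w_ext * extrinsic
            + w_int * sum(intrinsic_terms.values())
            - w_pen * sum(penalty_terms.values()))


# Illustrative step: no milestone reached, one new subgoal discovered, one revisit penalty.
r = total_reward(
    extrinsic=0.0,
    intrinsic_terms={"novel_state": 0.2, "new_subgoal": 0.5, "frontier": 0.0,
                     "area_gain": 0.1, "uncertainty_drop": 0.05},
    penalty_terms={"revisit": 0.2, "similar_subgoal": 0.0,
                   "no_exploration": 0.0, "collision": 0.0},
)
```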
Implementation and Evaluation:
- TopoNav was implemented in PyTorch and trained in simulation (Unity 3D with ROS Noetic and a simulated Clearpath Husky) before real-world deployment.
- Real-world tests used a Clearpath Jackal UGV with LiDAR and camera. Obstacle avoidance used the Dynamic Window Approach (DWA).
- Experiments were conducted in simulation and three real-world outdoor off-road scenarios: goal reaching (open space), feature-based navigation (natural features), and complex terrain (obstacles and landmarks).
- Performance was measured by success rate, navigation time, trajectory length, and exploration coverage.
- Baselines included PlaceNav (Suomela et al., 2023), TopoMap (Savinov et al., 2018), ViNG (Shah et al., 2021), and LTVN.
Results:
- TopoNav outperformed all baselines across all metrics and scenarios. Compared to the strongest baseline (LTVN), TopoNav improved success rates by 9-19%, reduced navigation times by 15-36%, and increased exploration coverage by 7-20%.
- Specifically, success rates were 98% (Scenario 1), 94% (Scenario 2), and 92% (Scenario 3).
- The CBAM attention module improved landmark detection accuracy by 8% compared to not using attention.
- Ablation studies confirmed that removing the topological map, hierarchical structure, or attention module degraded performance, demonstrating the contribution of each component.
Conclusion:
TopoNav presents an effective framework for robot navigation in challenging sparse-reward environments by integrating dynamic topological mapping with HRL and intrinsic motivation. While the approach is successful, its limitations include potential scalability issues in very large environments, motivating future work on map compression and multi-robot coordination.