This paper introduces TopoNav, a framework designed for autonomous robot navigation and exploration, particularly in unknown environments where rewards are sparse. The core problem addressed is enabling robots to efficiently build maps and navigate towards goals without prior environmental knowledge and with limited feedback, a scenario where traditional SLAM, planning, and many RL methods struggle.
TopoNav combines several key techniques:
- Active Topological Mapping: Instead of relying solely on geometric maps, TopoNav dynamically constructs a topological map. Nodes represent significant places or landmarks (such as trees or objects detected by the robot), and edges represent navigable paths between them. The map is built incrementally as the robot explores.
- Hierarchical Reinforcement Learning (HRL): Based on the H-DQN architecture, TopoNav uses a two-level policy structure:
- Meta-controller: Operates at the higher level of abstraction. It selects the next subgoal (a node in the topological map) based on the current state (the robot's sensory input and the map) and the overall goal, using its own Q-network.
- Sub-controller: Operates at the lower level. Given the current state and the subgoal selected by the meta-controller, it learns a policy that executes primitive actions (such as movement commands) to reach that subgoal, using a set of Q-networks (a minimal sketch of this two-level loop appears after this list).
- Attention-based Feature Detection: A ResNet-50 CNN combined with a Convolutional Block Attention Module (CBAM) processes RGB images to identify potential landmarks (e.g., trees or other salient objects). These detected features are used to create nodes (subgoals) in the topological map (a feature-extractor sketch appears after this list).
- Dynamic Subgoal Generation and Selection: When features are detected, they are compared to existing map nodes; if novel, they become candidate subgoals. If multiple landmarks are detected at similar distances, a strategic selection process (Algorithm 1) prioritizes landmarks using a weighted score (see the scoring sketch after this list) that combines:
- Novelty, which encourages exploration of less-visited areas.
- Goal-directedness, which favors landmarks aligned with the direction of the final goal. If no landmarks are detected nearby, a new subgoal is generated along the robot's current trajectory to ensure progress.
- Intrinsic Motivation Reward Structure: To combat sparse extrinsic rewards (which are only given for reaching the final goal or intermediate milestones), TopoNav uses a dense reward signal composed of:
- Intrinsic Rewards: Bonuses for visiting novel states, discovering new subgoals, reaching frontier nodes, increasing the explored area, and reducing map uncertainty.
- Penalties: Discouraging inefficient behavior such as revisiting states, selecting subgoals too similar to existing ones (measured by the Euclidean distance between their feature vectors), going long stretches without exploration, and colliding with obstacles. The total reward combines the extrinsic, intrinsic, and penalty terms using weighting factors (see the reward-composition sketch after this list).
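To make the two-level, h-DQN-style control concrete, below is a minimal sketch of how a meta-controller Q-network (choosing subgoal nodes) and a goal-conditioned sub-controller Q-network (choosing primitive actions) could interact. The network sizes, the `env`/`topo_map` interfaces, and the use of a single goal-conditioned sub-controller (rather than the paper's set of per-subgoal Q-networks) are illustrative assumptions, not the paper's exact implementation.

```python
import random
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small MLP Q-network (layer sizes are placeholders, not the paper's)."""

    def __init__(self, in_dim, n_outputs):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_outputs))

    def forward(self, x):
        return self.net(x)


def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy selection over a 1-D tensor of Q-values."""
    if random.random() < epsilon:
        return random.randrange(q_values.numel())
    return int(q_values.argmax())


def run_episode(env, topo_map, meta_q, sub_q, epsilon=0.1,
                max_meta_steps=50, max_sub_steps=100):
    """One episode of two-level control (env and topo_map interfaces are hypothetical)."""
    state, done = env.reset(), False
    for _ in range(max_meta_steps):
        # Meta-controller: score the current topological-map nodes and pick the next subgoal.
        q_nodes = meta_q(torch.as_tensor(state, dtype=torch.float32))
        idx = epsilon_greedy(q_nodes[: len(topo_map.nodes)], epsilon)
        subgoal = topo_map.nodes[idx]
        # Sub-controller: issue primitive actions until the subgoal (or a step limit) is reached.
        for _ in range(max_sub_steps):
            x = torch.as_tensor(list(state) + list(subgoal), dtype=torch.float32)
            action = epsilon_greedy(sub_q(x), epsilon)
            state, reward, done, _ = env.step(action)
            if done or topo_map.reached(state, subgoal):
                break
        if done:
            break
```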
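Since the paper reports a ResNet-50 backbone with CBAM attention for landmark detection (and a PyTorch implementation), here is a minimal PyTorch sketch of how such a feature extractor could be wired. The reduction ratio, spatial kernel size, and the decision to apply attention only to the final feature map are assumptions; only the overall ResNet-50 + CBAM structure comes from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models


class ChannelAttention(nn.Module):
    """CBAM-style channel attention: shared MLP over average- and max-pooled descriptors."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        return x * scale.view(b, c, 1, 1)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: conv over channel-wise average and max maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        attn = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(attn))


class LandmarkFeatureExtractor(nn.Module):
    """ResNet-50 backbone with CBAM on its final feature map (placement is an assumption)."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        self.cbam = nn.Sequential(ChannelAttention(2048), SpatialAttention())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, rgb):
        feat = self.cbam(self.backbone(rgb))
        return self.pool(feat).flatten(1)  # one descriptor per image, usable as a map-node feature
```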
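The weighted subgoal-scoring step (Algorithm 1) can be sketched as follows. The inverse-visit-count novelty term, the cosine-alignment goal-directedness term, the weights, and the fixed look-ahead distance for the fallback subgoal are assumptions for illustration; the summary does not give the exact formulas.

```python
import numpy as np


def select_subgoal(robot_pos, heading, goal_pos, landmarks, visit_counts,
                   w_novelty=0.6, w_goal=0.4, lookahead=2.0):
    """Pick the next subgoal among detected landmarks.

    landmarks: list of dicts with keys 'id' and 'pos' (2-D positions) -- hypothetical format.
    visit_counts: dict mapping landmark id -> number of prior visits.
    Weights and the lookahead distance are placeholders, not the paper's values.
    """
    if not landmarks:
        # Fallback: no landmark nearby, so place a subgoal ahead along the current heading.
        return robot_pos + lookahead * np.array([np.cos(heading), np.sin(heading)])

    goal_dir = goal_pos - robot_pos
    goal_dir = goal_dir / (np.linalg.norm(goal_dir) + 1e-8)

    best_pos, best_score = None, -np.inf
    for lm in landmarks:
        # Novelty: less-visited landmarks score higher.
        novelty = 1.0 / (1.0 + visit_counts.get(lm["id"], 0))
        # Goal-directedness: alignment of the landmark direction with the goal direction.
        to_lm = lm["pos"] - robot_pos
        to_lm = to_lm / (np.linalg.norm(to_lm) + 1e-8)
        goal_directedness = float(np.dot(to_lm, goal_dir))
        score = w_novelty * novelty + w_goal * goal_directedness
        if score > best_score:
            best_pos, best_score = lm["pos"], score
    return best_pos
```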
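Finally, the dense reward described above amounts to a weighted combination of extrinsic, intrinsic, and penalty terms. A minimal sketch follows, with placeholder weights and term names, plus the Euclidean feature-distance check that could back the similar-subgoal penalty; none of the numeric values are from the paper.

```python
import numpy as np


def subgoal_similarity(feat_a, feat_b):
    """Euclidean distance between two landmark feature vectors (small distance = similar subgoals)."""
    return float(np.linalg.norm(np.asarray(feat_a) - np.asarray(feat_b)))


def total_reward(extrinsic, intrinsic_terms, penalty_terms, w_ext=1.0, w_int=0.5, w_pen=0.5):
    """Weighted sum of extrinsic reward, intrinsic bonuses, and penalties.

    Weights and term names are placeholders; the paper defines its own weighting factors.
    """
    return (w_ext * extrinsic
            + w_int * sum(intrinsic_terms.values())
            - w_pen * sum(penalty_terms.values()))


# Illustrative step: no milestone reached, one new subgoal discovered, one revisit penalty.
r = total_reward(
    extrinsic=0.0,
    intrinsic_terms={"novel_state": 0.2, "new_subgoal": 0.5, "frontier": 0.0,
                     "area_gain": 0.1, "uncertainty_drop": 0.05},
    penalty_terms={"revisit": 0.2, "similar_subgoal": 0.0,
                   "no_exploration": 0.0, "collision": 0.0},
)
```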
Implementation and Evaluation:
- TopoNav was implemented in PyTorch and trained in simulation (Unity 3D with ROS Noetic and a simulated Clearpath Husky) before real-world deployment.
- Real-world tests used a Clearpath Jackal UGV with LiDAR and camera. Obstacle avoidance used the Dynamic Window Approach (DWA).
- Experiments were conducted in simulation and three real-world outdoor off-road scenarios: goal reaching (open space), feature-based navigation (natural features), and complex terrain (obstacles and landmarks).
- Performance was measured by success rate, navigation time, trajectory length, and exploration coverage.
- Baselines included PlaceNav (Suomela et al., 2023), TopoMap (Savinov et al., 2018), ViNG (Shah et al., 2021), and LTVN.
Results:
- TopoNav outperformed all baselines across all metrics and scenarios. Compared to the strongest baseline (LTVN), TopoNav improved success rates by 9-19%, reduced navigation times by 15-36%, and increased exploration coverage by 7-20%.
- Specifically, success rates were 98% (Scenario 1), 94% (Scenario 2), and 92% (Scenario 3).
- The CBAM attention module improved landmark detection accuracy by 8% compared to not using attention.
- Ablation studies confirmed that removing the topological map, hierarchical structure, or attention module degraded performance, demonstrating the contribution of each component.
Conclusion:
TopoNav presents an effective framework for robot navigation in challenging sparse-reward environments by integrating dynamic topological mapping with HRL and intrinsic motivation. While the approach is successful, its limitations include potential scalability issues in very large environments, motivating future work on map compression and multi-robot coordination.