
Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps (2504.13149v1)

Published 17 Apr 2025 in cs.RO

Abstract: A robot navigating an outdoor environment with no prior knowledge of the space must rely on its local sensing to perceive its surroundings and plan. This can come in the form of a local metric map or local policy with some fixed horizon. Beyond that, there is a fog of unknown space marked with some fixed cost. A limited planning horizon can often result in myopic decisions leading the robot off course or worse, into very difficult terrain. Ideally, we would like the robot to have full knowledge that can be orders of magnitude larger than a local cost map. In practice, this is intractable due to sparse sensing information and often computationally expensive. In this work, we make a key observation that long-range navigation only necessitates identifying good frontier directions for planning instead of full map knowledge. To this end, we propose Long Range Navigator (LRN), that learns an intermediate affordance representation mapping high-dimensional camera images to `affordable' frontiers for planning, and then optimizing for maximum alignment with the desired goal. LRN notably is trained entirely on unlabeled ego-centric videos making it easy to scale and adapt to new platforms. Through extensive off-road experiments on Spot and a Big Vehicle, we find that augmenting existing navigation stacks with LRN reduces human interventions at test-time and leads to faster decision making indicating the relevance of LRN. https://personalrobotics.github.io/lrn

Summary

  • The paper introduces LRN, a bi-level system that leverages learned visual affordance predictions to select intermediate navigation targets and reduce human interventions.
  • It employs a novel training approach using unlabeled egocentric videos and CoTracker-generated heatmaps to guide the lightweight decoder in predicting navigable frontiers.
  • Experiments on Boston Dynamics Spot and Racer Heavy robots demonstrate that LRN improves path efficiency and generalizes well across diverse outdoor environments.

This paper introduces the Long Range Navigator (LRN), a system designed to improve long-range navigation for mobile robots in unknown outdoor environments, particularly when the goal lies well beyond the robot's local sensing and mapping range. The core problem addressed is that traditional navigation systems relying on local metric maps often make myopic decisions, assigning a fixed cost to unknown space beyond their sensing horizon, which can lead to inefficient or unsafe paths.

The key insight is that generating a complete, large-scale map is often intractable and unnecessary for long-range navigation. Instead, identifying visually "affordable" frontiers—distant areas that appear navigable and allow continued progress—is sufficient. LRN learns an intermediate affordance representation directly from camera images to predict these affordable frontiers.

LRN operates as a bi-level system integrated with a standard local navigation stack (perception, planning, control):

  1. Affordance Backbone: This component takes egocentric camera images as input. It uses a frozen image encoder (like SAM2 or MobileSAM) followed by a trained lightweight decoder to predict affordance heatmaps in the image space. These heatmaps highlight regions likely corresponding to affordable frontiers, independent of the final goal.
  2. Goal Conditioned Head: The predicted affordance heatmaps are projected onto a discrete set of possible robot headings (angular bins). These affordance scores are then multiplied by two Gaussian distributions: one centered on the direction to the final goal (g_t) and another centered on the heading selected at the previous timestep, to encourage temporal consistency. The heading with the maximum combined score is selected as the intermediate target (f_π) for the local planner (see the sketch after this list).
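
As a rough illustration of the goal-conditioned head, the sketch below combines per-bin affordance scores with the goal and consistency Gaussians and takes the argmax. The function name, bin layout, and default σ values are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np

def select_heading(affordance_bins, goal_heading, prev_heading,
                   sigma_g=0.5, sigma_p=0.5):
    """Combine per-bin affordance scores with goal and consistency priors.

    affordance_bins: array of shape (num_bins,), one affordance score per
        heading bin (assumed already projected from the image-space heatmap).
    goal_heading, prev_heading: headings in radians.
    Returns the heading (radians) of the bin with the maximum combined score.
    """
    num_bins = len(affordance_bins)
    bin_headings = np.linspace(-np.pi, np.pi, num_bins, endpoint=False)

    def angular_diff(a, b):
        # Wrap differences to [-pi, pi] so the Gaussians respect heading wrap-around.
        return (a - b + np.pi) % (2 * np.pi) - np.pi

    # Gaussian centered on the goal direction and on the previously selected heading.
    goal_prior = np.exp(-0.5 * (angular_diff(bin_headings, goal_heading) / sigma_g) ** 2)
    consistency_prior = np.exp(-0.5 * (angular_diff(bin_headings, prev_heading) / sigma_p) ** 2)

    # Multiplicative combination, then argmax, as described above.
    combined = affordance_bins * goal_prior * consistency_prior
    return bin_headings[int(np.argmax(combined))]
```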

A significant aspect of LRN is its training methodology. To avoid tedious manual labeling, it leverages unlabeled egocentric videos (e.g., from human walking). The CoTracker model [karaev2023cotracker] is used to track points in the video. By running CoTracker in reverse, the end-point of a visible trajectory segment is identified in the initial frame's image space and labeled as an "affordable hotspot" (score 1), while the path leading to it is labeled 0. This automatically generated dataset of (image, heatmap) pairs is used to train the Affordance Backbone's decoder via supervised learning (MSE loss).
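
The sketch below illustrates the general shape of this training pipeline under stated assumptions: a hypothetical helper splats a small Gaussian blob at each reverse-tracked trajectory end-point to form the target heatmap (the actual CoTracker tracking step is omitted), and a single supervised step trains the lightweight decoder with MSE while the encoder stays frozen. All names and the blob radius are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def make_affordance_label(endpoints, height, width, sigma=8.0):
    """Build a target heatmap from trajectory end-points (hypothetical helper).

    endpoints: list of (u, v) pixel coordinates where reverse-tracked trajectories
        terminate in the initial frame; these are treated as affordable hotspots
        (label 1), while everything else, including the path, stays at 0.
    A small Gaussian blob is placed at each hotspot so the MSE target is smooth.
    """
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    label = torch.zeros(height, width)
    for u, v in endpoints:
        blob = torch.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))
        label = torch.maximum(label, blob)
    return label

def decoder_training_step(decoder, frozen_encoder, image, label, optimizer):
    """One supervised step: frozen encoder features -> lightweight decoder -> MSE loss."""
    with torch.no_grad():
        features = frozen_encoder(image)   # encoder weights are not updated
    pred_heatmap = decoder(features)       # predicted affordance heatmap
    loss = F.mse_loss(pred_heatmap, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```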

Implementation Details:

  • Platforms: Tested on a Boston Dynamics Spot quadruped and a large tracked vehicle (Racer Heavy).
  • Sensors: Primarily uses RGB cameras (front/rear fisheye on Spot, multiple front/rear on Racer Heavy). Local planners use depth cameras/lidar for local mapping.
  • Models: SAM2 [ravi2024sam2] or MobileSAM [mobile_sam] as image encoders, CoTracker [karaev2023cotracker] for automatic labeling.
  • Local Stacks: Spot used Elevation Mapping CuPy [miki2022elevation] with ARA* [likhachev2003]; Racer Heavy used a custom optimized stack with a search-based planner. Both plan within a limited horizon (16m map for Spot, 50m radius for Racer Heavy).
  • LRN Output: The chosen heading f_π is provided to the local planner as a target point just outside the local map boundary in that direction (a geometric sketch follows this list).
  • Performance: LRN runs at ~4Hz on a Jetson Orin AGX alongside the local navigation stack.
  • Parameters: Key hyperparameters include the heatmap threshold (h_thresh), EMA filter weight (α), and standard deviations for the goal (σ_g) and previous-heading (σ_p) Gaussians. Values are provided for the Spot and Racer Heavy setups.
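
A minimal geometric sketch of the "LRN Output" step above, assuming the local map extends map_extent_m from the robot (roughly 16 m for Spot and 50 m for Racer Heavy); the helper name and margin are illustrative, not from the paper.

```python
import math

def heading_to_local_target(heading, map_extent_m, margin_m=1.0):
    """Convert the selected heading f_pi into a target point for the local planner.

    Places the target just outside the local map boundary along the chosen
    heading, expressed in the robot's local frame (x forward, y left).
    """
    r = map_extent_m + margin_m
    return (r * math.cos(heading), r * math.sin(heading))
```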

Experiments and Results:

LRN was evaluated against baselines including:

  • Goal Heuristic: Plans towards the goal using only the local map, treating unknown space uniformly (implemented via LRN's goal conditioning head with uniform affordance scores).
  • NoMaD: A visual navigation policy [sridhar2023].
  • Traversability + Depth Anything V2: Combines a learned visual traversability model with monocular depth estimation [yang2024] to generate heatmaps.

Key findings:

  • Efficiency (Q1): LRN significantly reduced human interventions compared to baselines across various test courses (Dump, Night, Helipad for Spot; long-distance course for Racer Heavy). It often resulted in faster navigation or shorter paths by making earlier decisions to avoid large obstacles (e.g., walls, dense trees) invisible to the local map. On the Racer Heavy, LRN completed a 660m course without intervention, while the Goal Heuristic required a 60m intervention after getting stuck.
  • Affordance Quality (Q2): Offline evaluation showed LRN's heatmap predictions correlated better with human labels (using metrics such as AUROC and F1) than the Traversability+Depth baseline (a minimal sketch of this comparison follows the list). An ablation study varying the heatmap threshold (h_thresh) showed a correlation between affordance quality (finding an optimal threshold) and navigation efficiency (distance traveled).
  • Generalization (Q3): LRN demonstrated generalization capabilities. The Racer Heavy model was trained on data from a different vehicle. The Spot model, trained on human walking videos during the day, successfully navigated at night, attributed to the robustness of the SAM-based embeddings.
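
For the offline affordance evaluation, the sketch below shows one plausible way to score a predicted heatmap against a binary human-labeled mask with AUROC and F1 using scikit-learn; the paper's exact labeling protocol and thresholds may differ, and the function name is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def heatmap_agreement(pred_heatmap, human_label, h_thresh=0.5):
    """Compare a predicted affordance heatmap with a binary human-labeled mask.

    pred_heatmap: float array with values in [0, 1].
    human_label: binary array of the same shape (must contain both classes).
    Returns (AUROC, F1 at the given heatmap threshold).
    """
    pred = np.asarray(pred_heatmap).ravel()
    gt = np.asarray(human_label).ravel().astype(int)
    auroc = roc_auc_score(gt, pred)
    f1 = f1_score(gt, (pred >= h_thresh).astype(int))
    return auroc, f1
```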

Limitations:

  • LRN does not explicitly reason about depth, relying on the assumption that angular proximity to the goal in the image corresponds to spatial proximity. This can cause wandering in open areas with multiple equidistant hotspots.
  • Fluctuations in heatmap scores can lead to switching behavior between headings, though this is partly mitigated by EMA filtering and the temporal-consistency weighting.
  • Automated labeling using CoTracker can introduce noise or inaccuracies (e.g., placing heat near obstacle edges), potentially impacting performance.
  • As a heuristic approach, LRN cannot guarantee finding the optimal path as it lacks complete environmental knowledge.

The paper concludes that LRN offers a practical way to extend robot planning horizons beyond local metric maps by learning an intermediate affordance representation from vision, leading to less myopic and more robust long-range navigation with fewer human interventions.
