
WayObject Costmap for Object-Relative Navigation

Updated 13 September 2025
  • WayObject Costmap is a high-level spatial representation that fuses object segmentation masks with planner-computed path costs to inform navigation decisions.
  • It enables object-relative control by decoupling control from sensor pose and embodiment, leading to improved generalization and consistent performance across varied environments.
  • The approach yields significant performance gains, as shown by improved SPL metrics and robust navigation in scenarios with sensor and environmental variability.

The WayObject Costmap is a high-level spatial representation that encodes object-level information for navigation tasks, distinct from conventional pixel- or image-relative costmaps by its focus on scene objects and their relationships. Its core feature is the synthesis of segmentation masks (e.g., derived via SAM or FastSAM) with planner-computed path costs that capture how "attractive" or viable each object is in reaching a navigation goal. The approach is central to the ObjectReact framework (Garg et al., 11 Sep 2025), which demonstrates how object-relative control, conditioned on this costmap, achieves improved generalization, cross-embodiment invariance, and robust spatial reasoning compared to traditional image-relative policies.

1. Formal Definition and Construction

The WayObject Costmap is constructed from the set of visible objects in the agent's sensory field. For each object:

  • A binary mask $M \in \{0,1\}^{H \times W \times N_m}$ locates each object's pixels within the image (height $H$, width $W$, $N_m$ detected objects per frame).
  • The planner computes a global cost $l_i$ for each object node: this is typically the shortest-path distance within a topometric scene graph (a relative 3D connectivity structure) from object $i$ to the current goal.
  • Each cost $l_i$ is normalized and encoded into a $D$-dimensional embedding via sine–cosine positional encoding:

E(l)i={sin(lZi/D),if i even cos(lZ(i1)/D),if i oddE(l)_i = \begin{cases} \sin\left(\frac{l}{Z^{i/D}}\right), & \text{if } i \text{ even} \ \cos\left(\frac{l}{Z^{(i-1)/D}}\right), & \text{if } i \text{ odd} \end{cases}

  • The costmap itself is formed by broadcasting the object masks to the embedding space and aggregating:

WH×W×D=MH×W×Nm@ENm×DW_{H \times W \times D} = M_{H \times W \times N_m} @ E_{N_m \times D}

This encoding ensures each pixel inherits the global planning cost of its associated object, generating a dense, spatial map that conditions control actions.
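The construction above can be sketched in a few lines of NumPy. The base $Z$, embedding size $D$, and the exact normalization are illustrative assumptions here, not the paper's reported hyperparameters:

```python
import numpy as np

def sincos_encode(costs, D=16, Z=10000.0):
    """Encode normalized path costs l into D-dimensional sine-cosine embeddings."""
    l = np.asarray(costs, dtype=np.float64)            # (N_m,)
    i = np.arange(D)                                   # embedding dimension indices
    # Z^{i/D} for even i, Z^{(i-1)/D} for odd i, per the encoding above
    freq = Z ** (np.where(i % 2 == 0, i, i - 1) / D)
    angles = l[:, None] / freq[None, :]                # (N_m, D)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def wayobject_costmap(masks, costs, D=16):
    """Broadcast per-object cost embeddings through binary masks.

    masks: (H, W, N_m) binary object masks; costs: (N_m,) planner costs.
    Returns W: (H, W, D), each pixel carrying its object's cost embedding.
    """
    E = sincos_encode(costs, D)                        # (N_m, D)
    # W = M E, contracting over the object axis
    return np.einsum('hwn,nd->hwd', masks.astype(np.float64), E)
```

Because each pixel belongs to at most one mask, the contraction simply copies the corresponding object's $D$-dimensional cost embedding into that pixel.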

2. Role in Object-Relative Control

Unlike image-relative control—where actions are chosen by matching current views against subgoal images—object-relative control leverages the invariance of object locations and costs. The local policy is trained to predict navigation trajectories directly from the costmap rather than from explicit RGB images. This decouples control from pose and sensor embodiment, yielding higher generalization when the agent's viewpoint or embodiment changes (e.g., differences in sensor height or camera orientation).

3. Topometric Scene Graph and Global Planning Cost

The foundation of the WayObject Costmap is a topometric map—a graph where:

  • Nodes represent objects identified via segmentation and 3D localization (from monocular depth estimation or other sources).
  • Edges capture spatial relations as Euclidean distances between object pairs in 3D, providing richer spatial context than 2D connectivity.
  • Inter-image connections are added by feature matching (e.g., SuperPoint–LightGlue) so that object instances spanning multiple frames are merged, preserving identity across time.
  • Dijkstra's algorithm computes the shortest path from each node to the designated goal, with these costs driving the encoding in the costmap.

This paradigm explicitly leverages the world-invariant properties of detected objects, enabling policies that can generalize to novel environments and reversed or alternate trajectories.
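The global cost computation can be sketched as a single Dijkstra pass over the object graph, here represented as a plain adjacency dict of Euclidean edge lengths (the node identifiers and graph layout are hypothetical):

```python
import heapq

def goal_costs(graph, goal):
    """Dijkstra from the goal node outward: shortest-path cost from every
    object node to the goal (inf if unreachable).

    graph: {node: [(neighbor, euclidean_distance), ...]}, undirected edges.
    """
    dist = {node: float('inf') for node in graph}
    dist[goal] = 0.0
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Since the edges are symmetric Euclidean distances, one pass rooted at the goal yields the cost $l_i$ for every object node at once, which is then fed into the positional encoding described in Section 1.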

4. Advantages and Applications

The main advantages of the WayObject Costmap approach include:

  • Embodiment invariance: Control policies conditioned on WayObject Costmaps are less sensitive to sensor configurations and can generalize across different robots or sensor heights.
  • Decoupling of control from image matching: The controller no longer requires reference subgoal images, simplifying both training and deployment.
  • Improved navigation performance in non-standard tasks: Significant improvements in tasks such as alternate goal reaching, shortcutting, or reverse trajectory execution are demonstrated, with SPL (Success weighted by Path Length) scores on tasks such as Alt Goal rising from 2.17 (image-relative) to 21.74 (object-relative) (Garg et al., 11 Sep 2025).
  • Enabling new routes without imitation: The costmap formalism allows exploring previously unseen routes, which image-relative methods cannot handle without explicit training data.

Applications are found in visual navigation where dense 3D maps are impractical, environments requiring cross-embodiment deployment, and tasks needing robust spatial understanding under changing viewpoints or object arrangements.

5. Experimental Validation

Extensive quantitative results confirm the superiority of WayObject Costmap-conditioned ObjectReact controllers:

  • On challenging navigation tasks (Alt Goal, Shortcut, Reverse), SPL and Soft-SPL metrics are markedly higher than image-relative baselines.
  • Embodiment variations (such as changing sensor height from 1.3 m to 0.4 m between mapping and execution) degrade image-relative SPL by nearly 48 percentage points, while object-relative SPL remains robust.
  • Sim-only policies generalize to real robot deployments without significant degradation, due to the object-centric abstraction.

6. Context in Current Research

The adoption of object-centric costmaps reflects a shift toward higher-level spatial reasoning, moving beyond pixel-relative and raw image-based control. This direction exploits recent advances in segmentation (e.g., SAM, FastSAM), monocular depth estimation, topometric mapping, and embedding techniques, unifying semantic understanding with actionable planning cost. The framework described provides a systematic integration of these modalities to form an interpretable and robust input for learning policies that can react efficiently and safely in diverse, real-world environments.

7. Summary and Outlook

The WayObject Costmap synthesizes object segmentation and global planning cost in a spatially distributed representation, enabling embodiment-invariant, interpretable, and highly generalizable control for visual navigation. When coupled with object-relative policies, this approach shows marked improvements in performance across a spectrum of navigation tasks and demonstrates robust transfer and generalization to real-world robotic platforms. As the field increasingly moves toward complex, dynamic environments, the WayObject Costmap provides a foundational structure for scalable, object-aware visual navigation systems.
