ObjectReact: Learning Object-Relative Control for Visual Navigation (2509.09594v1)

Published 11 Sep 2025 in cs.RO, cs.AI, cs.CV, cs.LG, cs.SY, and eess.SY

Abstract: Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach to estimating control from a given pair of current observation and subgoal image. However, image-level representations of the world have limitations because images are strictly tied to the agent's pose and embodiment. In contrast, objects, being a property of the map, offer an embodiment- and trajectory-invariant world representation. In this work, we present a new paradigm of learning "object-relative" control that exhibits several desirable characteristics: a) new routes can be traversed without strictly requiring to imitate prior experience, b) the control prediction problem can be decoupled from solving the image matching problem, and c) high invariance can be achieved in cross-embodiment deployment for variations across both training-testing and mapping-execution settings. We propose a topometric map representation in the form of a "relative" 3D scene graph, which is used to obtain more informative object-level global path planning costs. We train a local controller, dubbed "ObjectReact", conditioned directly on a high-level "WayObject Costmap" representation that eliminates the need for an explicit RGB input. We demonstrate the advantages of learning object-relative control over its image-relative counterpart across sensor height variations and multiple navigation tasks that challenge the underlying spatial understanding capability, e.g., navigating a map trajectory in the reverse direction. We further show that our sim-only policy is able to generalize well to real-world indoor environments. Code and supplementary material are accessible via project page: https://object-react.github.io/

Summary

  • The paper presents an object-relative control strategy that decouples control prediction from image matching to achieve robust navigation.
  • It leverages a relative 3D scene graph for Dijkstra-based global planning and a WayObject Costmap to condition the local controller, enhancing cross-embodiment generalization.
  • Experimental results on the Habitat-Matterport 3D dataset show significant improvements in SPL and Soft-SPL, indicating enhanced flexibility and scalability.

ObjectReact: Learning Object-Relative Control for Visual Navigation

Abstract

The paper "ObjectReact: Learning Object-Relative Control for Visual Navigation" proposes a novel approach to visual navigation using a single camera and a topological map. Traditional methods rely heavily on image-relative representations tied to the robot's pose and embodiment, which can be limiting. This paper introduces an object-relative control paradigm that allows for more flexible and robust navigation by leveraging object-level representations.

Introduction and Background

Visual navigation often relies on dense 3D maps and sensors such as LiDAR, which can be costly and complex. Alternatives such as visual topological navigation use simpler setups with a single camera, inspired by human navigation strategies. Earlier approaches, classified as image-relative, estimate control from a pair of current and subgoal images; because image-level representations are tied to the robot's pose and embodiment, they limit flexibility and scalability, especially across varied environments and robot embodiments.

The paper proposes a shift toward object-relative control, decoupling control prediction from image matching and allowing cross-embodiment generalization. This is achieved through a relative 3D scene graph that represents object connectivity within and across images, enabling trajectory-invariant and robust navigation across a variety of tasks and environments.

Methodology

Mapping Phase: Relative 3D Scene Graph

The paper introduces a topometric map in the form of a relative 3D scene graph, where image segments serve as object nodes linked by intra-image 3D Euclidean distances and by inter-image object associations (Figure 1).

Figure 1: Tasks: Each column shows a top-down view with the prior-experience trajectory displayed as a purple path from the purple circle (start) to the green point (goal).
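To make the mapping step concrete, the sketch below shows how such a graph could be assembled with standard tools. The input format, node keys, and the external associations list are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a "relative" 3D scene graph (assumed data layout, not
# the paper's pipeline). Nodes are per-image object segments; intra-image
# edges carry 3D Euclidean distances between segment centroids, and
# inter-image edges link re-observations of the same object.
import itertools

import networkx as nx
import numpy as np

def build_relative_scene_graph(frames, associations):
    """frames: list of {segment_id: 3D centroid} dicts, one per map image.
    associations: iterable of ((frame_i, seg_i), (frame_j, seg_j)) pairs
    produced by an upstream object matcher (assumed given here)."""
    G = nx.Graph()
    for t, segments in enumerate(frames):
        for seg_id, centroid in segments.items():
            G.add_node((t, seg_id), centroid=np.asarray(centroid, dtype=float))
        # Intra-image edges: relative 3D distance between co-visible objects.
        for (a, ca), (b, cb) in itertools.combinations(segments.items(), 2):
            dist = float(np.linalg.norm(np.asarray(ca) - np.asarray(cb)))
            G.add_edge((t, a), (t, b), weight=dist, kind="intra")
    # Inter-image edges: zero-cost links between views of the same object.
    for u, v in associations:
        G.add_edge(u, v, weight=0.0, kind="inter")
    return G
```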

Execution Phase: Object Localizer, Global Planner, and Local Controller

The execution phase localizes query objects against the precomputed map and plans paths over the scene graph using Dijkstra's algorithm. The resulting per-object path-length costs are rendered into a novel "WayObject Costmap", which the controller consumes directly, eliminating the need for an explicit RGB input (Figure 2).

Figure 2: Object-Relative Navigation Pipeline, illustrating mapping, execution, and training phases.
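A minimal sketch of this step follows, assuming the graph above, a known goal node, and upstream localization that yields a binary mask per matched object in the current view; the zero-cost inter-image edges make the Dijkstra costs object-level rather than view-level. Function names and the mask format are hypothetical.

```python
# Sketch: Dijkstra over the scene graph, then a WayObject-Costmap-style
# image built by painting each localized object's mask with its
# path-length cost to the goal. Inputs are assumed, not the paper's API.
import networkx as nx
import numpy as np

def object_path_costs(G, goal_node):
    # Shortest-path length from every object node to the goal object.
    return nx.single_source_dijkstra_path_length(G, goal_node, weight="weight")

def wayobject_costmap(costs, localized, shape, fill=np.inf):
    """localized: {map_node: (H, W) binary mask} for objects matched in the
    current query image. Returns an (H, W) float cost image."""
    costmap = np.full(shape, fill, dtype=np.float32)
    for node, mask in localized.items():
        if node in costs:
            costmap[mask.astype(bool)] = costs[node]
    return costmap
```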

Training Phase: The ObjectReact Controller

The local controller, dubbed ObjectReact, conditions its trajectory rollouts on the WayObject Costmap. This approach improves upon traditional image-relative methods by utilizing an object-centric representation that provides contextual and spatial awareness without reliance on direct visual cues from RGB images.
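For intuition, an illustrative controller of this kind could be as small as the network below, which regresses K future waypoints from a one-channel costmap; the architecture and output parameterization are assumptions, not the paper's model.

```python
# Illustrative (assumed) controller: a small CNN that consumes a 1-channel
# WayObject Costmap and regresses K future (x, y) waypoints. Infinite
# costs are expected to be replaced by a large finite constant upstream.
import torch
import torch.nn as nn

class ObjectReactController(nn.Module):
    def __init__(self, k_waypoints=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2 * k_waypoints)
        self.k = k_waypoints

    def forward(self, costmap):  # costmap: (B, 1, H, W) float tensor
        return self.head(self.encoder(costmap)).view(-1, self.k, 2)
```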

Experimental Setup and Evaluation

The approach was tested using the Habitat-Matterport 3D dataset with varied tasks, including Imitate, Alt Goal, Shortcut, and Reverse tasks. The evaluation metrics included Success weighted by Path Length (SPL) and Soft-SPL (SSPL), with tests conducted on different robot embodiments.
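For reference, SPL and Soft-SPL follow their standard definitions: SPL averages S_i * l_i / max(p_i, l_i) over episodes, where S_i is binary success, l_i the shortest-path length, and p_i the agent's path length, while Soft-SPL replaces S_i with a soft goal-progress term. A small sketch using these common definitions (not code from the paper):

```python
# Standard SPL / Soft-SPL metrics (common definitions, not paper code).
import numpy as np

def spl(success, shortest, taken):
    """success: binary per-episode; shortest/taken: path lengths."""
    success, shortest, taken = map(np.asarray, (success, shortest, taken))
    return float(np.mean(success * shortest / np.maximum(taken, shortest)))

def soft_spl(progress, shortest, taken):
    """progress in [0, 1], e.g. 1 - remaining/initial geodesic distance."""
    progress, shortest, taken = map(np.asarray, (progress, shortest, taken))
    return float(np.mean(progress * shortest / np.maximum(taken, shortest)))
```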

Results and Discussion

The object-relative controller, ObjectReact, demonstrated significant performance improvements over image-relative baselines, especially in challenging tasks such as Alt Goal and Reverse. Robustness to embodiment changes, such as sensor-height differences between mapping and execution, highlighted the approach's flexibility. The results illustrate the advantages of object-centric representations, particularly in varied and dynamic environments (Figure 3).

Figure 3: Examples of demonstration videos showing real-world and simulator deployments.

Conclusion

This research provides a compelling case for object-relative navigation, showcasing its potential for more adaptable and efficient robotic path planning in visual navigation tasks. The use of object-level connectivity and the WayObject Costmap allows for efficient generalization across different environments and embodiments, overcoming many limitations faced by traditional image-relative navigation methods. Future work could focus on integrating language-based goals or exploring more complex dynamic environments to further enhance the capabilities of robotic navigation systems.

ObjectReact thus opens new pathways toward sophisticated navigation strategies with reduced sensor and computational overhead, paving the way for practical deployment in real-world applications. Further enhancements in perception models and continued exploration of context-oriented navigation strategies would build on this foundation toward advanced autonomous navigation.
